[GitHub] [lucene-solr] uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554251305
 
 
   Hi, the trick is to use a custom rewrite method that, instead of rewriting 
to a query, collects the terms into some ArrayList or whatever. Once it has 
collected enough terms, stop it. I did this for many implementations (I think 
there is also one in Lucene).
   
   Nevertheless, we can still think about making getTermsEnum() public, but 
then we should clearly define how this method must be called. I'd prefer to add 
some final public convenience wrapper method in the MTQ class to allow external 
usage. Maybe this one should take a LeafReader instance.
   
   Another way is to make the variant without Attributes final and public, as 
it looks like the one meant to be called from the outside.
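
A minimal sketch of the "collect terms, then stop" idea from the first paragraph, written in plain Java. It deliberately uses no Lucene classes: `BoundedTermCollector`, `collectTerms`, the `Iterator<String>` standing in for a term enumeration, and the `maxTerms` limit are all hypothetical names chosen for illustration, not part of the MTQ API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration only; no Lucene types are used here.
class BoundedTermCollector {

  /**
   * Collects terms that start with the given prefix and stops as soon as
   * maxTerms terms have been gathered. The iterator is a stand-in for a
   * term enumeration (an assumption, not the Lucene TermsEnum API).
   */
  static List<String> collectTerms(Iterator<String> terms, String prefix, int maxTerms) {
    List<String> collected = new ArrayList<>();
    while (terms.hasNext() && collected.size() < maxTerms) {
      String term = terms.next();
      if (term.startsWith(prefix)) {
        collected.add(term);
      }
    }
    return collected;
  }

  public static void main(String[] args) {
    TreeSet<String> dictionary =
        new TreeSet<>(Arrays.asList("luce", "lucene", "lucent", "lucid", "solr"));
    // Stops after two matching terms even though four terms share the prefix.
    System.out.println(collectTerms(dictionary.iterator(), "luc", 2));
  }
}
```

The point of the limit is that expansion cost is bounded by `maxTerms` rather than by the number of matching terms in the index.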


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9036) ExitableDirectoryReader to interrupt DocValues as well

2019-11-14 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974863#comment-16974863
 ] 

Lucene/Solr QA commented on LUCENE-9036:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 21s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 21s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} Release audit (RAT) {color} | {color:red}  0m 21s{color} | {color:red} core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  1m  1s{color} | {color:green} Release audit (RAT) rat-sources passed {color} |
| {color:red}-1{color} | {color:red} Check forbidden APIs {color} | {color:red}  0m 21s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} Validate source patterns {color} | {color:red}  0m 21s{color} | {color:red} core in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 12s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 20s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.SystemCollectionCompatTest |
|   | solr.handler.component.TermVectorComponentDistributedTest |
|   | solr.cloud.api.collections.ShardSplitTest |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985880/LUCENE-9036.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 4931c0989dd |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| compile | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-compile-lucene_core.txt |
| javac | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-compile-lucene_core.txt |
| Release audit (RAT) | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-compile-lucene_core.txt |
| Check forbidden APIs | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-compile-lucene_core.txt |
| Validate source patterns | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-compile-lucene_core.txt |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-unit-lucene_core.txt |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/artifact/out/patch-unit-solr_core.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/testReport/ |
| modules | C: lucene/core solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/230/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> ExitableDirectoryReader to interrupt DocValues as well
> --
>
> Key: LUCENE-9036
> URL: https://issues.apache.org/jira/browse/LUCENE-9036
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: LUCENE-9036.patch, LUCENE-9036.patch, LUCENE-9036.patch, 
> LUCENE-9036.patch
>
>
> This allows making AnalyticsComponent and json.facet sensitive to time 
> allowed. 
> Does it make sense? Is it enough to check on DV creation, i.e. per 
> field/segment, or is it worth checking every Nth doc? 
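
The "check every Nth doc" option raised in the description can be sketched as a sampled deadline check. This is a plain-Java illustration under stated assumptions: `SampledTimeoutCheck`, `TimeExceededException`, `onDocument` and the injectable clock are hypothetical names, not the ExitableDirectoryReader API or the attached patch.

```java
import java.util.function.LongSupplier;

// Plain-Java illustration; class and method names are hypothetical.
class SampledTimeoutCheck {

  /** Thrown when the time budget is exhausted (stand-in for an "exiting" exception). */
  static class TimeExceededException extends RuntimeException {
    TimeExceededException(String message) {
      super(message);
    }
  }

  private final LongSupplier clockNanos; // injectable clock, which eases testing
  private final long deadlineNanos;
  private final int checkInterval;       // the clock is consulted on every Nth call
  private int calls;

  SampledTimeoutCheck(LongSupplier clockNanos, long timeoutNanos, int checkInterval) {
    this.clockNanos = clockNanos;
    this.deadlineNanos = clockNanos.getAsLong() + timeoutNanos;
    this.checkInterval = checkInterval;
  }

  /** Call once per visited document; only every Nth call reads the clock. */
  void onDocument() {
    if (++calls % checkInterval == 0 && clockNanos.getAsLong() - deadlineNanos > 0) {
      throw new TimeExceededException("time limit exceeded after " + calls + " docs");
    }
  }

  public static void main(String[] args) {
    // 1 ms budget, clock read on every 100th document.
    SampledTimeoutCheck check = new SampledTimeoutCheck(System::nanoTime, 1_000_000L, 100);
    try {
      for (int doc = 0; doc < 100_000_000; doc++) {
        check.onDocument();
      }
      System.out.println("finished within the time budget");
    } catch (TimeExceededException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The trade-off is between overhead and reaction time: a larger interval means fewer clock reads but a later exit once the deadline has passed.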



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (LUCENE-9031) UnsupportedOperationException on highlighting Interval Query

2019-11-14 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974755#comment-16974755
 ] 

Lucene/Solr QA commented on LUCENE-9031:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 38s{color} | {color:green} highlighter in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 20s{color} | {color:green} queries in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  3m 23s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985855/LUCENE-9031.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 4931c0989dd |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/229/testReport/ |
| modules | C: lucene/highlighter lucene/queries U: lucene |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/229/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> UnsupportedOperationException on highlighting Interval Query
> 
>
> Key: LUCENE-9031
> URL: https://issues.apache.org/jira/browse/LUCENE-9031
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/queries
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.4
>
> Attachments: LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, 
> LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When UnifiedHighlighter highlights Interval Query it encounters 
> UnsupportedOperationException. 






[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974728#comment-16974728
 ] 

Adam Walz commented on LUCENE-8987:
---

Commented on the PRs.

As for Apache License headers, I'm thinking of adding yaml front matter to all 
markdown files. The yaml will allow for more elaborate header settings - for 
instance multiline markdown in variables. I was going to use this for the solr 
security page by having variables for CVE, severity, versions affected, 
description, and mitigation. That way in jinja we can target each variable 
separately and format as a table rather than only having access to the markdown 
content.

 

It will look something like this with the yaml front matter in {{```}}
{code:java}
```
title: XML Bomb in Apache Solr versions prior to 5.0
CVE: CVE-2019-12401
severity: Medium
versions_affected: |
  1.3.0 to 1.4.1
  3.1.0 to 3.6.2
  4.0.0 to 4.10.4
mitigation: |
  * Upgrade to Apache Solr 5.0 or later.
  * Ensure your network settings are configured so that only trusted traffic
    is allowed to post documents to the running Solr instances.
```

Solr versions prior to 5.0.0 are vulnerable to an XML resource
consumption attack (a.k.a. Lol Bomb) via its update handler. By leveraging
XML DOCTYPE and ENTITY type elements, the attacker can create a pattern
that will expand when the server parses the XML, causing OOMs.



{code}
 

Using front matter will also make it possible to include a license in each 
markdown file without affecting rendering.

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copying over the site from svn, seeing if it can be 
> "Pelicanized" easily and then testing staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc






[GitHub] [lucene-site] adamwalz commented on issue #7: Cleanup some files we won't need

2019-11-14 Thread GitBox
adamwalz commented on issue #7: Cleanup some files we won't need
URL: https://github.com/apache/lucene-site/pull/7#issuecomment-554168626
 
 
   Works for me, but the same comment applies as in 
https://github.com/apache/lucene-site/pull/8#issuecomment-554167795





[GitHub] [lucene-site] adamwalz commented on issue #8: Simple build script

2019-11-14 Thread GitBox
adamwalz commented on issue #8: Simple build script
URL: https://github.com/apache/lucene-site/pull/8#issuecomment-554168059
 
 
   And it sounds like lucene-site is breaking ground in this effort, so it may 
be viewed as a template for others.





[GitHub] [lucene-site] adamwalz commented on issue #8: Simple build script

2019-11-14 Thread GitBox
adamwalz commented on issue #8: Simple build script
URL: https://github.com/apache/lucene-site/pull/8#issuecomment-554167795
 
 
   lgtm. The reason I suggested tasks.py instead of a shell script is that 
tasks.py is generated by `pelican-quickstart`. Since more ASF sites are moving 
to pelican, it might have been easier for jenkins to standardize on one build 
command instead of how buildbot now uses `pelican content -t theme`.





[jira] [Reopened] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter reopened LUCENE-8920:


TestFstDirectAddressing.testDeDupTails has a 44% failure rate on jenkins boxes 
since it was added by this commit (so far only on Uwe's machine, but on 
multiple OSes, and both master and 8.x)

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range) which makes gaps very costly. Associating each label with a dense id 
> and having an intermediate lookup, i.e. lookup label -> id and then id->arc 
> offset instead of doing label->arc directly could save a lot of space in 
> some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
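
A rough back-of-the-envelope sketch of why the intermediate lookup quoted above could save space. The cost model is an assumption made for illustration (fixed-size arc slots, one small id slot per label in the range); `DirectAddressingCost` and its methods are hypothetical and do not reflect the actual FST encoding.

```java
// Illustrative cost model only; not the real FST arc layout.
class DirectAddressingCost {

  /** Direct addressing: one fixed-size arc slot per label in [minLabel, maxLabel]. */
  static long directBytes(int minLabel, int maxLabel, int bytesPerArc) {
    return (long) (maxLabel - minLabel + 1) * bytesPerArc;
  }

  /**
   * Intermediate lookup: one small id slot per label in the range,
   * plus arcs stored densely (no gaps) behind the ids.
   */
  static long indirectBytes(int minLabel, int maxLabel, int numArcs,
                            int bytesPerId, int bytesPerArc) {
    return (long) (maxLabel - minLabel + 1) * bytesPerId + (long) numArcs * bytesPerArc;
  }

  public static void main(String[] args) {
    // Assumed example: labels span 'a'..'z' (26 slots), only 5 arcs exist,
    // each arc takes 16 bytes and each id takes 1 byte.
    System.out.println(directBytes('a', 'z', 16));         // 26 * 16 = 416
    System.out.println(indirectBytes('a', 'z', 5, 1, 16)); // 26 * 1 + 5 * 16 = 106
  }
}
```

Under this model the gaps cost one id slot each instead of a full arc slot, which is where the saving would come from when arc metadata is large and the label range is sparse.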






[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974723#comment-16974723
 ] 

Chris M. Hostetter edited comment on LUCENE-8920 at 11/15/19 1:19 AM:
--

TestFstDirectAddressing.testDeDupTails has a 44% failure rate (11/25 runs) on 
jenkins boxes since it was added by this commit (so far only on Uwe's machine, 
but on multiple OSes, and both master and 8.x)


was (Author: hossman):
TestFstDirectAddressing.testDeDupTails has a 44% failure rate on jenkins boxes 
since it was added by this commit (so far only on Uwe's machine, but on 
multiple OSes, and both master and 8.x)







[jira] [Comment Edited] (SOLR-13909) Everything about CheckBackupStatus is broken

2019-11-14 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974710#comment-16974710
 ] 

Chris M. Hostetter edited comment on SOLR-13909 at 11/15/19 1:01 AM:
-

Ok, well – this was a much deeper hole than I originally intended on going 
down, but I'm attaching a patch that I'm still beasting ... but I think it's 
pretty solid.

For the sake of my sanity, I replaced {{CheckBackupStatus}} with a new 
{{BackupStatusChecker}} so that I could gradually phase out the old API/usage 
one instance at a time.

I started with {{TestReplicationHandlerBackup}} since it was the biggest 
consumer of this API, and quickly determined that the only way to make it 
function sanely was to make some (backwards compatible) additions/tweaks to 
{{SnapShooter}} (below).

But as I feared, fixing these broken "always sleep 1000ms multiple times" loops 
around {{CheckBackupStatus}} exposed other race conditions in the test that 
also needed fixing.
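
For contrast with the "always sleep" loops, here is a minimal sketch of a polling helper that checks the condition before sleeping and gives up after a timeout. It is a generic illustration: `PollUntil` and `waitFor` are hypothetical names, and this is not the code of {{BackupStatusChecker}} from the patch.

```java
import java.util.function.BooleanSupplier;

// Illustration only: a generic polling helper, not the actual BackupStatusChecker.
class PollUntil {

  /**
   * Polls the condition until it is true or the timeout elapses.
   * The condition is checked before any sleep, so a condition that is
   * already true returns immediately without sleeping.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        throw new IllegalStateException("interrupted while polling", e);
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Hypothetical condition: "backup finished" becomes true after about 50 ms.
    boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 5_000, 10);
    System.out.println("success=" + ok);
  }
}
```

Returning `false` on timeout leaves it to the caller to fail the test with a useful message instead of looping forever.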

Tweaks to {{SnapShooter}}:
 * the {{"details"}} of a successful backup now include the {{"directoryName"}} 
used (which is automatically generated for the un-named backup situation)
 ** This allows {{BackupStatusChecker}} to offer a 
{{waitForDifferentBackupDir}} when dealing with unnamed backups
 * when using the {{numberToKeep}} option, the {{"details"}} of a backup aren't 
populated until after the "old" backups are deleted
 ** this fixed a test race condition trying to confirm that {{numberToKeep}} 
was respected when taking a backup
 * the {{"details"}} of _deleting_ a named backup (which is apparently an 
un-documented ReplicationHandler {{"command"}} that we have tests for) now 
include the {{"snapshotName"}} of that backup
 ** which let me also replace the similarly broken {{CheckDeleteBackupStatus}} 
class I found hidden inside {{TestReplicationHandlerBackup}}

Improvements to {{TestReplicationHandlerBackup}}:
 * Eliminated the need for a lot of 
{{Files.newDirectoryStream(Paths.get(master.getDataDir()), "snapshot*")}} code 
patterns
 ** This was a happy side effect of making the backup {{"details"}} include the 
{{"directoryName"}}
 ** we can now assert explicitly that the directory name returned exists and is 
a "valid" backup, instead of searching for a glob and iterating over a 
DirectoryStream and hoping we find the right one.
 ** This also fixed a bug in the test where it assumed 
{{Files.newDirectoryStream(...)}} was going to return files in timestamp order 
when there were multiple backups.
 * fix the "deletebackup" test to actually assert that the backup dir existed 
before deleting, and assert that it didn't exist after deleting
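
The ordering bug mentioned above comes from the fact that the iteration order of a {{DirectoryStream}} is unspecified. If a test does need timestamp order, it has to sort explicitly, as in this minimal sketch. `SnapshotListing` and `listOldestFirst` are hypothetical names for illustration and are not taken from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only; SnapshotListing and its methods are hypothetical names.
class SnapshotListing {

  /** Lists entries matching the glob, sorted oldest-first by last-modified time. */
  static List<Path> listOldestFirst(Path dir, String glob) {
    List<Path> entries = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path entry : stream) {
        entries.add(entry);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The iteration order of a DirectoryStream is unspecified, so sort explicitly.
    entries.sort(Comparator.comparing(SnapshotListing::lastModified));
    return entries;
  }

  private static FileTime lastModified(Path path) {
    try {
      return Files.getLastModifiedTime(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("snapshots");
    Path newer = Files.createFile(dir.resolve("snapshot.a"));
    Path older = Files.createFile(dir.resolve("snapshot.b"));
    Files.setLastModifiedTime(newer, FileTime.fromMillis(2_000_000L));
    Files.setLastModifiedTime(older, FileTime.fromMillis(1_000_000L));
    for (Path p : listOldestFirst(dir, "snapshot*")) {
      System.out.println(p.getFileName());
    }
  }
}
```

Asserting on the directory name reported in the backup details, as the comment describes, avoids the listing entirely and is the more robust choice.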


The fixes to the other test classes were generally much simpler and more 
straightforward

...with the notable exception of {{TestRestoreCore.testFailedRestore}} which 
had a "loop" where it was expecting a helper method to throw an 
{{AssertionError}} inside of a {{try/catch(AssertionError)}} block, but if it 
didn't then it called {{fail()}} inside of that same try block ... so no matter 
what the helper method did the test was going to "pass" ... I fixed it to use 
{{expectThrows()}}



[jira] [Updated] (SOLR-13909) Everything about CheckBackupStatus is broken

2019-11-14 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13909:
--
Attachment: SOLR-13909.patch
Status: Open  (was: Open)

Ok, well – this was a much deeper hole than I originally intended on going 
down, but I'm attaching a patch that I'm still beasting ... but I think it's 
pretty solid.

For the sake of my sanity, I replaced {{CheckBackupStatus}} with a new 
{{BackupStatusChecker}} so that I could gradually phase out the old API/usage 
one instance at a time.

I started with {{TestReplicationHandlerBackup}} since it was the biggest 
consumer of this API, and quickly determined that the only way to make it 
function sanely was to make some (backwards compatible) additions/tweaks to 
{{SnapShooter}} (below).

But as I feared, fixing these broken "always sleep 1000ms multiple times" loops 
around {{CheckBackupStatus}} exposed other race conditions in the test that 
also needed fixing.

Tweaks to {{SnapShooter}}:
 * the {{"details"}} of a successful backup now include the {{"directoryName"}} 
used (which is automatically generated for the un-named backup situation)
 ** This allows {{BackupStatusChecker}} to offer a 
{{waitForDifferentBackupDir}} when dealing with unnamed backups
 * when using the {{numberToKeep}} option, the {{"details"}} of a backup aren't 
populated until after the "old" backups are deleted
 ** this fixed a test race condition trying to confirm that {{numberToKeep}} 
was respected when taking a backup
 * the {{"details"}} of _deleting_ a named backup (which is apparently an 
un-documented ReplicationHandler {{"command"}} that we have tests for) now 
include the {{"snapshotName"}} of that backup
 ** which let me also replace the similarly broken {{CheckDeleteBackupStatus}} 
class I found hidden inside {{TestReplicationHandlerBackup}}

Improvements to {{TestReplicationHandlerBackup}}:
 * Eliminated the need for a lot of 
{{Files.newDirectoryStream(Paths.get(master.getDataDir()), "snapshot*")}} code 
patterns
 ** This was a happy side effect of making the backup {{"details"}} include the 
{{"directoryName"}}
 ** we can now assert explicitly that the directory name returned exists and is 
a "valid" backup, instead of searching for a glob and iterating over a 
DirectoryStream and hoping we find the right one.
 ** This also fixed a bug in the test where it assumed 
{{Files.newDirectoryStream(...)}} was going to return files in timestamp order 
when there were multiple backups.


The fixes to the other test classes were generally much simpler and more 
straightforward

...with the notable exception of {{TestRestoreCore.testFailedRestore}} which 
had a "loop" where it was expecting a helper method to throw an 
{{AssertionError}} inside of a {{try/catch(AssertionError)}} block, but if it 
didn't then it called {{fail()}} inside of that same try block ... so no matter 
what the helper method did the test was going to "pass" ... I fixed it to use 
{{expectThrows()}}
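
A minimal sketch of why the try/catch pattern described above can never fail, next to a simplified expectThrows-style helper. Everything here is a plain-Java stand-in: `ExpectThrowsSketch`, `helperThatShouldFail` and this `expectThrows` are hypothetical and only imitate the idea, they are not the test framework's implementation or the code from the patch.

```java
// Illustration only; the helper and test names below are hypothetical.
class ExpectThrowsSketch {

  /** Stand-in for a helper that is supposed to fail; here it (wrongly) succeeds. */
  static void helperThatShouldFail() {
    // no AssertionError is thrown
  }

  /** Broken pattern: a fail() inside the try throws AssertionError, which the catch swallows. */
  static boolean brokenTestPasses() {
    try {
      helperThatShouldFail();
      throw new AssertionError("expected helper to fail"); // what a fail() call does
    } catch (AssertionError expected) {
      // reached whether the helper threw or the "fail" above threw
    }
    return true;
  }

  /** Simplified expectThrows: returns the expected exception, or fails outside any catch. */
  static <T extends Throwable> T expectThrows(Class<T> type, Runnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      throw new AssertionError("unexpected exception type: " + t.getClass().getName(), t);
    }
    throw new AssertionError("expected " + type.getSimpleName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    System.out.println("broken test passes: " + brokenTestPasses());
    try {
      expectThrows(AssertionError.class, ExpectThrowsSketch::helperThatShouldFail);
    } catch (AssertionError e) {
      System.out.println("fixed test fails: " + e.getMessage());
    }
  }
}
```

The key difference is that the "nothing was thrown" failure is raised after the try/catch, so nothing can swallow it.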

> Everything about CheckBackupStatus is broken
> 
>
> Key: SOLR-13909
> URL: https://issues.apache.org/jira/browse/SOLR-13909
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-13909.patch
>
>
> While working on SOLR-13872 I tried to take advantage of the existing 
> {{CheckBackupStatus}} helper class and discovered that just about every 
> aspect of this class is broken and needs fixing:
>  * doesn't use SolrClient, pulls out its URL to do a bare HTTP request
>  * hardcoded assumption of xml - but doesn't parse it, just tries to match 
> regexes against it
>  * almost every usage of this class follows the same broken "loop" pattern 
> that guarantees the test will sleep more than it needs to even after 
> {{CheckBackupStatus}} thinks the backup is a success...
> {code:java}
> CheckBackupStatus checkBackupStatus = new CheckBackupStatus(...);
> while (!checkBackupStatus.success) {
>   checkBackupStatus.fetchStatus();
>   Thread.sleep(1000);
> }
> {code}
>  * the 3 arg constructor is broken both in design and in implementation:
>  ** it appears to be useful for checking that a _new_ backup has succeeded 
> after a {{lastBackupTimestamp}} from some previously successful check
>  ** in reality it only ever reports {{success}} if its status check 
> indicates the most recent backup has the exact {{.equals()}} time stamp as 
> {{lastBackupTimestamp}}
>  ** *AND THESE TIMESTAMPS ONLY HAVE MINUTE PRECISION*
>  ** As far as I can tell, the only reason the tests using the 3 arg version 
> ever pass is because of the broken loop pattern:
>  *** they ask for the status so quick, it's either 

[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974702#comment-16974702
 ] 

Jan Høydahl commented on LUCENE-8987:
-

[~adamwalz] See [https://github.com/apache/lucene-site/pull/7] for some files I 
suggest removing. And [https://github.com/apache/lucene-site/pull/8] for a 
convenience script for installing pelican, building and serving the site.

A general question: Should we have Apache License headers on all our MD files? 
I think so...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-site] janhoy opened a new pull request #8: Simple build script

2019-11-14 Thread GitBox
janhoy opened a new pull request #8: Simple build script
URL: https://github.com/apache/lucene-site/pull/8
 
 
   Script that checks for python, installs pelican and live-builds site


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-site] janhoy opened a new pull request #7: Cleanup some files we won't need

2019-11-14 Thread GitBox
janhoy opened a new pull request #7: Cleanup some files we won't need
URL: https://github.com/apache/lucene-site/pull/7
 
 
   





[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974681#comment-16974681
 ] 

Jan Høydahl commented on LUCENE-8987:
-

With LUCENE-9015 done we now also have a {{production}} branch (source) and a 
{{asf-site}} branch (generated). See more in 
[README|https://github.com/apache/lucene-site].

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copying over the site from svn, seeing if it can be 
> "Pelicanized" easily and then testing staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974674#comment-16974674
 ] 

Michael Sokolov commented on LUCENE-8920:
-

> In that case the version bump is not strictly needed since the new format is 
> a superset of the old format, though I think we made the right choice of 
> bumping the version in order to give a better error to users who would try to 
> read 8.4 FSTs with an older release.

For example, this happened to us when we tried to revert these changes but 
did not regenerate the kuromoji and nori "dictionaries", which are FSTs.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte range), 
> which makes gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, i.e. lookup label -> id and then id -> arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
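
The second suggestion quoted above (label -> dense id, then id -> arc offset) can be illustrated with a small sketch. This is not Lucene's FST code: the class name `DenseArcLookup`, its fields, and the choice of a `short[]` table for the label -> id step are all assumptions made for illustration. The point is only that a gap in the label range then costs one small table entry rather than a full arc-sized slot.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and layout are hypothetical, not Lucene's FST format. */
public class DenseArcLookup {
    private final int firstLabel;      // smallest label leaving this node
    private final short[] labelToId;   // index: label - firstLabel; value: dense id, or -1 for a gap
    private final long[] arcOffsets;   // arcOffsets[id] = offset of the arc with that dense id

    /** sortedLabels is assumed to be strictly ascending, with one offset per label. */
    public DenseArcLookup(int[] sortedLabels, long[] arcOffsets) {
        if (sortedLabels.length == 0 || sortedLabels.length != arcOffsets.length) {
            throw new IllegalArgumentException("need at least one label and one offset per label");
        }
        if (sortedLabels.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("too many arcs for a short dense id");
        }
        this.firstLabel = sortedLabels[0];
        int range = sortedLabels[sortedLabels.length - 1] - firstLabel + 1;
        this.labelToId = new short[range];
        Arrays.fill(labelToId, (short) -1);            // every slot starts as a gap
        for (int id = 0; id < sortedLabels.length; id++) {
            labelToId[sortedLabels[id] - firstLabel] = (short) id;
        }
        this.arcOffsets = arcOffsets.clone();
    }

    /** Two-step lookup: label -> dense id -> arc offset. Returns -1 if the node has no such arc. */
    public long arcOffset(int label) {
        int slot = label - firstLabel;
        if (slot < 0 || slot >= labelToId.length) {
            return -1L;
        }
        int id = labelToId[slot];
        return id < 0 ? -1L : arcOffsets[id];
    }

    /** Number of gap entries; each costs one short here instead of a full arc slot. */
    public int gapSlots() {
        return labelToId.length - arcOffsets.length;
    }

    public static void main(String[] args) {
        DenseArcLookup node =
            new DenseArcLookup(new int[] {'a', 'k', 'z'}, new long[] {100L, 200L, 300L});
        System.out.println("offset for 'k': " + node.arcOffset('k'));
        System.out.println("gap entries: " + node.gapSlots());
    }
}
```

With arc metadata in the 10-20 byte range, as the quote mentions, storing gaps in the small table instead of in arc-sized slots is where the saving would come from; whether that pays for the extra indirection is the open question in the quote.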






[jira] [Commented] (SOLR-13860) Enable back TestTlogReplica

2019-11-14 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974672#comment-16974672
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-13860:
--

I'm ignoring this test for now

> Enable back TestTlogReplica
> ---
>
> Key: SOLR-13860
> URL: https://issues.apache.org/jira/browse/SOLR-13860
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud, Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{TestTlogReplica}} was disabled in the past due to random failures. This 
> Jira is to fix those failures and re-enable the test.






[jira] [Commented] (SOLR-13860) Enable back TestTlogReplica

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974670#comment-16974670
 ] 

ASF subversion and git services commented on SOLR-13860:


Commit 3a7ea9756cc1158564cb661c69eb0f56550f109a in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3a7ea97 ]

SOLR-13860: Ignore testKillTlogReplica


> Enable back TestTlogReplica
> ---
>
> Key: SOLR-13860
> URL: https://issues.apache.org/jira/browse/SOLR-13860
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud, Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{TestTlogReplica}} was disabled in the past due to random failures. This 
> Jira is to fix those failures and re-enable the test.






[jira] [Commented] (SOLR-13860) Enable back TestTlogReplica

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974669#comment-16974669
 ] 

ASF subversion and git services commented on SOLR-13860:


Commit 4931c0989dd8920c02588c336098089b45a8591a in lucene-solr's branch 
refs/heads/master from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4931c09 ]

SOLR-13860: Ignore testKillTlogReplica


> Enable back TestTlogReplica
> ---
>
> Key: SOLR-13860
> URL: https://issues.apache.org/jira/browse/SOLR-13860
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud, Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{TestTlogReplica}} was disabled in the past due to random failures. This 
> Jira is to fix those failures and re-enable the test.






[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974665#comment-16974665
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit f462fe2794ddaa10ce68ebf1ccbd23f990e154cf in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f462fe2 ]

SOLR-13662: Renaming the test jars to .tmp to avoid precommit failures. Adding 
timeout for test failure fix.


> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Assigned] (LUCENE-9034) Officially publish the new site

2019-11-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned LUCENE-9034:
---

Assignee: Jan Høydahl

> Officially publish the new site
> ---
>
> Key: LUCENE-9034
> URL: https://issues.apache.org/jira/browse/LUCENE-9034
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> Publishing the web site means creating a publish branch and adding the right 
> magic instructions to {{.asf.yaml}} etc. This will then publish the new site 
> and disable old CMS.
> Before we do that we should
>  # Make sure all docs and release tools are updated for new site publishing 
> instructions
>  # Create a PR with latest changes in old CMS site since the export. This 
> will be the changes done during 8.3.0 release and possibly some news entries 
> related to security issues etc.
> After publishing we should ask INFRA to make old site svn read-only (and 
> perhaps do a commit that replaces svn content with a README.txt), so it is 
> obvious for everyone that we have migrated.






[jira] [Commented] (LUCENE-9034) Officially publish the new site

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974664#comment-16974664
 ] 

Jan Høydahl commented on LUCENE-9034:
-

This is necessary as the final step to actually replace the current website 
with the new one from the {{asf-site}} branch:
{code:sh}
# asf-site branch
git fetch && git checkout asf-site
cat << EOF > .asf.yaml
publish:
  whoami: asf-site
EOF
git add .asf.yaml
git commit -m "Auto publish live public website"
git push origin{code}

> Officially publish the new site
> ---
>
> Key: LUCENE-9034
> URL: https://issues.apache.org/jira/browse/LUCENE-9034
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/website
>Reporter: Jan Høydahl
>Priority: Major
>
> Publishing the web site means creating a publish branch and adding the right 
> magic instructions to {{.asf.yaml}} etc. This will then publish the new site 
> and disable old CMS.
> Before we do that we should
>  # Make sure all docs and release tools are updated for new site publishing 
> instructions
>  # Create a PR with latest changes in old CMS site since the export. This 
> will be the changes done during 8.3.0 release and possibly some news entries 
> related to security issues etc.
> After publishing we should ask INFRA to make old site svn read-only (and 
> perhaps do a commit that replaces svn content with a README.txt), so it is 
> obvious for everyone that we have migrated.






[jira] [Resolved] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved LUCENE-9015.
-
Resolution: Fixed

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Comment Edited] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974638#comment-16974638
 ] 

Jan Høydahl edited comment on LUCENE-9015 at 11/14/19 10:59 PM:


I think we don't need tasks.py, Makefile, publishconf or similar. Probably this 
would work:
{code:sh}
git checkout master
# prod branch (src)
git checkout -b production
cat << EOF > .asf.yaml
pelican:
  whoami: production
  target: asf-site
EOF
git add .asf.yaml
git commit -m "Production source branch"
git push origin
{code}
Then we get the simple workflow. Once we have a site we're happy with in 
master, we just merge to production branch and it ends up in the main site, and 
you can continue working in master with unstable stuff. Note that for major 
site rewrites or larger work .asf.yaml also supports separate feature branches 
that can build and publish to separate namespaces on staged.apache.org. Very 
flexible.


was (Author: janhoy):
I think we don't need tasks.py, Makefile, publishconf or similar. Probably this 
would work:
{code:sh}
git checkout master
# prod branch (src)
git checkout -b production
cat << EOF > .asf.yaml
pelican:
  whoami: production
  target: asf-site
EOF
git add .asf.yaml
git commit -m "Production source branch"
git push origin
# asf-site branch
git checkout master
git checkout --orphan asf-site
git rm --cached -r .
cat << EOF > .asf.yaml
publish:
  whoami: asf-site
EOF
git add .asf.yaml
git commit -m "Production build branch"
git push origin{code}
Then we get the simple workflow. Once we have a site we're happy with in 
master, we just merge to production branch and it ends up in the main site, and 
you can continue working in master with unstable stuff. Note that for major 
site rewrites or larger work .asf.yaml also supports separate feature branches 
that can build and publish to separate namespaces on staged.apache.org. Very 
flexible.

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974662#comment-16974662
 ] 

Jan Høydahl commented on LUCENE-9015:
-

OK, I set up half of this. We now have a {{production}} branch that we can 
create a PR against for publishing. Currently it only builds to the asf-site branch 
(that branch was actually created automatically by the buildbot). That means 
this Jira is now done!

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974661#comment-16974661
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit e59563f18975ebf6c7b6117cc87e9dcc17ce509f in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e59563f ]

SOLR-13662: Renaming the test jars to .tmp to avoid precommit failures. Adding 
timeout for test failure fix.


> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974648#comment-16974648
 ] 

Ishan Chattopadhyaya commented on SOLR-13662:
-

Btw, [~noble.paul] mentioned that the same happened to him with SOLR-13822, and he 
had to add a timeout to his tests. Since my test uses the same underlying code 
paths, adding that should help/suffice. So, tentatively, I'm planning to 
disable the test (or have a fixed version of the test committed) in another 6-8 
hours. Sorry for the inconvenience.
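
The comment does not say how the timeout was added. As a rough sketch, one common way to bound a wait in a test is to poll a condition until a deadline passes. The class `WaitWithTimeout` and its `waitFor` method are hypothetical names made up for this example; they are not Solr test utilities and not the actual fix committed for SOLR-13662 or SOLR-13822.

```java
import java.util.function.BooleanSupplier;

/** Illustrative sketch only; a hypothetical helper, not part of Solr's test framework. */
public class WaitWithTimeout {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                          // timed out without the condition holding
            }
            try {
                Thread.sleep(pollMillis);              // back off instead of spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the package update has become visible".
        boolean met = waitFor(() -> System.currentTimeMillis() - start >= 50, 2_000, 10);
        System.out.println("condition met: " + met);
    }
}
```

A test would then assert on the returned boolean, so a slow or missing update fails with a clear message rather than hanging or passing by luck.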

> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974647#comment-16974647
 ] 

Ishan Chattopadhyaya commented on SOLR-13662:
-

Yes, Hoss. I have been aware of this since the first time it failed. As I replied on 
the build, I am looking into it.
There are two types of failures:
# The package updates are actually not happening. I am unable to reproduce this 
locally, so I'll let it fail on Jenkins a few more times to collect more 
information before fixing/disabling.
# The other issue is that stray .jar.sha1 files are being left behind. As I 
mentioned in a reply to that build mail, I have no clue how to even begin 
reproducing/fixing it. [~krisden] gave me some clues on how to reproduce it 
locally; I will explore them.

> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974639#comment-16974639
 ] 

Jan Høydahl commented on LUCENE-9015:
-

Infra also suggested that they could add a "publish" button somewhere, or even 
support a special keyword in commit messages that would trigger a publish to 
prod. But that can be for another day. Two source branches should be familiar 
to developers, so that's a plus!

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974638#comment-16974638
 ] 

Jan Høydahl commented on LUCENE-9015:
-

I think we don't need tasks.py, Makefile, publishconf or similar. Probably this 
would work:
{code:sh}
git checkout master
# prod branch (src)
git checkout -b production
cat << EOF > .asf.yaml
pelican:
  whoami: production
  target: asf-site
EOF
git add .asf.yaml
git commit -m "Production source branch"
git push origin
# asf-site branch
git checkout master
git checkout --orphan asf-site
git rm --cached -r .
cat << EOF > .asf.yaml
publish:
  whoami: asf-site
EOF
git add .asf.yaml
git commit -m "Production build branch"
git push origin{code}
Then we get the simple workflow. Once we have a site we're happy with in 
master, we just merge to production branch and it ends up in the main site, and 
you can continue working in master with unstable stuff. Note that for major 
site rewrites or larger work .asf.yaml also supports separate feature branches 
that can build and publish to separate namespaces on staged.apache.org. Very 
flexible.

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974637#comment-16974637
 ] 

Adrien Grand commented on LUCENE-8920:
--

bq. I don't recall when that is validated (on each test or only on release?)

It is tested on each test run; the test is TestBackwardsCompatibility in 
lucene/backward-codecs.

In that case the version bump is not strictly needed since the new format is a 
superset of the old format, though I think we made the right choice of bumping 
the version in order to give a better error to users who would try to read 8.4 
FSTs with an older release.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte range), 
> which makes gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, i.e. lookup label -> id and then id -> arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[GitHub] [lucene-solr] jpountz commented on a change in pull request #927: LUCENE-8997: : Add type of triangle info to ShapeField encoding

2019-11-14 Thread GitBox
jpountz commented on a change in pull request #927: LUCENE-8997: : Add type of 
triangle info to ShapeField encoding
URL: https://github.com/apache/lucene-solr/pull/927#discussion_r346564941
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeField.java
 ##
 @@ -101,21 +143,95 @@ protected void setTriangleValue(int aX, int aY, boolean 
abFromShape, int bX, int
   private static final int MAXY_MINX_MINY_X_Y_MAXX = 6;
   private static final int MINY_MINX_Y_MAXX_MAXY_X = 7;
 
 Review comment:
   What would be the pros/cons vs. adding new triangle types, e.g. as below?
   
   ```
   private static final int POINT_Y_X = 8;
   private static final int LINE_MAXY_MINX_MINY_MAXX = 9;
   private static final int LINE_MINY_MINX_MAXY_MAXX = 10;
   ```





[jira] [Resolved] (SOLR-10489) StatsReloadRaceTest.testParallelReloadAndStats failures

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-10489.
-
Resolution: Cannot Reproduce

This failure doesn't reproduce anymore. It was likely related to problems with 
gauge registration / unregistration races, eventually fixed in SOLR-13677.

> StatsReloadRaceTest.testParallelReloadAndStats failures
> ---
>
> Key: SOLR-10489
> URL: https://issues.apache.org/jira/browse/SOLR-10489
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.0
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 7.0
>
> Attachments: 19806.txt, SOLR-10489.comeback.log
>
>
> This test has been failing a lot after the changes in SOLR-9959, for unclear 
> reasons. The failure is always in the same place:
> {code}
> java.lang.AssertionError: Key SEARCHER.searcher.indexVersion not found in 
> registry solr.core.collection1
>   at 
> __randomizedtesting.SeedInfo.seed([28B54D77FD0E3DF1:E72B284E72FF55AE]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.solr.handler.admin.StatsReloadRaceTest.requestMetrics(StatsReloadRaceTest.java:132)
>   at 
> org.apache.solr.handler.admin.StatsReloadRaceTest.testParallelReloadAndStats(StatsReloadRaceTest.java:70)
> {code}






[jira] [Resolved] (SOLR-13703) LFUCache should support maxRamMB limit

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-13703.
-
Resolution: Won't Fix

This cache implementation has been deprecated and removed in 9.0.

> LFUCache should support maxRamMB limit
> --
>
> Key: SOLR-13703
> URL: https://issues.apache.org/jira/browse/SOLR-13703
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> All other cache implementations in Solr support this limit, which is 
> important from the operational point of view for limiting the overall 
> resource consumption.
> ConcurrentLFUCache already tracks memory usage so the implementation should 
> be easy, and analogous to ConcurrentLRUCache.






[jira] [Resolved] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-13817.
-
Resolution: Fixed

> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0), 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Updated] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Fix Version/s: master (9.0)

> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0), 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Updated] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Summary: Deprecate and remove legacy SolrCache implementations  (was: 
Deprecate legacy SolrCache implementations)

> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974581#comment-16974581
 ] 

ASF subversion and git services commented on SOLR-13817:


Commit 6e655a99cec2766aa7739d06e586e0e90fd44f10 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6e655a9 ]

SOLR-13817: Deprecate legacy SolrCache implementations.


> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974580#comment-16974580
 ] 

ASF subversion and git services commented on SOLR-13817:


Commit b4fe911cc8e4bddff18226bc8c98a2deb735a8fc in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4fe911 ]

SOLR-13817: Remove legacy SolrCache implementations.


> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974569#comment-16974569
 ] 

Michael Sokolov edited comment on LUCENE-8920 at 11/14/19 8:05 PM:
---

I ran the `luceneutil` test just to be sure: yes, 8.x can read indexes created 
by 8.3


was (Author: sokolov):
I'll run the `luceneutil` test just to be sure

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range), which makes gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, i.e. lookup label -> id and then id -> arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
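
The second idea quoted above (label -> dense id -> arc offset) can be sketched 
as follows. This is only a stand-alone model for comparing the two layouts, not 
Lucene's FST code: the class ArcLookupSketch, its methods, and the assumption 
of one byte per id slot are hypothetical and chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical model of one node's outgoing arcs; this is not Lucene's FST code. */
final class ArcLookupSketch {
    private final int minLabel;
    private final int[] labelToId;  // one slot per label in [minLabel, maxLabel]; -1 marks a gap
    private final int[] arcOffsets; // one entry per arc that actually exists

    ArcLookupSketch(int[] sortedLabels, int[] arcOffsets) {
        this.minLabel = sortedLabels[0];
        int range = sortedLabels[sortedLabels.length - 1] - minLabel + 1;
        this.labelToId = new int[range];
        Arrays.fill(labelToId, -1);
        for (int id = 0; id < sortedLabels.length; id++) {
            labelToId[sortedLabels[id] - minLabel] = id;
        }
        this.arcOffsets = arcOffsets.clone();
    }

    /** Two-step lookup: label -> dense id -> arc offset; returns -1 if the label has no arc. */
    int arcOffset(int label) {
        int slot = label - minLabel;
        if (slot < 0 || slot >= labelToId.length) {
            return -1;
        }
        int id = labelToId[slot];
        return id < 0 ? -1 : arcOffsets[id];
    }

    /** Array-with-gaps cost: every label in the range pays for a full arc slot. */
    static long directAddressingBytes(int[] sortedLabels, int bytesPerArc) {
        long range = sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1L;
        return range * bytesPerArc;
    }

    /** Indirect cost: a small id slot per label in the range, plus one arc per real label. */
    static long denseIdBytes(int[] sortedLabels, int bytesPerArc, int bytesPerIdSlot) {
        long range = sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1L;
        return range * bytesPerIdSlot + (long) sortedLabels.length * bytesPerArc;
    }

    public static void main(String[] args) {
        int[] labels = {'a', 'c', 'z'};
        ArcLookupSketch node = new ArcLookupSketch(labels, new int[] {0, 16, 32});
        System.out.println("offset for 'c': " + node.arcOffset('c'));
        System.out.println("array-with-gaps bytes: " + directAddressingBytes(labels, 16));
        System.out.println("dense-id bytes: " + denseIdBytes(labels, 16, 1));
    }
}
```

With three labels spread over a range of 26 and 16-byte arcs, the gap layout 
in this model costs 26 * 16 = 416 bytes, while the indirect layout costs 
26 * 1 + 3 * 16 = 74 bytes. The extra lookup step is the price for that saving.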






[GitHub] [lucene-solr] bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554055990
 
 
   > Expanding terms of a MTQ should be done by passing a RewriteMethod and 
then rewriting it
   
   I see. The main point of this PhraseWildcardQuery is to not rewrite to the 
numerous expansions. This is one of the primary goals in its design. It seems 
to be a blocker here.
   
   Out of curiosity, is there a strong requirement to have the MTQ#getTermsEnum 
protected and not public?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974569#comment-16974569
 ] 

Michael Sokolov commented on LUCENE-8920:
-

I'll run the `luceneutil` test just to be sure

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range), which makes gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, i.e. lookup label -> id and then id -> arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974564#comment-16974564
 ] 

Jan Høydahl commented on LUCENE-9015:
-

We have to ask infra about recommended workflow. Perhaps they have a “publish” 
button on the buildbot server? Or can make some asf.json support for it? 
Imagine if we could commit a git hash to the publish part of .asf.yaml that 
would then build and publish that exact version to prod?

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (LUCENE-9048) Tutorial and docs section missing from the new website

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974543#comment-16974543
 ] 

Adam Walz commented on LUCENE-9048:
---

[~janhoy] Yes I will fix this section over the weekend. Thanks for the review

> Tutorial and docs section missing from the new website
> --
>
> Key: LUCENE-9048
> URL: https://issues.apache.org/jira/browse/LUCENE-9048
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/website
>Reporter: Jan Høydahl
>Priority: Major
>
> See [https://lucene.staged.apache.org/solr/resources.html#tutorials]
> The Tutorials and Documentation subsections are missing from this page






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974541#comment-16974541
 ] 

Adam Walz commented on LUCENE-9015:
---

[~janhoy] For the python script to merge commits from asf-staging to asf-site I 
would suggest trying to fit it into the tasks.py or publishconf.py which is 
provided by Pelican.

 

Would you like to work on that or would you like me to do it this weekend?

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Updated] (LUCENE-9036) ExitableDirectoryReader to interrupt DocValues as well

2019-11-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9036:
-
Attachment: LUCENE-9036.patch
Status: Patch Available  (was: Patch Available)

> ExitableDirectoryReader to interrupt DocValues as well
> --
>
> Key: LUCENE-9036
> URL: https://issues.apache.org/jira/browse/LUCENE-9036
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: LUCENE-9036.patch, LUCENE-9036.patch, LUCENE-9036.patch, 
> LUCENE-9036.patch
>
>
> This allows making AnalyticsComponent and json.facet sensitive to the time 
> allowed. 
> Does it make sense? Is it enough to check on DV creation, i.e. per 
> field/segment, or is it worth checking every Nth doc? 
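
The "check every Nth doc" option raised in the description could look roughly 
like the sketch below. It is a stand-alone illustration, not the 
ExitableDirectoryReader implementation or the attached patch: the class 
SampledTimeoutIterator, the checkEvery parameter, the injected clock and the 
use of IllegalStateException are all assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; this is not the ExitableDirectoryReader implementation. */
final class SampledTimeoutIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private final int checkEvery;
    private final long deadlineNanos;
    private final LongSupplier clock;
    private int index = -1;
    private int callsSinceCheck = 0;

    SampledTimeoutIterator(int[] docs, int checkEvery, long deadlineNanos, LongSupplier clock) {
        if (checkEvery < 1) {
            throw new IllegalArgumentException("checkEvery must be >= 1");
        }
        this.docs = docs.clone();
        this.checkEvery = checkEvery;
        this.deadlineNanos = deadlineNanos;
        this.clock = clock;
    }

    /** Advances to the next doc, reading the clock only on every checkEvery-th call. */
    int nextDoc() {
        if (++callsSinceCheck >= checkEvery) {
            callsSinceCheck = 0;
            // Overflow-safe comparison of nanoTime-style values.
            if (clock.getAsLong() - deadlineNanos > 0) {
                throw new IllegalStateException("time allowed exceeded");
            }
        }
        if (index + 1 >= docs.length) {
            return NO_MORE_DOCS;
        }
        return docs[++index];
    }

    public static void main(String[] args) {
        long deadline = System.nanoTime() + 1_000_000_000L;
        SampledTimeoutIterator it =
            new SampledTimeoutIterator(new int[] {1, 5, 9}, 2, deadline, System::nanoTime);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println("doc " + doc);
        }
    }
}
```

Sampling trades precision for cost: with checkEvery = N the clock is read once 
per N calls, so a timeout can be noticed up to N - 1 docs late. Checking only 
at DV creation corresponds to a single check per field/segment.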






[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974536#comment-16974536
 ] 

Adam Walz commented on LUCENE-8987:
---

Thanks [~danmuzi], there are some known issues. I still need to go through each 
page with a fine-toothed comb to ensure parity with production. This process 
will be easier now that the site is on staging rather than building locally 
only. I'll go through these mistakes this weekend. 

 

I've been trying to port changes in from the svn site, but haven't ported 
anything in the last week which is why the slack channel is unchanged. I'll fix 
that.

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copy over the site from svn, see if it can be 
> "Pelicanized" easily and then test staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc






[jira] [Updated] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9018:
-
Fix Version/s: 8.4
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for contributing!

Note: I changed the reference in the factory from Version.LATEST to 
Version.8_4_0 since that is the specific version introducing this toggle.  I 
know those specific versions are marked deprecated, which is confusing and 
perhaps dissuaded you.

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.4
>
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.






[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974528#comment-16974528
 ] 

ASF subversion and git services commented on LUCENE-9018:
-

Commit e5f2b2380b6e93d48df5f1733113c6b6c0bc090c in lucene-solr's branch 
refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5f2b23 ]

LUCENE-9018: ConcatenateGraphFilter now has a configurable separator.

(cherry picked from commit e466d622c8161038d4e0730e2925474a0a05d596)


> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.






[GitHub] [lucene-solr] uschindler edited a comment on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
uschindler edited a comment on issue #889: LUCENE-8983: Add PhraseWildcardQuery 
to control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554021084
 
 
   The way this is currently done (creating the MultiTermQuery TermsEnum 
directly) violates the API. The method MTQ#getTermsEnum is protected, so it 
should never be called from the outside. Java allows this from the same 
package, but it is still incorrect: protected methods should only be called 
from the class itself and its subclasses.
   
   Expanding the terms of an MTQ should be done by passing a RewriteMethod and 
then rewriting it (this still looks like a hack, but it is the correct way). 
Some MTQ queries may make adjustments on rewrite.
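
   To illustrate the pattern described above, here is a minimal stand-alone 
sketch in plain Java. It does not use Lucene classes: MultiTermLikeQuery, 
Rewrite, PrefixLikeQuery and CollectingRewrite are hypothetical names that only 
model the idea of a protected term enumeration reached through a public rewrite 
entry point, with a rewrite strategy that collects terms into a list and stops 
at a limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Stand-alone model of the pattern; names are hypothetical, this is not Lucene's API. */
abstract class MultiTermLikeQuery {
    /** Protected on purpose: callers outside the hierarchy should not enumerate terms directly. */
    protected abstract Iterator<String> matchingTerms(SortedSet<String> dictionary);

    /** Strategy handed to rewrite(); it is nested, so it may reach the protected method. */
    abstract static class Rewrite<T> {
        abstract T rewrite(SortedSet<String> dictionary, MultiTermLikeQuery query);

        protected final Iterator<String> termsOf(MultiTermLikeQuery query,
                                                 SortedSet<String> dictionary) {
            return query.matchingTerms(dictionary);
        }
    }

    /** The public entry point for outside callers. */
    public final <T> T rewrite(SortedSet<String> dictionary, Rewrite<T> method) {
        return method.rewrite(dictionary, this);
    }
}

/** Matches every dictionary term that starts with the given prefix. */
final class PrefixLikeQuery extends MultiTermLikeQuery {
    private final String prefix;

    PrefixLikeQuery(String prefix) {
        this.prefix = prefix;
    }

    @Override
    protected Iterator<String> matchingTerms(SortedSet<String> dictionary) {
        // Terms in [prefix, prefix + U+FFFF) share the prefix.
        return dictionary.subSet(prefix, prefix + Character.MAX_VALUE).iterator();
    }
}

/** "Rewrites" to a plain list of terms and stops once maxTerms have been collected. */
final class CollectingRewrite extends MultiTermLikeQuery.Rewrite<List<String>> {
    private final int maxTerms;

    CollectingRewrite(int maxTerms) {
        this.maxTerms = maxTerms;
    }

    @Override
    List<String> rewrite(SortedSet<String> dictionary, MultiTermLikeQuery query) {
        List<String> collected = new ArrayList<>();
        Iterator<String> terms = termsOf(query, dictionary);
        while (terms.hasNext() && collected.size() < maxTerms) {
            collected.add(terms.next());
        }
        return collected;
    }
}

final class RewriteDemo {
    public static void main(String[] args) {
        SortedSet<String> dictionary =
            new TreeSet<>(Arrays.asList("apple", "apply", "apt", "banana"));
        System.out.println(new PrefixLikeQuery("ap").rewrite(dictionary, new CollectingRewrite(2)));
    }
}
```

   In this model the outside caller only ever touches the public rewrite 
method, so any adjustment a query subclass makes during rewrite still applies, 
and the expansion can be capped without changing the visibility of the 
protected method.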





[GitHub] [lucene-solr] uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554021084
 
 
   The way this is currently done (creating the MultiTermQuery TermsEnum 
directly) violates the API. The method MTQ#getTermsEnum is protected, so it 
should never be called from the outside. Java allows this from the same 
package, but it is still incorrect: protected methods should only be called 
from the class itself and its subclasses.
   
   Expanding the terms of an MTQ should be done by passing a RewriteMethod and 
then rewriting it.





[jira] [Commented] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974516#comment-16974516
 ] 

Erick Erickson commented on SOLR-13930:
---

See my comment just above. It's the _ant_ build that fails on the GW branch.

> Running TestKoreanTokenizer with Ant fails  in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-11-14 Thread Andy Vuong (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974515#comment-16974515
 ] 

Andy Vuong commented on SOLR-13101:
---

We still need to work on adding some documentation to the ref-guide on how to 
configure/use the feature from an end-user perspective. I suppose a doc 
covering public interfaces, additions to ZK, and the overall design would be 
useful for Solr developers as well.

 

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas; a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[jira] [Commented] (LUCENE-8983) PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase

2019-11-14 Thread Ken LaPorte (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974514#comment-16974514
 ] 

Ken LaPorte commented on LUCENE-8983:
-

Hi [~bruno.roustant]. I don't yet. The team we're working with is reluctant to 
make modifications to the software at this point as they have released to their 
beta clients. At present, we've shifted to testing this internally in the hopes 
of making progress there. 

> PhraseWildcardQuery - new query to control and optimize wildcard expansions 
> in phrase
> -
>
> Key: LUCENE-8983
> URL: https://issues.apache.org/jira/browse/LUCENE-8983
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A generalized version of PhraseQuery, built with one or more MultiTermQuery 
> that provides term expansions for multi-terms (one of the expanded terms must 
> match).
> Its main advantage is to control the total number of expansions across all 
> MultiTermQuery and across all segments.
>  This query is similar to MultiPhraseQuery, but it handles, controls and 
> optimizes the multi-term expansions.
>  
>  This query is equivalent to building an ordered SpanNearQuery with a list of 
> SpanTermQuery and SpanMultiTermQueryWrapper.
>  But it optimizes the multi-term expansions and the segment accesses.
>  It first resolves the single terms, stopping early if one does not match. 
> Then it expands each multi-term sequentially, stopping immediately if one 
> does not match. It detects the segments that do not match and skips them for 
> the next expansions. This often avoids expanding the other multi-terms on some 
> or even all segments. Finally, it controls the total number of expansions.
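
The expansion strategy described above can be modelled in a few lines. This is 
a simplified stand-alone sketch under stated assumptions, not the 
PhraseWildcardQuery code from the pull request: segments are modelled as sorted 
term sets, multi-terms as plain prefixes, and the way the expansion budget is 
counted (one unit per distinct expanded term) is a guess made for illustration. 
The class name PhraseExpansionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical model of the expansion order; this is not the PhraseWildcardQuery code. */
final class PhraseExpansionSketch {

    /** Returns the expansions per prefix, or null as soon as the phrase cannot match. */
    static Map<String, SortedSet<String>> expand(List<SortedSet<String>> segments,
                                                 List<String> singleTerms,
                                                 List<String> prefixes,
                                                 int maxExpansions) {
        // Step 1: resolve single terms first; keep only segments that contain all of them.
        List<SortedSet<String>> candidates = new ArrayList<>();
        for (SortedSet<String> segment : segments) {
            if (segment.containsAll(singleTerms)) {
                candidates.add(segment);
            }
        }
        if (candidates.isEmpty()) {
            return null; // early stop, no multi-term is expanded at all
        }

        // Step 2: expand multi-terms one after the other on the remaining segments only.
        Map<String, SortedSet<String>> expansions = new LinkedHashMap<>();
        int used = 0;
        for (String prefix : prefixes) {
            SortedSet<String> expanded = new TreeSet<>();
            List<SortedSet<String>> survivors = new ArrayList<>();
            for (SortedSet<String> segment : candidates) {
                SortedSet<String> matches = segment.subSet(prefix, prefix + Character.MAX_VALUE);
                if (matches.isEmpty()) {
                    continue; // this segment is skipped for all later multi-terms
                }
                survivors.add(segment);
                for (String term : matches) {
                    if (used >= maxExpansions) {
                        break; // total budget across all multi-terms and segments
                    }
                    if (expanded.add(term)) {
                        used++;
                    }
                }
            }
            if (survivors.isEmpty()) {
                return null; // stop immediately, later multi-terms are never expanded
            }
            candidates = survivors;
            expansions.put(prefix, expanded);
        }
        return expansions;
    }

    public static void main(String[] args) {
        List<SortedSet<String>> segments = new ArrayList<>();
        segments.add(new TreeSet<>(Arrays.asList("brown", "fox", "foxes", "quick")));
        segments.add(new TreeSet<>(Arrays.asList("dog", "lazy", "quick")));
        System.out.println(expand(segments, Arrays.asList("quick"), Arrays.asList("fo"), 10));
    }
}
```

In the example, the second segment contains "quick" but nothing starting with 
"fo", so it is dropped after the first multi-term and would not be visited for 
any further one.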






[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974512#comment-16974512
 ] 

Namgyu Kim commented on LUCENE-8987:


Awesome work! [~janhoy]
 I found there are some simple mistakes :D

1) The Resources links in [https://lucene.staged.apache.org/core/] are wrong (right 
side of the page).
 [https://lucene.staged.apache.org/discussion.html] => 
[https://lucene.staged.apache.org/core/discussion.html]
 [https://lucene.staged.apache.org/developer.html] => 
[https://lucene.staged.apache.org/core/developer.html]
 [https://lucene.staged.apache.org/features.html] => 
[https://lucene.staged.apache.org/core/features.html]
 But [https://lucene.staged.apache.org/core/features.html] is not found.
 [https://lucene.staged.apache.org/downloads.html] => 
[https://lucene.staged.apache.org/core/downloads.html]

2) In the mailing list section, some content has not been updated.
 As you know, our Slack channel is #lucene-dev now.
 It was changed a week ago and I changed the web page an hour ago.
 [https://lucene.apache.org/core/discussion.html#slack]
 [https://lucene.apache.org/solr/community.html#slack]
 Channel name #lucene-solr -> #lucene-dev

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copying over the site from svn, seeing if it can be 
> "Pelicanized" easily and then testing staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13933:

Description: 
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance
# Validate the accuracy of potential improvements

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks

  was:
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks


> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly and help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks
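
The harness described above (many operations, timings, stability problems) can be sketched in plain Java. This is only an illustrative sketch under stated assumptions: `StressRun`, `Summary` and `run` are hypothetical names that do not come from SOLR-13933, and the operation is an arbitrary `Callable` rather than a real collection API call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: runs one operation many times in parallel and records latencies. */
final class StressRun {

  /** Result of a run; latencies are in nanoseconds and cover successful operations only. */
  static final class Summary {
    final int operations;
    final int failures;
    final long medianNanos;
    final long maxNanos;

    Summary(int operations, int failures, long medianNanos, long maxNanos) {
      this.operations = operations;
      this.failures = failures;
      this.medianNanos = medianNanos;
      this.maxNanos = maxNanos;
    }
  }

  static Summary run(int operations, int threads, Callable<?> operation)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (int i = 0; i < operations; i++) {
        futures.add(pool.submit(() -> {
          long start = System.nanoTime();
          operation.call();                 // an exception here marks the operation as failed
          return System.nanoTime() - start;
        }));
      }
      long[] latencies = new long[operations];
      int ok = 0;
      int failures = 0;
      for (Future<Long> future : futures) {
        try {
          latencies[ok] = future.get();
          ok++;
        } catch (ExecutionException e) {
          failures++;                       // a stability problem surfaced by the run
        }
      }
      long[] sorted = Arrays.copyOf(latencies, ok);
      Arrays.sort(sorted);
      long median = ok == 0 ? 0 : sorted[ok / 2];   // upper middle element
      long max = ok == 0 ? 0 : sorted[ok - 1];
      return new Summary(operations, failures, median, max);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

A nightly job could call `StressRun.run(...)` once per operation type and compare the summaries against the previous night's numbers; how the real suite would do this is not specified in the issue.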






[jira] [Created] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)
Ishan Chattopadhyaya created SOLR-13933:
---

 Summary: Cluster mode Stress test suite 
 Key: SOLR-13933
 URL: https://issues.apache.org/jira/browse/SOLR-13933
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Ishan Chattopadhyaya
Assignee: Ishan Chattopadhyaya


We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks






[GitHub] [lucene-solr] janhoy commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346436605
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager packageManager) {
+    this.packageManager = packageManager;
+    this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+    List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+    Collections.sort(list);
+    return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+    Map<String, SolrPackage> packagesMap = new HashMap<>();
+    for (PackageRepository repository: getRepositories()) {
+      packagesMap.putAll(repository.getPackages());
+    }
+
+    return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+    // TODO: Instead of fetching again and again, we should look for caching this
+    PackageRepository items[];
+    try {
+      items = getMapper().readValue(getRepositoriesJson(packageManager.zkClient), DefaultPackageRepository[].class);
+    } catch (IOException | KeeperException | InterruptedException e) {
+      throw new SolrException(ErrorCode.SERVER_ERROR, e);
+    }
+    List<PackageRepository> repositories = Arrays.asList(items);
+
+    for (PackageRepository updateRepository: repositories) {
+      updateRepository.refresh();
+    }
+
+    return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, InterruptedException, MalformedURLException, IOException {
+    String existingRepositoriesJson = getRepositoriesJson(packageManager.zkClient);
+    log.info(existingRepositoriesJson);
+
+    List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+    repos.add(new DefaultPackageRepository(name, uri));
+    if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 
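
The `getPackagesMap()` and `getPackages()` methods quoted above merge the per-repository maps with `putAll` and then sort the result. One consequence worth spelling out is that, with `putAll`, a repository later in the list overwrites an earlier entry with the same package name. The following is a minimal sketch of that behavior only; `PackageMergeSketch` is a hypothetical class, and plain name-to-version string maps stand in for `PackageRepository` and `SolrPackage`, whose definitions are not part of this excerpt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: each "repository" is just a map of package name to version. */
final class PackageMergeSketch {

  /** Merges repositories in order; a later repository wins on a name clash. */
  static Map<String, String> merge(List<Map<String, String>> repositories) {
    Map<String, String> merged = new HashMap<>();
    for (Map<String, String> repository : repositories) {
      merged.putAll(repository); // overwrites entries that share a key
    }
    return merged;
  }

  /** Returns the merged package names in natural (alphabetical) order. */
  static List<String> sortedNames(Map<String, String> merged) {
    List<String> names = new ArrayList<>(merged.keySet());
    Collections.sort(names);
    return names;
  }
}
```

Whether last-one-wins is the intended behavior for clashing package names is not stated in the quoted diff.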

[jira] [Assigned] (SOLR-10317) Solr Nightly Benchmarks

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya reassigned SOLR-10317:
---

Assignee: Ishan Chattopadhyaya

> Solr Nightly Benchmarks
> ---
>
> Key: SOLR-10317
> URL: https://issues.apache.org/jira/browse/SOLR-10317
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks-FINAL-PROPOSAL.pdf, 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks.docx, SOLR-10317.patch, 
> SOLR-10317.patch, Screenshot from 2017-07-30 20-30-05.png, 
> changes-lucene-20160907.json, changes-solr-20160907.json, managed-schema, 
> solrconfig.xml
>
>
> Currently hosted at: http://212.47.242.214/MergedViewCloud.html
> 
> Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, https://home.apache.org/~mikemccand/lucenebench/.
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
> # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
> There is some prior work / discussion:
> # https://github.com/shalinmangar/solr-perf-tools (Shalin)
> # https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md 
> (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
> # https://github.com/lucidworks/solr-scale-tk (Tim Potter)
> There is support for building, starting, indexing/querying and stopping Solr 
> in some of these frameworks above. However, the benchmarks run are very 
> limited. Any of these can be a starting point, or a new framework can as well 
> be used. The motivation is to be able to cover every functionality of Solr 
> with a corresponding benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [~markrmil...@gmail.com] would help here.
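
The first two requirements above (record timings per operation, report them as graphs) amount to timing named steps and writing them in a format a charting tool can read. The following is a rough sketch of that idea, not the project's benchmark code: `BenchmarkRecorder`, its methods and the CSV column names are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical recorder: times named steps and renders them as CSV rows. */
final class BenchmarkRecorder {

  // Insertion order is kept so rows appear in the order the steps ran.
  private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

  /** Runs one step and stores its wall-clock duration in milliseconds. */
  void time(String operation, Runnable step) {
    long start = System.nanoTime();
    step.run();
    elapsedMillis.put(operation, (System.nanoTime() - start) / 1_000_000);
  }

  /** One row per operation, tagged with the commit the build came from. */
  String toCsv(String commit) {
    StringBuilder csv = new StringBuilder("commit,operation,millis\n");
    for (Map.Entry<String, Long> entry : elapsedMillis.entrySet()) {
      csv.append(commit).append(',')
         .append(entry.getKey()).append(',')
         .append(entry.getValue()).append('\n');
    }
    return csv.toString();
  }
}
```

Appending one such CSV per nightly run would give a per-commit time series that a plotting plugin could chart.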






[GitHub] [lucene-solr] chatman commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553978870
 
 
   Sounds good. If we get GPG support in JDK or some decent library, it will be 
great.





[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346429995
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346430128
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/util/PackageTool.java
 ##
 @@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.util;
+
+import java.lang.invoke.MethodHandles;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.OptionBuilder;
+import org.apache.http.impl.client.CloseableHttpClient;
+import org.apache.logging.log4j.Level;
+import org.apache.logging.log4j.core.config.Configurator;
+import org.apache.lucene.util.SuppressForbidden;
+import org.apache.solr.client.solrj.impl.HttpClientUtil;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.packagemanager.PackageManager;
+import org.apache.solr.packagemanager.PackageUtils;
+import org.apache.solr.packagemanager.RepositoryManager;
+import org.apache.solr.packagemanager.SolrPackageInstance;
+import org.apache.solr.util.SolrCLI.StatusTool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+@SuppressForbidden(reason = "Need to use System.out.println() instead of log4j/slf4j for cleaner output")
+public class PackageTool extends SolrCLI.ToolBase {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  @SuppressForbidden(reason = "Need to turn off logging, and SLF4J doesn't seem to provide for a way.")
+  public PackageTool() {
+    // Need a logging free, clean output going through to the user.
+    Configurator.setRootLevel(Level.OFF);
+  }
+
+  @Override
+  public String getName() {
+    return "package";
+  }
+
+  public static String solrUrl = null;
+  public static String solrBaseUrl = null;
+  public PackageManager packageManager;
+  public RepositoryManager repositoryManager;
+
+  @Override
+  protected void runImpl(CommandLine cli) throws Exception {
+    try {
+      solrUrl = cli.getOptionValues("solrUrl")[cli.getOptionValues("solrUrl").length-1];
+      solrBaseUrl = solrUrl.replaceAll("\\/solr$", ""); // strip out ending "/solr"
+      log.info("Solr url: "+solrUrl+", solr base url: "+solrBaseUrl);
+      String zkHost = getZkHost(cli);
+
+      log.info("ZK: "+zkHost);
+      String cmd = cli.getArgList().size() == 0? "help": cli.getArgs()[0];
+
+      try (HttpSolrClient solrClient = new HttpSolrClient.Builder(solrBaseUrl).build()) {
+        if (cmd != null) {
+          packageManager = new PackageManager(solrClient, solrBaseUrl, zkHost);
+          try {
+            repositoryManager = new RepositoryManager(solrClient, packageManager);
+
+            switch (cmd) {
+              case "add-repo":
+                repositoryManager.addRepository(cli.getArgs()[1], cli.getArgs()[2]);
+                break;
+              case "list-installed":
+                packageManager.listInstalled();
+                break;
+              case "list-available":
+                repositoryManager.listAvailable();
+                break;
+              case "list-deployed":
+                if (cli.hasOption('c')) {
+                  String collection = cli.getArgs()[1];
+                  Map<String, SolrPackageInstance> packages = packageManager.getPackagesDeployed(collection);
+                  PackageUtils.printGreen("Packages deployed on " + collection + ":");
+                  for (String packageName: packages.keySet()) {
+                    PackageUtils.printGreen("\t" + packages.get(packageName));
+                  }
+                } else {
+                  String packageName = cli.getArgs()[1];
+                  Map<String, String> deployedCollections = packageManager.getDeployedCollections(packageName);
+                  PackageUtils.printGreen("Collections on which package " + packageName + " was deployed:");
+                  for (String collection: deployedCollections.keySet()) {
+                    PackageUtils.printGreen("\t" + collection + "("+packageName+":"+deployedCollections.get(collection)+")");
+                  }
+                }
+                break;
+   
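
Two small pieces of argument handling in the `runImpl` code quoted above are easy to check in isolation: stripping a trailing `/solr` from the URL, and falling back to `help` when no command is given. The sketch below reproduces just those two expressions; `CliArgsSketch` and its method names are hypothetical, and a plain `String[]` stands in for the Commons CLI `CommandLine` object.

```java
/** Hypothetical helper isolating two expressions from the quoted runImpl method. */
final class CliArgsSketch {

  /** Removes "/solr" only when it is the very end of the URL. */
  static String baseUrl(String solrUrl) {
    return solrUrl.replaceAll("/solr$", "");
  }

  /** Uses the first positional argument as the command, or "help" if there is none. */
  static String command(String[] args) {
    return args.length == 0 ? "help" : args[0];
  }
}
```

Because the pattern is anchored with `$`, a URL that ends in `/solr/` (with a trailing slash) is returned unchanged, so a caller passing that form would end up with a base URL that still contains `/solr/`.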

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346430220
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/util/PackageTool.java
 ##
 @@ -0,0 +1,255 @@

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346429399
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@

[GitHub] [lucene-solr] janhoy commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346427550
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List<PackageRepository> repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 

[jira] [Commented] (LUCENE-8983) PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase

2019-11-14 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974420#comment-16974420
 ] 

Bruno Roustant commented on LUCENE-8983:


[~klaporte] did you try this PhraseWildcardQuery? Do you have some feedback 
about it?

We will probably move it to lucene/sandbox.

> PhraseWildcardQuery - new query to control and optimize wildcard expansions 
> in phrase
> -
>
> Key: LUCENE-8983
> URL: https://issues.apache.org/jira/browse/LUCENE-8983
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A generalized version of PhraseQuery, built with one or more MultiTermQuery 
> that provides term expansions for multi-terms (one of the expanded terms must 
> match).
> Its main advantage is to control the total number of expansions across all 
> MultiTermQuery and across all segments.
>  This query is similar to MultiPhraseQuery, but it handles, controls and 
> optimizes the multi-term expansions.
>  
>  This query is equivalent to building an ordered SpanNearQuery with a list of 
> SpanTermQuery and SpanMultiTermQueryWrapper.
>  But it optimizes the multi-term expansions and the segment accesses.
>  It first resolves the single terms so it can stop early if any of them does 
> not match. Then it expands each multi-term sequentially, stopping immediately 
> if one does not match. It detects the segments that do not match so it can 
> skip them for the next expansions. This often avoids expanding the other 
> multi-terms on some or even all segments. Finally, it controls the total 
> number of expansions.
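
For readers following along, the idea of expanding multi-terms one after another under a single shared budget can be sketched roughly as below. This is only an illustration under stated assumptions: it uses a sorted in-memory term set instead of a real index, treats every multi-term as a prefix, and the class and method names (ExpansionBudgetSketch, expandAll) are made up for this note and are not part of Lucene or of the PhraseWildcardQuery patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from Lucene.
class ExpansionBudgetSketch {

    /**
     * Expands each prefix against a sorted term dictionary while sharing one
     * global expansion budget. Returns an empty list as soon as a prefix has
     * no matching term at all, because the whole phrase can no longer match.
     */
    static List<List<String>> expandAll(NavigableSet<String> dictionary,
                                        List<String> prefixes,
                                        int maxExpansions) {
        List<List<String>> expansions = new ArrayList<>();
        int remaining = maxExpansions;
        for (String prefix : prefixes) {
            List<String> matches = new ArrayList<>();
            boolean sawMatch = false;
            // Terms sharing a prefix are contiguous in sorted order.
            for (String term : dictionary.tailSet(prefix, true)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                sawMatch = true;
                if (remaining == 0) {
                    break; // budget exhausted: keep what was collected so far
                }
                matches.add(term);
                remaining--;
            }
            if (!sawMatch) {
                return new ArrayList<>(); // early stop: this multi-term matches nothing
            }
            expansions.add(matches);
        }
        return expansions;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary =
            new TreeSet<>(Arrays.asList("apple", "apply", "banana", "band", "bank"));
        System.out.println(expandAll(dictionary, Arrays.asList("app", "ban"), 4));
        System.out.println(expandAll(dictionary, Arrays.asList("app", "zzz"), 4));
    }
}
```

The point of sharing one counter across all prefixes is that the cap applies to the phrase as a whole rather than to each multi-term separately; how the real query behaves once the cap is reached is not shown here.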



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-553972639
 
 
   Yes, sandbox is fine for me.
   I'll push a commit soon to fix precommit here, and I'll move it to 
sandbox.





[GitHub] [lucene-solr] janhoy commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553970009
 
 
   >I just didn't feel like tackling the bouncy castle dependency at the moment
   Java 11 has strong crypto included out of the box, so it should be easier. 
We can add that in master and not backport it to 8.





[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974410#comment-16974410
 ] 

Michael Sokolov commented on LUCENE-8920:
-

I had tested with the previous version of this patch, and yes, I also believe 
it preserves back-compat, since the old arc encoding is read as 
before, but there is no automated testing to verify this. It would be wise to run 
some manual spot checks. We could, e.g., build an "old" index with luceneutil and 
then run its tests against that index after upgrading the code. Any test that runs 
on an existing index should do; is there a more convenient one? 

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range), which makes gaps very costly. Associating each label with a dense id 
> and having an intermediate lookup, i.e. lookup label -> id and then 
> id -> arc offset instead of doing label -> arc directly, could save a lot of 
> space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
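
To make the first suggestion concrete (deciding per node whether the direct-addressing encoding is worth its gaps), here is a rough sketch. It is an assumption-laden illustration, not the FST code: the fixed bytesPerArc slot size, the maxExpansionFactor threshold and all names are hypothetical.

```java
// Hypothetical sketch: the threshold and all names are assumptions,
// not the heuristics used by Lucene's FST.
class DirectAddressingSketch {

    /** Size of a binary-search array: one fixed-size slot per existing arc. */
    static long binarySearchBytes(int[] sortedLabels, int bytesPerArc) {
        return (long) sortedLabels.length * bytesPerArc;
    }

    /** Size with direct addressing: one slot per label in the range, gaps included. */
    static long directAddressingBytes(int[] sortedLabels, int bytesPerArc) {
        long labelRange = (long) sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1;
        return labelRange * bytesPerArc;
    }

    /**
     * Applies direct addressing only when it costs at most maxExpansionFactor
     * times the space of the binary-search array for the same node.
     */
    static boolean useDirectAddressing(int[] sortedLabels, int bytesPerArc,
                                       double maxExpansionFactor) {
        if (sortedLabels.length == 0) {
            return false;
        }
        return directAddressingBytes(sortedLabels, bytesPerArc)
            <= maxExpansionFactor * binarySearchBytes(sortedLabels, bytesPerArc);
    }

    public static void main(String[] args) {
        int[] dense = {97, 98, 99, 100};
        int[] sparse = {97, 200};
        System.out.println(useDirectAddressing(dense, 10, 1.5));
        System.out.println(useDirectAddressing(sparse, 10, 1.5));
    }
}
```

With large arc metadata (a big bytesPerArc) every gap in the label range costs a full slot, which is why a sparse node fails the check quickly.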






[jira] [Created] (LUCENE-9049) Remove FST cachedRootArcs now redundant with direct-addressing

2019-11-14 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-9049:
--

 Summary: Remove FST cachedRootArcs now redundant with 
direct-addressing
 Key: LUCENE-9049
 URL: https://issues.apache.org/jira/browse/LUCENE-9049
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Bruno Roustant


With LUCENE-8920, the FST most often encodes top-level nodes with direct addressing 
(instead of an array for binary search). This probably makes cachedRootArcs 
redundant, so they should be removed, which will also reduce the code.






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974400#comment-16974400
 ] 

Bruno Roustant commented on LUCENE-8920:


{quote}I want to confirm we have back-compat handled. Do we?
{quote}
I'm pretty sure we are back-compatible. We introduce a new node type based on a 
new value of the node flags. The new code should read previous FSTs, and should 
write new FSTs with the new direct-addressing nodes. That said, I'm also interested 
to know when this is validated automatically.
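
The back-compat argument above (old files never contain the new flag value, so a reader that accepts both versions keeps decoding them as before) can be sketched as follows. The version numbers, flag values and names are invented for illustration and do not match the actual FST constants.

```java
// Hypothetical sketch: version numbers and flag values are made up.
class VersionedNodeReaderSketch {

    static final int VERSION_BEFORE_DIRECT_ADDRESSING = 6;
    static final int VERSION_CURRENT = 7;

    static final byte FLAG_BINARY_SEARCH = 1;
    static final byte FLAG_DIRECT_ADDRESSING = 2; // only written by VERSION_CURRENT

    /** Returns the kind of node to decode, given the file version and the node flag. */
    static String nodeKind(int fileVersion, byte nodeFlag) {
        if (fileVersion < VERSION_BEFORE_DIRECT_ADDRESSING || fileVersion > VERSION_CURRENT) {
            throw new IllegalArgumentException("Unsupported version: " + fileVersion);
        }
        if (nodeFlag == FLAG_BINARY_SEARCH) {
            return "binary-search"; // same decoding path for old and new files
        }
        if (nodeFlag == FLAG_DIRECT_ADDRESSING) {
            if (fileVersion < VERSION_CURRENT) {
                // An old writer could not have produced this flag.
                throw new IllegalStateException("Direct addressing in old file");
            }
            return "direct-addressing";
        }
        throw new IllegalStateException("Unknown node flag: " + nodeFlag);
    }

    public static void main(String[] args) {
        System.out.println(nodeKind(VERSION_BEFORE_DIRECT_ADDRESSING, FLAG_BINARY_SEARCH));
        System.out.println(nodeKind(VERSION_CURRENT, FLAG_DIRECT_ADDRESSING));
    }
}
```

A sketch like this only shows why the design should be compatible; it does not replace a test that actually opens an index written by an older release.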

 




[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974386#comment-16974386
 ] 

David Smiley commented on LUCENE-8920:
--

I want to confirm we have back-compat handled.  Do we?  A very quick look at 
the code shows we bumped the FST version and I see the FST's constructor 
accepts the previous version.  But will _it actually work_ -- will this Lucene 
8.4 code read FSTs written in previous indexes correctly?  I know we have some 
back-compat indices but I don't recall when that is validated (on each test or 
only on release?)




[GitHub] [lucene-solr] pgerber closed pull request #374: [SOLR-12334] Improve detection of recreated lockfiles

2019-11-14 Thread GitBox
pgerber closed pull request #374: [SOLR-12334] Improve detection of recreated 
lockfiles
URL: https://github.com/apache/lucene-solr/pull/374
 
 
   





[GitHub] [lucene-solr] pgerber commented on issue #374: [SOLR-12334] Improve detection of recreated lockfiles

2019-11-14 Thread GitBox
pgerber commented on issue #374: [SOLR-12334] Improve detection of recreated 
lockfiles
URL: https://github.com/apache/lucene-solr/pull/374#issuecomment-553917907
 
 
   Closing; I no longer intend to work on this.





[jira] [Commented] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Pinkesh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974300#comment-16974300
 ] 

Pinkesh Sharma commented on SOLR-13930:
---

Hey Erick, I tried building this with Gradle, and it seems the tests and the 
build are passing.

This is the command I ran:

./gradlew lucene:lucene-analyzers:lucene-analyzers-nori:test

> Running TestKoreanTokenizer with Ant fails  in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>
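
For context on the failure mode described in the Environment field: a test that loads a dictionary from the classpath fails this way whenever the build has not copied the resource next to the compiled test classes. The sketch below reproduces that pattern with plain JDK calls; the class and method names are hypothetical and this is not the TestKoreanTokenizer code.

```java
import java.io.InputStream;

// Hypothetical sketch of a classpath resource lookup that fails loudly.
class ResourceLookupSketch {

    /**
     * Opens a resource relative to the package of the anchor class, or throws
     * if the build did not place it on the classpath.
     */
    static InputStream openRequired(Class<?> anchor, String name) {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new RuntimeException("Cannot find " + name + " in test classpath!");
        }
        return in;
    }

    public static void main(String[] args) {
        try {
            openRequired(ResourceLookupSketch.class, "userdict.txt");
            System.out.println("userdict.txt found");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Class.getResourceAsStream returns null rather than throwing when the resource is absent, so the explicit check is what turns a missing copy step in the build into a readable error.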







[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974298#comment-16974298
 ] 

Ishan Chattopadhyaya commented on SOLR-13662:
-

I'll make the ref guide changes in another PR soon.

> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974283#comment-16974283
 ] 

Andrzej Bialecki commented on SOLR-13817:
-

Patch for branch_8x to add @deprecation tags and switch the default config 
(when {{class=...}} attribute is missing) to {{CaffeineCache}}.

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Updated] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Attachment: SOLR-13817-8x.patch




[jira] [Updated] (LUCENE-9031) UnsupportedOperationException on highlighting Interval Query

2019-11-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9031:
-
Attachment: LUCENE-9031.patch
Status: Patch Available  (was: Patch Available)

Starting from scratch, limited to simple term intervals only
https://github.com/apache/lucene-solr/pull/1011

> UnsupportedOperationException on highlighting Interval Query
> 
>
> Key: LUCENE-9031
> URL: https://issues.apache.org/jira/browse/LUCENE-9031
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/queries
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.4
>
> Attachments: LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, 
> LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When UnifiedHighlighter highlights Interval Query it encounters 
> UnsupportedOperationException. 






[GitHub] [lucene-solr] mkhludnev opened a new pull request #1011: LUCENE-9031: Just highlight term intervals and its' combinations.

2019-11-14 Thread GitBox
mkhludnev opened a new pull request #1011: LUCENE-9031: Just highlight term 
intervals and its' combinations.
URL: https://github.com/apache/lucene-solr/pull/1011
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   





[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974272#comment-16974272
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit 6edbda74291fa9fabb5e6cdc1141e799b738f5ef in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6edbda7 ]

SOLR-13662: Package manager (CLI)





[GitHub] [lucene-solr] jpountz commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
jpountz commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-553893391
 
 
   This is a bit too esoteric for lucene/core in my opinion. Would it work for 
you if we had it in lucene/sandbox?





[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974256#comment-16974256
 ] 

Andrzej Bialecki commented on SOLR-13817:
-

Patch relative to master. It removes all traces of {{LRUCache, LFUCache, 
FastLRUCache}} from sources, configs and documentation and replaces all cache 
configs with {{CaffeineCache}}.

Tests are still passing, which is nice ;)

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974254#comment-16974254
 ] 

Ignacio Vera commented on LUCENE-8997:
--

I see your point; I'll revert that change.

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently encode three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some of the methods currently 
> compute the type of triangle we are dealing with; that computation will go 
> away as well.
>  
>  
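
The two-bit scheme proposed above can be sketched as below. Several things are assumptions made only for this illustration: the position of the spare bits (TYPE_SHIFT), reading each bit pair as a binary number in the order written, and every name in the class. It is not the ShapeField encoding.

```java
// Hypothetical sketch: bit position, bit order and names are assumptions.
class TriangleTypeBitsSketch {

    enum Type { UNKNOWN, POINT, LINE, TRIANGLE }

    // Assumed position of the two spare bits inside a metadata byte.
    static final int TYPE_SHIFT = 6;
    static final int TYPE_MASK = 0b11 << TYPE_SHIFT;

    static int bitsOf(Type type) {
        switch (type) {
            case POINT: return 0b10;
            case LINE: return 0b01;
            case TRIANGLE: return 0b11;
            default: return 0b00; // UNKNOWN: what pre-existing indexes contain
        }
    }

    /** Stores the type in the spare bits and leaves all other bits untouched. */
    static int encode(int metadata, Type type) {
        return (metadata & ~TYPE_MASK) | (bitsOf(type) << TYPE_SHIFT);
    }

    static Type decode(int metadata) {
        switch ((metadata & TYPE_MASK) >>> TYPE_SHIFT) {
            case 0b10: return Type.POINT;
            case 0b01: return Type.LINE;
            case 0b11: return Type.TRIANGLE;
            default: return Type.UNKNOWN;
        }
    }

    /** Computes the type from the three vertices. */
    static Type classify(int ax, int ay, int bx, int by, int cx, int cy) {
        boolean ab = ax == bx && ay == by;
        boolean bc = bx == cx && by == cy;
        boolean ac = ax == cx && ay == cy;
        if (ab && bc) {
            return Type.POINT;
        }
        return (ab || bc || ac) ? Type.LINE : Type.TRIANGLE;
    }

    /** Uses the stored bits when present, otherwise falls back to computing the type. */
    static Type typeOf(int metadata, int ax, int ay, int bx, int by, int cx, int cy) {
        Type stored = decode(metadata);
        return stored == Type.UNKNOWN ? classify(ax, ay, bx, by, cx, cy) : stored;
    }

    public static void main(String[] args) {
        int metadata = encode(0b0000_0101, Type.LINE);
        System.out.println(decode(metadata));
        System.out.println(typeOf(0, 0, 0, 1, 0, 0, 1));
    }
}
```

Because 0 0 is reserved for "unknown", an index written before the change decodes to UNKNOWN and takes the fallback path, which is the backwards-compatibility property the description asks for.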






[jira] [Updated] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Attachment: SOLR-13817-master.patch

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Updated] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13930:
--
Summary: Running TestKoreanTokenizer with Ant fails  in gradle_8 build  
(was: Fix failing TestKoreanTokenizer test in Gradle build)

> Running TestKoreanTokenizer with Ant fails  in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>
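
The error quoted in the Environment field is the kind of failure a test raises when it looks up a resource next to its own class and finds nothing on the classpath. A minimal sketch, assuming the lookup goes through Class.getResourceAsStream; the helper class ClasspathResourceCheck is hypothetical and is not the actual test code:

```java
/** Hypothetical helper: open a classpath resource or fail with a clear message. */
final class ClasspathResourceCheck {
  /**
   * Resolves {@code name} relative to the package of {@code owner}.
   * getResourceAsStream returns null when the file was never copied to the
   * classes directory that is on the test classpath.
   */
  static java.io.InputStream openRequired(Class<?> owner, String name) {
    java.io.InputStream in = owner.getResourceAsStream(name);
    if (in == null) {
      throw new RuntimeException("Cannot find " + name + " in test classpath!");
    }
    return in;
  }
}
```

If the build does not copy userdict.txt into the test output directory, a lookup like this fails regardless of what the test itself does, which would be consistent with the presumption above that the build is responsible.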







[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974251#comment-16974251
 ] 

Erick Erickson commented on SOLR-13930:
---

Dear Lord, sometimes you'd think I'd learn to put in complete details and save 
others from wasting time when they try to help. Sorry about that.

The _ant_ build fails, not the Gradle test:

ant -Dtestcase=TestKoreanTokenizer test

Oddly, TestJapaneseTokenizer succeeds when run under Ant.

And thanks to all who are looking into these things. I'm trying to record 
things as I find them, so descriptions may be fragmentary, I'm afraid.

 

 

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Comment Edited] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974200#comment-16974200
 ] 

Erick Erickson edited comment on SOLR-13930 at 11/14/19 1:28 PM:
-

Sorry, I caused you extra work. This is the Ant test in the gradle_8 branch, 
not the regular Ant build on master, where it works fine.

I changed the title to make this plainer.

Thanks for looking, and again sorry for the ambiguity.


was (Author: erickerickson):
Sorry, I caused you extra work. This is in the Gradle build, not the regular 
Ant build; it works fine in the regular Ant build.

I changed the title to make this plainer.

Thanks for looking, and again sorry for the ambiguity.

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974250#comment-16974250
 ] 

Adrien Grand commented on LUCENE-8920:
--

Thanks for checking [~sokolov]!

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range), which makes gaps very costly. Associating each label with a dense id 
> and having an intermediate lookup, i.e. lookup label -> id and then id -> arc 
> offset instead of doing label -> arc directly, could save a lot of space in 
> some cases? Also it seems that we are repeating the label in the arc metadata 
> when array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
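
To make the space trade-off in the two quoted suggestions concrete, here is a rough size estimate in Java. It is a sketch under stated assumptions, not Lucene's FST code: the class ArcLayoutEstimate, the fixed bytes-per-arc figure and the expansion threshold are all hypothetical, and the decision is made per node for simplicity, whereas the quoted suggestion tracks the size increase per FST instance.

```java
/** Hypothetical estimate of arc storage for one FST node under two layouts. */
final class ArcLayoutEstimate {
  /** Number of label slots between the smallest and largest label, inclusive. */
  static long labelRange(int[] sortedLabels) {
    return (long) sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1;
  }

  /** Array-with-gaps: one fixed-size arc slot per label in the range, used or not. */
  static long directAddressingBytes(int[] sortedLabels, int bytesPerArc) {
    return labelRange(sortedLabels) * bytesPerArc;
  }

  /** Indirection: a small id per label in the range, then only real arcs, packed densely. */
  static long denseIdBytes(int[] sortedLabels, int bytesPerArc, int bytesPerId) {
    return labelRange(sortedLabels) * bytesPerId + (long) sortedLabels.length * bytesPerArc;
  }

  /** Apply direct addressing only if it stays within maxExpansion times the packed size. */
  static boolean useDirectAddressing(int[] sortedLabels, int bytesPerArc, double maxExpansion) {
    long packed = (long) sortedLabels.length * bytesPerArc;
    return directAddressingBytes(sortedLabels, bytesPerArc) <= packed * maxExpansion;
  }
}
```

For a node with labels 'a', 'e' and 'z' and an assumed 16 bytes of metadata per arc, the gaps dominate: 26 slots of 16 bytes for direct addressing against 26 one-byte ids plus 3 packed arcs for the indirect layout.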






[jira] [Comment Edited] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974215#comment-16974215
 ] 

Ignacio Vera edited comment on LUCENE-8997 at 11/14/19 1:19 PM:


I would like to raise this issue again as I have made a small improvement. I 
realised that for points I do not need to add the point information for the 
data dimensions, so I can just leave dimensions 5 and 6 empty. For BKD tree 
leaves that contain only points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|244.7s|250.7s|-2%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.14|0%|
{code}
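
The Diff columns appear to be the relative change of Dev against Base, rounded to a whole percent. A small check of that reading, assuming this formula (the class and method names are hypothetical and not part of the benchmark tooling):

```java
/** Hypothetical check of how the Diff columns relate to the Dev and Base columns. */
final class BenchmarkDiff {
  /** Relative change of dev against base, rounded to a whole percent. */
  static long percentDiff(double dev, double base) {
    return Math.round((dev - base) / base * 100.0);
  }
}
```

Under that assumption, percentDiff(0.89, 1.27) reproduces the -30% index size figure and percentDiff(244.7, 250.7) reproduces the -2% index time figure.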


was (Author: ivera):
I would like to raise this issue again as I have made a small improvement. I 
realised that for points I do not need to add the point information for the 
data dimensions, so I can just leave dimensions 5 and 6 empty. For BKD tree 
leaves that contain only points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|260.8s|264.2s|-1%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some methods currently compute the 
> type of triangle they are dealing with; that computation can go away as 
> well.
>  
>  






[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974241#comment-16974241
 ] 

Adrien Grand commented on LUCENE-8997:
--

I guess it could still work if we indexed this dimension, but I don't think 
this is the right trade-off.

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some methods currently compute the 
> type of triangle they are dealing with; that computation can go away as 
> well.
>  
>  






[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974239#comment-16974239
 ] 

Adrien Grand commented on LUCENE-8997:
--

I'm unsure about keeping dimensions empty: it works well if your index has only 
lines or only points since all points will have a value of 0 for certain 
dimensions. But if the index mixes triangles and points, then this could 
actually hurt?

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some methods currently compute the 
> type of triangle they are dealing with; that computation can go away as 
> well.
>  
>  






[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974238#comment-16974238
 ] 

Michael Sokolov commented on SOLR-13930:


I was able to run

./gradlew lucene:lucene-analyzers:lucene-analyzers-nori:test

successfully. What command/branch did you see the failure with, 
[~erickerickson]?

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[GitHub] [lucene-solr] chatman closed pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman closed pull request #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] chatman commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553874860
 
 
   Merged, thanks. 
https://issues.apache.org/jira/browse/SOLR-13662?focusedCommentId=16974218=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16974218
   
   Thanks for all your reviews!





[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974218#comment-16974218
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit d9f41f8a5a31e7dd8f4ccee729d479ce07175c1a in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d9f41f8 ]

SOLR-13662: Package manager (CLI)


> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974215#comment-16974215
 ] 

Ignacio Vera commented on LUCENE-8997:
--

I would like to raise this issue again as I have made a small improvement. I 
realised that for points I do not need to add the point information for the 
data dimensions, so I can just leave dimensions 5 and 6 empty. For BKD tree 
leaves that contain only points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|260.8s|264.2s|-1%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some methods currently compute the 
> type of triangle they are dealing with; that computation can go away as 
> well.
>  
>  






[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974203#comment-16974203
 ] 

Michael Sokolov commented on SOLR-13930:


I'll just note that {{TestJapaneseTokenizerTest}} does pretty much exactly the 
same thing -- yet it passes?

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Updated] (SOLR-13923) Test target (task?) should fail when no tests run in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13923:
--
Summary: Test target (task?) should fail when no tests run in Gradle build  
(was: Test target (task?) should fail when no tests run)

> Test target (task?) should fail when no tests run in Gradle build
> -
>
> Key: SOLR-13923
> URL: https://issues.apache.org/jira/browse/SOLR-13923
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Build
>Reporter: Michael Sokolov
>Priority: Minor
>
> With the ant build, if you try to test a nonexistent test case or method 
> ({{-Dtestcase=NoSuchThing}}), the build will fail; this is pretty helpful if 
> you make a lot of typos or forget the names of things. According to [~dweiss] 
> we can get this behavior in gradle by listening to the test results and 
> failing if no tests ran.






[jira] [Updated] (SOLR-13929) Reconcile parallel licenses and licenses_gradle trees in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13929:
--
Summary: Reconcile parallel licenses and licenses_gradle trees in Gradle 
build  (was: Reconcile parallel licenses and licenses_gradle trees)

> Reconcile parallel licenses and licenses_gradle trees in Gradle build
> -
>
> Key: SOLR-13929
> URL: https://issues.apache.org/jira/browse/SOLR-13929
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Priority: Major
>
> I had a hard time making Gradle and Ant play nice together when they shared 
> the same license directory. Temporarily there are two, license and 
> license_gradle, both in the lucene and solr trees. When we remove Ant, we 
> need to reconcile this, probably by removing the two "license" directories.





