[jira] [Updated] (SOLR-8306) Enhance ExpandComponent to allow expand.hits=0

2020-03-12 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-8306:
---
Status: Patch Available  (was: Open)

> Enhance ExpandComponent to allow expand.hits=0
> --
>
> Key: SOLR-8306
> URL: https://issues.apache.org/jira/browse/SOLR-8306
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.3.1
>Reporter: Marshall Sanders
>Priority: Minor
>  Labels: expand
> Fix For: 5.5
>
> Attachments: SOLR-8306.patch, SOLR-8306.patch, SOLR-8306.patch, 
> SOLR-8306_branch_5x@1715230.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This enhancement allows the ExpandComponent to accept expand.hits=0, for 
> those who don't want an expanded document returned and only want the numFound 
> from the expand section.
> This is useful for "See 54 more like this" use cases, but without the 
> performance hit of gathering an entire expanded document.
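The request shape described above can be sketched as follows. This only builds the parameter string a client would send to a select handler; the query, the collapse field `group_id`, and the class name are hypothetical, not taken from the issue:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a collapse + expand request where expand.rows=0 asks for only
// the per-group numFound, with no expanded documents in the response.
class ExpandZeroHitsQuery {
    static String buildParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solr memory");                  // hypothetical query
        params.put("fq", "{!collapse field=group_id}");  // hypothetical collapse field
        params.put("expand", "true");
        params.put("expand.rows", "0");                  // counts only, no documents
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildParams());
    }
}
```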



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8306) Enhance ExpandComponent to allow expand.hits=0

2020-03-12 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058442#comment-17058442
 ] 

Munendra S N commented on SOLR-8306:


 [^SOLR-8306.patch] 
Thanks [~adhenderson] for the PR. I have attached a patch generated from your 
PR, as I'm not sure whether the pre-commit build is supported for PRs. I will 
merge the PR itself so that you get the attribution.

A few minor changes:
* The {{Changes.txt}} entry should go under 8.6 instead of 8.5, since the 
release branch has been cut, and I think this fits better under optimizations 
than improvements, based on the recent email thread about categorization of 
issues.
* When expand.rows=0, scores won't be computed, so {{maxScore}} will never be 
available even if scores are requested. This should be fine, but we might need 
to document it in the Solr Ref Guide: 
https://lucene.apache.org/solr/guide/8_4/collapse-and-expand-results.html#expand-component







[jira] [Updated] (SOLR-8306) Enhance ExpandComponent to allow expand.hits=0

2020-03-12 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-8306:
---
Attachment: SOLR-8306.patch







[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1346: LUCENE-9276: Use same code-path for updateDocuments and updateDocument

2020-03-12 Thread GitBox
dnhatn commented on a change in pull request #1346: LUCENE-9276: Use same 
code-path for updateDocuments and updateDocument
URL: https://github.com/apache/lucene-solr/pull/1346#discussion_r392011697
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
 ##
 @@ -474,49 +474,11 @@ long updateDocuments(final Iterable
 return seqNo;
   }
 
+
   long updateDocument(final Iterable<? extends IndexableField> doc, final Analyzer analyzer,
 
 Review comment:
   Can we also remove this method and delegate `updateDocument` to 
`updateDocuments` in IndexWriter instead?
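The delegation being suggested can be sketched in isolation like this. The class and method bodies are illustrative stand-ins, not the actual IndexWriter signatures:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the suggested delegation: the single-document
// method forwards a one-element list to the multi-document path, so only
// one code path needs to be maintained. Names are hypothetical.
class UpdatePathSketch {
    long updateDocuments(List<String> docs) {
        // Stand-in for the real multi-document update logic; returns a
        // pretend sequence number equal to the number of docs processed.
        return docs.size();
    }

    long updateDocument(String doc) {
        // Delegates instead of duplicating the logic above.
        return updateDocuments(Collections.singletonList(doc));
    }
}
```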


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1346: LUCENE-9276: Use same code-path for updateDocuments and updateDocument

2020-03-12 Thread GitBox
dnhatn commented on a change in pull request #1346: LUCENE-9276: Use same 
code-path for updateDocuments and updateDocument
URL: https://github.com/apache/lucene-solr/pull/1346#discussion_r392011485
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
 ##
 @@ -346,7 +285,7 @@ public long updateDocuments(Iterable deleteNode) {
+  private long finishDocument(DocumentsWriterDeleteQueue.Node deleteNode, 
int docCount) {
 
 Review comment:
   Should we call this finishDocument**S** ?





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1298: SOLR-14289 Skip ZkChroot check when not necessary

2020-03-12 Thread GitBox
dsmiley commented on a change in pull request #1298: SOLR-14289 Skip ZkChroot 
check when not necessary
URL: https://github.com/apache/lucene-solr/pull/1298#discussion_r391985435
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
 ##
 @@ -285,7 +285,7 @@ public static NodeConfig loadNodeConfig(Path solrHome, 
Properties nodeProperties
   if (zkClient.exists("/solr.xml", true)) {
 log.info("solr.xml found in ZooKeeper. Loading...");
 byte[] data = zkClient.getData("/solr.xml", null, null, true);
-return SolrXmlConfig.fromInputStream(loader, new 
ByteArrayInputStream(data));
+return SolrXmlConfig.fromInputStream(loader, new 
ByteArrayInputStream(data), "zookeeper");
 
 Review comment:
   Then I much prefer a boolean as it'd be much clearer -- "isInZooKeeper" or 
some-such.





[GitHub] [lucene-solr] madrob commented on a change in pull request #1298: SOLR-14289 Skip ZkChroot check when not necessary

2020-03-12 Thread GitBox
madrob commented on a change in pull request #1298: SOLR-14289 Skip ZkChroot 
check when not necessary
URL: https://github.com/apache/lucene-solr/pull/1298#discussion_r391898475
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
 ##
 @@ -285,7 +285,7 @@ public static NodeConfig loadNodeConfig(Path solrHome, 
Properties nodeProperties
   if (zkClient.exists("/solr.xml", true)) {
 log.info("solr.xml found in ZooKeeper. Loading...");
 byte[] data = zkClient.getData("/solr.xml", null, null, true);
-return SolrXmlConfig.fromInputStream(loader, new 
ByteArrayInputStream(data));
+return SolrXmlConfig.fromInputStream(loader, new 
ByteArrayInputStream(data), "zookeeper");
 
 Review comment:
   Yeah, a boolean would be sufficient. I was thinking about whether we might 
need other sources in the future, but we can change this to a string/enum later.
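The two API shapes under discussion can be sketched side by side; none of these names are the actual SolrXmlConfig API, they only illustrate the design trade-off:

```java
// Hypothetical sketch of the two shapes discussed: a boolean flag, which is
// clearer at call sites than a magic string, versus an enum, which stays
// extensible if more config sources are added later.
class ConfigSourceSketch {
    enum Source { FILESYSTEM, ZOOKEEPER }

    static String describe(boolean isInZooKeeper) {
        // boolean version: reads clearly at the call site
        return isInZooKeeper ? "zookeeper" : "filesystem";
    }

    static String describe(Source source) {
        // enum version: adding a third source is a one-line change
        return source.name().toLowerCase();
    }
}
```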





[jira] [Comment Edited] (SOLR-10336) NPE during queryCache warming

2020-03-12 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058204#comment-17058204
 ] 

Joel Bernstein edited comment on SOLR-10336 at 3/12/20, 7:40 PM:
-

This is likely resolved as well. Multiple collapses should no longer be a 
problem.


was (Author: joel.bernstein):
This is likely resolved as well

> NPE during queryCache warming
> -
>
> Key: SOLR-10336
> URL: https://issues.apache.org/jira/browse/SOLR-10336
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 6.4.2
>Reporter: Markus Jelsma
>Priority: Major
> Fix For: 7.0
>
>
> Regular cache warming stumbles on this NPE. It seems to be related to 
> SOLR-9104; it is the same collection, and the query that fails the cache 
> warmer is similar to that of SOLR-9104, i.e., two CollapsingQParsers.
> {code}
> Error during auto-warming of 
> key:org.apache.solr.search.QueryResultKey@fe9769ca:java.lang.NullPointerException
>   at 
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:816)
>   at 
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:853)
>   at 
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256)
>   at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823)
>   at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640)
>   at 
> org.apache.solr.search.SolrIndexSearcher.lambda$initRegenerators$3(SolrIndexSearcher.java:604)
>   at org.apache.solr.search.LFUCache.warm(LFUCache.java:188)
>   at 
> org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:2376)
>   at 
> org.apache.solr.core.SolrCore.lambda$getSearcher$2(SolrCore.java:2054)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (SOLR-10336) NPE during queryCache warming

2020-03-12 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058204#comment-17058204
 ] 

Joel Bernstein commented on SOLR-10336:
---

This is likely resolved as well







[GitHub] [lucene-solr] madrob opened a new pull request #1347: SOLR-14322 Improve AbstractFullDistribZkTestBase.waitForThingsToLevelOut

2020-03-12 Thread GitBox
madrob opened a new pull request #1347: SOLR-14322 Improve 
AbstractFullDistribZkTestBase.waitForThingsToLevelOut
URL: https://github.com/apache/lucene-solr/pull/1347
 
 
   https://issues.apache.org/jira/browse/SOLR-14322





[GitHub] [lucene-solr] s1monw commented on issue #1346: LUCENE-9276: Use same code-path for updateDocuments and updateDocument

2020-03-12 Thread GitBox
s1monw commented on issue #1346: LUCENE-9276: Use same code-path for 
updateDocuments and updateDocument
URL: https://github.com/apache/lucene-solr/pull/1346#issuecomment-598371903
 
 
   @uschindler I would love to get your input here too
   





[jira] [Commented] (LUCENE-9274) UnifiedHighlighter cannot handle SpanMultiTermQueryWrapper with an Automaton of type SINGLE

2020-03-12 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058188#comment-17058188
 ] 

David Smiley commented on LUCENE-9274:
--

Yes indeed; see the latter part of my comment here: 
https://issues.apache.org/jira/browse/LUCENE-8158?focusedCommentId=16352779=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16352779
I just filed a new issue for this: LUCENE-9277. I have no plans to work on this 
anytime soon, but I'm always happy to code review / merge (which itself is 
plenty of work – _alas_).

> UnifiedHighlighter cannot handle SpanMultiTermQueryWrapper with an Automaton 
> of type SINGLE
> ---
>
> Key: LUCENE-9274
> URL: https://issues.apache.org/jira/browse/LUCENE-9274
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.4
>Reporter: Christoph Goller
>Priority: Major
> Attachments: TestUnifiedHighlighterMTQ.java
>
>
> MultiTermHighlighting.extractAutomata ignores a Term from a SINGLE Automaton, 
> so highlighting does not work. 
> Of course an AutomatonQuery with a single term does not make much sense, but 
> it may be generated by an automatic process.
> Possible fixes:
>  * Either implement consumeTerms in MultiTermHighlighting.AutomataCollector
>  * Or remove the special case for SINGLE in CompiledAutomaton.visit
> I attach a unit test.






[jira] [Created] (LUCENE-9277) UnifiedHighlighter: internally visit the query tree once

2020-03-12 Thread David Smiley (Jira)
David Smiley created LUCENE-9277:


 Summary: UnifiedHighlighter: internally visit the query tree once
 Key: LUCENE-9277
 URL: https://issues.apache.org/jira/browse/LUCENE-9277
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/highlighter
Reporter: David Smiley


Ideally the UnifiedHighlighter should "visit" the query tree *once* instead of 
several times (weight.extractTerms, MultiTermHighlighting, PhraseHelper). 
Perhaps this can happen in one new class, perhaps called QueryExtractor. It's 
debatable whether this would replace a bunch of the fields presently on 
UHComponents or whether it would simply help produce the existing UHComponents; 
shrug.

Admittedly, I don't know how much of an "optimization" this is, or whether this 
is just a refactoring done on principle. I simply like the principle of it; 
knowing there are multiple _visit_s to the query gnaws at me.
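The single-pass extraction idea can be sketched with a tiny visitor. Everything here is a generic stand-in (QueryNode, TermNode, BoolNode are hypothetical), not the actual Lucene QueryVisitor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic sketch of visiting a query tree once and collecting everything the
// highlighter needs in a single pass, instead of re-walking the tree per
// extraction concern. All class names here are hypothetical.
class QueryExtractorSketch {
    interface QueryNode {
        void visit(Consumer<String> termConsumer);
    }

    record TermNode(String term) implements QueryNode {
        public void visit(Consumer<String> c) { c.accept(term); }
    }

    record BoolNode(List<QueryNode> clauses) implements QueryNode {
        public void visit(Consumer<String> c) {
            for (QueryNode n : clauses) n.visit(c); // one pass over the tree
        }
    }

    static List<String> extractTerms(QueryNode root) {
        List<String> terms = new ArrayList<>();
        root.visit(terms::add);
        return terms;
    }
}
```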






[GitHub] [lucene-solr] s1monw opened a new pull request #1346: LUCENE-9276: Use same code-path for updateDocuments and updateDocument

2020-03-12 Thread GitBox
s1monw opened a new pull request #1346: LUCENE-9276: Use same code-path for 
updateDocuments and updateDocument
URL: https://github.com/apache/lucene-solr/pull/1346
 
 
   Today we have a large amount of duplicated code that is rather of
   complex nature. This change consolidates the code-paths to always
   use the updateDocuments path.
   





[GitHub] [lucene-solr] s1monw commented on issue #1346: LUCENE-9276: Use same code-path for updateDocuments and updateDocument

2020-03-12 Thread GitBox
s1monw commented on issue #1346: LUCENE-9276: Use same code-path for 
updateDocuments and updateDocument
URL: https://github.com/apache/lucene-solr/pull/1346#issuecomment-598351277
 
 
   @mikemccand @dnhatn wanna take a look





[jira] [Created] (LUCENE-9276) Consolidate DW(PT)#updateDocument and #updateDocuments

2020-03-12 Thread Simon Willnauer (Jira)
Simon Willnauer created LUCENE-9276:
---

 Summary: Consolidate DW(PT)#updateDocument and #updateDocuments
 Key: LUCENE-9276
 URL: https://issues.apache.org/jira/browse/LUCENE-9276
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: master (9.0), 8.5
Reporter: Simon Willnauer


While I was working on another IW-related issue, I made some changes to 
DW#updateDocument but forgot DW#updateDocuments, which is annoying since the 
code is 99% identical. The same applies to DWPT#updateDocument[s]. IMO this is 
the wrong place to optimize in order to save one or two object creations. Maybe 
we can remove this code duplication.






[GitHub] [lucene-solr] sigram commented on a change in pull request #1329: SOLR-14275: Policy calculations are very slow for large clusters and large operations

2020-03-12 Thread GitBox
sigram commented on a change in pull request #1329: SOLR-14275: Policy 
calculations are very slow for large clusters and large operations
URL: https://github.com/apache/lucene-solr/pull/1329#discussion_r391797970
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/cloud/autoscaling/InactiveShardPlanAction.java
 ##
 @@ -102,9 +104,14 @@ public void process(TriggerEvent event, ActionContext 
context) throws Exception
 String parentPath = ZkStateReader.COLLECTIONS_ZKNODE + "/" + 
coll.getName();
 List locks;
 try {
-  locks = 
cloudManager.getDistribStateManager().listData(parentPath).stream()
-  .filter(name -> name.endsWith("-splitting"))
-  .collect(Collectors.toList());
+  DistribStateManager stateManager = 
cloudManager.getDistribStateManager();
 
 Review comment:
   this change is an unrelated fix, please ignore for now - this should go into 
a separate Jira.





[jira] [Resolved] (SOLR-14316) Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method

2020-03-12 Thread Tomas Eduardo Fernandez Lobbe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Eduardo Fernandez Lobbe resolved SOLR-14316.
--
Fix Version/s: 8.6
   master (9.0)
   Resolution: Fixed

Not sure what the problem with Git tagging is. Committed to:
master: 
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=9a8602c96eebfad97e3f1502cef6c3110653cf67
branch_8x: 
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=8d6349b2e0cf89daea6ffa07760ee18719e72eb6

> Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's 
> equals() method
> -
>
> Key: SOLR-14316
> URL: https://issues.apache.org/jira/browse/SOLR-14316
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.2, 8.4.1
>Reporter: Aroop
>Priority: Minor
>  Labels: patch
> Fix For: master (9.0), 8.6
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There is an unchecked type conversion warning in JavaBinCodec's 
> readMapEntry's equals() method. 
> This change removes that warning by handling a checked conversion and also 
> adds to tests to an earlier untested api.






[GitHub] [lucene-solr] tflobbe merged pull request #1344: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() squashed

2020-03-12 Thread GitBox
tflobbe merged pull request #1344: SOLR-14316 Remove unchecked type conversion 
warning in JavaBinCodec's readMapEntry's equals() squashed
URL: https://github.com/apache/lucene-solr/pull/1344
 
 
   





[jira] [Commented] (LUCENE-9274) UnifiedHighlighter cannot handle SpanMultiTermQueryWrapper with an Automaton of type SINGLE

2020-03-12 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058078#comment-17058078
 ] 

Alan Woodward commented on LUCENE-9274:
---

I think ideally we'd merge UnifiedHighlighter.extractTerms() and 
MultiTermHighlighting.extractAutomata(), so that the single term here is 
handled the same as terms from any other query.  I know this is something that 
[~dsmiley] has been thinking about?







[jira] [Resolved] (SOLR-14324) infra-solr commands are not working in Linux server

2020-03-12 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14324.
---
Resolution: Incomplete

First, there is not nearly enough information here to even begin to help. 

Second, please raise issues like this on the Solr user’s list first. If it’s 
determined that this really is a code problem rather than an issue with your 
environment, we can open a new Jira or reopen this one.

> infra-solr commands are not working in Linux server
> ---
>
> Key: SOLR-14324
> URL: https://issues.apache.org/jira/browse/SOLR-14324
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 7.3.1
>Reporter: GANESAN.P
>Priority: Critical
>  Labels: linuc
>
> [root@node03 hduser]# systemctl status solr.service
> Unit solr.service could not be found.
> [root@node03 hduser]# solr status
> bash: solr: command not found...
> [root@node03 hduser]#






[jira] [Commented] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058031#comment-17058031
 ] 

Ishan Chattopadhyaya commented on SOLR-14314:
-

Please ask in solr-users. The best Solr practitioners are there. Jira is not a 
support portal.

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores, 265GB RAM.
> The problem is that from time to time it becomes very slow to update Solr 
> documents, for example timing out after 30 minutes.
> Document size is around 20K~50K each; each HTTP request sent to /update is 
> around 4MB~10MB.
> /update requests are sent from multiple processes.
> Some of the updates get a response, but the difference between "QTime" and 
> the HTTP response time is large; in one example, qtime = 66s while the HTTP 
> response time is 2304s.
> According to jstack, many threads are in the BLOCKED state.
> The thread dump log is attached.
> Any hint would be appreciated, thanks!






[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager

2020-03-12 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057967#comment-17057967
 ] 

Michael Sokolov commented on LUCENE-8929:
-

Thanks for the insightful comments, [~jim.ferenczi], you've given me a lot to 
think about! I had not really considered sorting segments: that makes a lot of 
sense when documents are at least roughly inserted in sort order. I would have 
thought merges would interfere with that opto, but I guess for the most part it 
works out? The performance improvements you saw are stunning. It would be great 
if we could get the segment sorting ideas merged into the Lucene code base, no? 
I wonder how we determine when they are applicable though. In Elasticsearch is 
it done based on some a-priori knowledge, or do you analyze the distribution 
and turn on the opto automatically? That would be compelling I think. On the 
other hand, the use case inspiring this does not tend to correlate index sort 
order and insertion order, so I don't think it would benefit as much from 
segment sorting (except due to chance, or in special cases), so I think these 
are really two separate optimizations and issues. We should be sure to 
structure the code in such a way that it can accommodate them all and properly 
choose which one to apply. We don't have a formal query planner in Lucene, but 
I guess we are beginning to evolve one.

I think the idea of splitting collectors is a good one, to avoid overmuch 
complexity in a single collector, but there is also a good deal of shared code 
across these. I can give that a try and see what it looks like. 

By the way, I did also run a test using luceneutil's "modification timestamp" 
field as the index sort and saw similar gains. I think that field is more 
tightly correlated with insertion order, and also has much higher cardinality, 
so it makes a good counterpoint: I'll post results here later once I can do a 
workup.

I hear your concern about the non-determinism due to tie-breaking, but I 
*think* this is accounted for by including the (global) docid in the comparison 
in MaxScoreTerminator.LeafState? I may be missing something, though. We don't 
seem to have a good unit test checking for this tiebreak; I'll add to 
TestTopFieldCollector.testRandomMaxScoreTermination to make sure that case is 
covered.
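That docid tie-break can be illustrated with a small standalone comparator (the 
LeafState here is only a stand-in for illustration, not the actual 
MaxScoreTerminator.LeafState):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    // Illustrative stand-in for a per-leaf state: a score plus the global doc id.
    static final class LeafState {
        final float score;
        final int globalDocId;
        LeafState(float score, int globalDocId) {
            this.score = score;
            this.globalDocId = globalDocId;
        }
    }

    // Order by score descending; break exact score ties by ascending global
    // doc id, which makes the overall ordering deterministic across runs.
    static final Comparator<LeafState> ORDER =
        Comparator.<LeafState>comparingDouble(s -> -s.score)
                  .thenComparingInt(s -> s.globalDocId);

    public static void main(String[] args) {
        List<LeafState> states = new ArrayList<>(List.of(
            new LeafState(2.0f, 7), new LeafState(2.0f, 3), new LeafState(5.0f, 9)));
        states.sort(ORDER);
        // Highest score first; equal scores ordered by doc id.
        System.out.println(states.get(0).globalDocId); // 9
        System.out.println(states.get(1).globalDocId); // 3
    }
}
```

Without the docid term, the relative order of the two score-2.0 entries would 
depend on collection order and could differ between runs.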

I'm not sure what to say about the `LeafFieldComparator` idea - it sounds 
powerful, but I am also a bit leery of these complex Comparators - they make 
other things more difficult since it becomes challenging to reason about the 
sort order "from the outside". I had to resort to some "instanceof" hackery to 
restrict consideration to cases where the comparator is numeric, and extracting 
the sort value from the comparator is pretty messy too. We pay a complexity 
cost here to handle some edge cases of more abstract comparators.  

> Early Terminating CollectorManager
> --
>
> Key: LUCENE-8929
> URL: https://issues.apache.org/jira/browse/LUCENE-8929
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Atri Sharma
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> We should have an early terminating collector manager which accurately tracks 
> hits across all of its collectors and determines when there are enough hits, 
> allowing all the collectors to abort.
> The options for the same are:
> 1) Shared total count: a global "scoreboard" where all collectors update 
> their current hit count. At the end of each document's collection, the 
> collector checks whether the shared count exceeds the threshold, and aborts 
> if it does.
> 2) State-reporting collectors: collectors periodically report their total 
> number of hits collected using a callback mechanism, and get a proceed or 
> abort decision.
> 1) has the overhead of synchronization in the hot path; 2) can collect 
> unnecessary hits before aborting.
> I am planning to work on 2), unless there are objections
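Option 1 above (the shared scoreboard) can be sketched with plain Java threads, 
independent of Lucene's actual Collector APIs (all names here are illustrative, 
not the proposed implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCountDemo {
    // Shared "scoreboard": every collector increments this for each hit.
    static final AtomicInteger globalHits = new AtomicInteger();
    static final int THRESHOLD = 100;

    // Each "collector" scans its slice of docs and aborts once the global
    // count crosses the threshold (option 1 in the description above).
    static int collect(int[] docs) {
        int collected = 0;
        for (int doc : docs) {
            if (globalHits.incrementAndGet() > THRESHOLD) break; // early terminate
            collected++;
        }
        return collected;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            int[] slice = new int[1000];
            futures.add(pool.submit(() -> collect(slice)));
        }
        int total = 0;
        for (Future<Integer> f : futures) total += f.get();
        pool.shutdown();
        // No more than THRESHOLD hits are collected in total across threads.
        System.out.println(total <= THRESHOLD); // true
    }
}
```

The `incrementAndGet` in the hot path is exactly the synchronization overhead 
the comment identifies as the downside of option 1.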






[jira] [Assigned] (SOLR-13264) unexpected autoscaling set-trigger response

2020-03-12 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-13264:
---

Assignee: Andrzej Bialecki

> unexpected autoscaling set-trigger response
> ---
>
> Key: SOLR-13264
> URL: https://issues.apache.org/jira/browse/SOLR-13264
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Reporter: Christine Poerschke
>Assignee: Andrzej Bialecki
>Priority: Minor
> Attachments: SOLR-13264.patch, SOLR-13264.patch
>
>
> Steps to reproduce:
> {code}
> ./bin/solr start -cloud -noprompt
> ./bin/solr create -c demo -d _default -shards 1 -replicationFactor 1
> curl "http://localhost:8983/solr/admin/autoscaling" -d'
> {
>   "set-trigger" : {
> "name" : "index_size_trigger",
> "event" : "indexSize",
> "aboveDocs" : 12345,
> "aboveOp" : "SPLITSHARD",
> "enabled" : true,
> "actions" : [
>   {
> "name" : "compute_plan",
> "class": "solr.ComputePlanAction"
>   }
> ]
>   }
> }
> '
> ./bin/solr stop -all
> {code}
> The {{aboveOp}} is documented on 
> https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-triggers.html#index-size-trigger
>  and logically should be accepted (even though it is actually the default) 
> but unexpectedly an error message is returned {{"Error validating trigger 
> config index_size_trigger: 
> TriggerValidationException\{name=index_size_trigger, 
> details='\{aboveOp=unknown property\}'\}"}}.
> From a quick look it seems that in the {{IndexSizeTrigger}} constructor 
> additional values need to be passed to the {{TriggerUtils.validProperties}} 
> method i.e. aboveOp, belowOp and maybe others too i.e. 
> aboveSize/belowSize/etc. Illustrative patch to follow. Thank you.
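The failure mode described above can be shown with a tiny standalone sketch of 
whitelist-style property validation (the names and structure here are only 
illustrative, not Solr's actual TriggerUtils API):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriggerValidationDemo {
    // Known property names for a (hypothetical) index-size trigger.
    static final Set<String> VALID = new HashSet<>(List.of(
        "name", "event", "enabled", "actions", "aboveDocs"));

    // Any config key not registered as valid is reported as unknown.
    static List<String> unknownProperties(Map<String, Object> config) {
        List<String> unknown = new ArrayList<>();
        for (String key : config.keySet()) {
            if (!VALID.contains(key)) unknown.add(key);
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("name", "index_size_trigger");
        config.put("aboveDocs", 12345);
        config.put("aboveOp", "SPLITSHARD");
        // "aboveOp" is rejected because it was never registered as valid...
        System.out.println(unknownProperties(config)); // [aboveOp]
        // ...and the fix is simply to register it:
        VALID.add("aboveOp");
        System.out.println(unknownProperties(config)); // []
    }
}
```

This mirrors the reported behavior: the property is documented and meaningful, 
but validation rejects it simply because it was never added to the valid set.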






[jira] [Commented] (SOLR-13807) Caching for term facet counts

2020-03-12 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057947#comment-17057947
 ] 

Michael Gibney commented on SOLR-13807:
---

Still working on [PR #751|https://github.com/apache/lucene-solr/pull/751], I 
separated the earlier monolithic commit (and recent incremental adjustments) 
into two logical commits: the first introducing single-sweep collection of 
facet term counts across different DocSet domains, the second introducing a 
term facet count cache. (N.b. tests are passing for each commit, but validation 
is failing based on intentional nocommits.)

[~hossman], I have yet to expand on the test stub you introduced, but if you 
don't think it's premature to take a second look at this now that the two 
features have been separated into logical commits, I'd appreciate any feedback 
you have to offer. I was reluctant to force-push, and wasn't sure whether to 
open new PRs or work with the existing one; but I left the old (monolithic + 
test-stub-patch + iterative-adjustments) available 
[here|https://github.com/magibney/lucene-solr/tree/SOLR-13132-mingled-sweep-and-cache],
 and figured the new two-commit push would clarify things and be a good 
jumping-off point for however we want to proceed (whether new PRs, etc.).

I know I had said I would make single-sweep collection dependent on facet 
cache, but (as you can see) I went the opposite way. Functionality-wise, 
facet-cache would have made sense first, but code/structure-wise, sweep-first 
was much cleaner and clearer.

> Caching for term facet counts
> -
>
> Key: SOLR-13807
> URL: https://issues.apache.org/jira/browse/SOLR-13807
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Affects Versions: master (9.0), 8.2
>Reporter: Michael Gibney
>Priority: Minor
> Attachments: SOLR-13807__SOLR-13132_test_stub.patch
>
>
> Solr does not have a facet count cache; so for _every_ request, term facets 
> are recalculated for _every_ (facet) field, by iterating over _every_ field 
> value for _every_ doc in the result domain, and incrementing the associated 
> count.
> As a result, subsequent requests end up redoing a lot of the same work, 
> including all associated object allocation, GC, etc. This situation could 
> benefit from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet 
> calculation, latency is proportional to the size of the result domain. 
> Consequently, one common/clear manifestation of this issue is high latency 
> for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be 
> observed on a top-level landing page that exposes facets. This type of 
> "static" case is often mitigated by external (to Solr) caching, either with a 
> caching layer between Solr and a front-end application, or within a front-end 
> application, or even with a caching layer between the end user and a 
> front-end application.
> But in addition to the overhead of handling this caching elsewhere in the 
> stack (or, for a new user, even being aware of this as a potential issue to 
> mitigate), any external caching mitigation is really only appropriate for 
> relatively static cases like the "landing page" example described above. A 
> Solr-internal facet count cache (analogous to the {{filterCache}}) would 
> provide the following additional benefits:
>  # ease of use/out-of-the-box configuration to address a common performance 
> concern
>  # compact (specifically caching count arrays, without the extra baggage that 
> accompanies a naive external caching approach)
>  # NRT-friendly (could be implemented to be segment-aware)
>  # modular, capable of reusing the same cached values in conjunction with 
> variant requests over the same result domain (this would support common use 
> cases like paging, but also potentially more interesting direct uses of 
> facets). 
>  # could be used for distributed refinement (i.e., if facet counts over a 
> given domain are cached, a refinement request could simply look up the 
> ordinal value for each enumerated term and directly grab the count out of the 
> count array that was cached during the first phase of facet calculation)
>  # composable (e.g., in aggregate functions that calculate values based on 
> facet counts across different domains, like SKG/relatedness – see SOLR-13132)
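The core caching idea (keep the per-ordinal count array, keyed by field and 
result domain, so variant requests over the same domain skip the sweep) can be 
sketched standalone; this is a hypothetical illustration, not the SOLR-13807 
patch, and a real cache would be bounded and segment-aware:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FacetCountCacheDemo {
    // Cache key: (facet field, result domain). Value: per-ordinal count array.
    static final Map<String, int[]> cache = new HashMap<>();

    // ordinals[i] holds the term ordinal of the facet field for doc i in the
    // result domain; numOrdinals is the field's distinct-term count.
    static int[] facetCounts(String field, String domainKey,
                             int[] ordinals, int numOrdinals) {
        String key = field + "|" + domainKey;
        return cache.computeIfAbsent(key, k -> {
            int[] counts = new int[numOrdinals];
            for (int ord : ordinals) counts[ord]++; // the expensive sweep
            return counts;                          // cached for reuse
        });
    }

    public static void main(String[] args) {
        int[] ords = {0, 2, 2, 1, 2};
        int[] first = facetCounts("category", "*:*", ords, 3);
        int[] second = facetCounts("category", "*:*", ords, 3); // cache hit
        System.out.println(Arrays.toString(first)); // [1, 1, 3]
        System.out.println(first == second);        // true: same cached array
    }
}
```

Benefit 5 above falls out of this shape directly: a refinement request can look 
up a term's ordinal and read the count straight out of the cached array.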






[GitHub] [lucene-solr] magibney commented on issue #751: SOLR-13132: single sweep iteration over base, foreground, and background sets for "relatedness" calculation

2020-03-12 Thread GitBox
magibney commented on issue #751: SOLR-13132: single sweep iteration over base, 
foreground, and background sets for "relatedness" calculation
URL: https://github.com/apache/lucene-solr/pull/751#issuecomment-598193540
 
 
   Force push of bc4b18f separates the work into two logical commits: the 
first introduces single-sweep collection of term facet counts over multiple 
domains; the second (which builds on the first) introduces a facet cache, which 
is more generally useful but particularly helpful for the performance of 
relatedness calculation over relatively stable "background" sets.
   The original monolithic PR, with some iterative improvements after 
@hossman's test stub patch, is available 
[here](https://github.com/magibney/lucene-solr/tree/SOLR-13132-mingled-sweep-and-cache).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search

2020-03-12 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057931#comment-17057931
 ] 

Tomoko Uchida commented on LUCENE-9136:
---

[~jim.ferenczi] Thank you for elaborating. I agree with you; it would be great 
if we had some abstraction for vectors (an interface, or an abstract base class 
with a default implementation?) for experimenting with different ANN search 
algorithms.

> Introduce IVFFlat to Lucene for ANN similarity search
> -
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
> Attachments: glove-100-angular.png, glove-25-angular.png, 
> image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, 
> image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades, but it has drawn tremendous attention 
> lately with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high-dimensional vector, the vector retrieval (VR) method is then 
> applied to search for the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++ with no plans to support a Java interface, making them 
> hard to integrate into Java projects, or to use for those who are not 
> familiar with C/C++ 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories,
>  # Tree-based algorithms, such as KD-tree;
>  # Hashing methods, such as LSH (Locality-Sensitive Hashing);
>  # Product-quantization-based algorithms, such as IVFFlat;
>  # Graph-based algorithms, such as HNSW, SSG, NSG;
> where IVFFlat and HNSW are the most popular ones among all the VR algorithms.
> IVFFlat is better for high-precision applications such as face recognition, 
> while HNSW performs better in general scenarios including recommendation and 
> personalized advertisement. *The recall ratio of IVFFlat could be gradually 
> increased by adjusting the query parameter (nprobe), while it's hard for HNSW 
> to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. 
> Recently, the implementation of HNSW (Hierarchical Navigable Small World, 
> LUCENE-9004) for Lucene has made great progress. The issue draws the 
> attention of those interested in Lucene or hoping to use HNSW with 
> Solr/Lucene. As an alternative for solving ANN similarity search problems, 
> IVFFlat is also very popular with many users and supporters. Compared with 
> HNSW, IVFFlat has a smaller index size but requires k-means clustering, while 
> HNSW is faster at query time (no training required) but requires extra 
> storage for saving graphs ([indexing 1M 
> vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]).
>  Another advantage is that IVFFlat can be faster and more accurate when GPU 
> parallel computing is enabled (currently not supported in Java). Both 
> algorithms have their merits and demerits. Since HNSW is now under 
> development, it may be better to provide both implementations (HNSW && 
> IVFFlat) for potential users who face very different scenarios and want more 
> choices.
> The latest branch is 
> [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat]
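The core of IVFFlat search described above (cluster the vectors with k-means, 
then at query time scan only the nprobe closest clusters) can be sketched as a 
toy, with precomputed "centroids" standing in for real k-means training; this 
is only an illustration, not the linked branch's code:

```java
import java.util.Arrays;
import java.util.Comparator;

public class IvfFlatDemo {
    // Squared Euclidean distance between two vectors.
    static double dist(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    public static void main(String[] args) {
        // Centroids (in a real index these come from k-means training).
        double[][] centroids = {{0, 0}, {10, 10}};
        // Inverted lists: the vectors assigned to each centroid.
        double[][][] lists = {
            {{0.5, 0.1}, {1.0, -0.5}},   // cluster 0
            {{9.5, 10.2}, {10.5, 9.9}},  // cluster 1
        };
        double[] query = {9.0, 9.0};
        int nprobe = 1; // raising nprobe scans more clusters: slower, higher recall

        // Rank clusters by centroid distance and scan only the closest nprobe.
        Integer[] order = {0, 1};
        Arrays.sort(order, Comparator.comparingDouble(c -> dist(query, centroids[c])));
        double best = Double.MAX_VALUE;
        double[] nearest = null;
        for (int p = 0; p < nprobe; p++) {
            for (double[] v : lists[order[p]]) {
                double d = dist(query, v);
                if (d < best) { best = d; nearest = v; }
            }
        }
        System.out.println(Arrays.toString(nearest)); // [9.5, 10.2]
    }
}
```

The nprobe parameter is what the description refers to: with nprobe equal to 
the number of clusters, the search degenerates to exact (100% recall) 
brute force over all vectors.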






[jira] [Updated] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Sun updated SOLR-14314:
-
Attachment: solrlog.tar.gz

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores and 265 GB RAM.
> The problem is that from time to time it becomes very slow to update Solr 
> documents, for example timing out after 30 minutes.
> Document size is around 20 KB~50 KB each; each HTTP request sent to /update 
> is around 4 MB~10 MB.
> /update requests are issued by multiple processes.
> Some of the updates get a response, but the difference between "QTime" and 
> the HTTP response time is large; in one example, QTime = 66 s while the HTTP 
> response time was 2304 s.
> According to jstack, many threads are in the BLOCKED state.
> A thread dump log is attached.
> Any hint would be appreciated, thanks!






[jira] [Commented] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057919#comment-17057919
 ] 

Aaron Sun commented on SOLR-14314:
--

After more stability testing, it turned out that big pauses could still happen 
even with a heap size of 25 GB.

{noformat}

2020-03-12 14:09:45.434 DEBUG (qtp1668016508-3378) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660963872229556224,id=2101611210074371724})
2020-03-12 14:09:45.434 DEBUG (qtp1668016508-3406) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660963872228507650,id=2101703060188004924})
2020-03-12 14:09:48.680 DEBUG (qtp1668016508-82044) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101702020780064124} 
\{{params(commit=true),defaults(wt=json)}}
2020-03-12 14:09:48.696 DEBUG (qtp1668016508-82044) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660963875644768256,id=2101702020780064124})
2020-03-12 14:10:01.879 DEBUG (qtp1668016508-82115) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2102002130766448724} 
\{{params(commit=true),defaults(wt=json)}}
2020-03-12 14:10:01.879 DEBUG (qtp1668016508-82115) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660963889483874304,id=2102002130766448724})
2020-03-12 14:10:08.566 DEBUG (qtp1668016508-82155) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101702170061492124} 
\{{params(commit=true),defaults(wt=json)}}

{noformat}

 

{noformat}

java -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/ardendo/log/solr_gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
-Xms25G -Xmx25G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
-Dsolr.solr.home=/data1/solr8 -Djetty.port=8983 -Dsolr.log.dir=/var/ardendo/log 
-jar start.jar --module=http

{noformat}

 

[^solrlog.tar.gz]

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores and 265 GB RAM.
> The problem is that from time to time it becomes very slow to update Solr 
> documents, for example timing out after 30 minutes.
> Document size is around 20 KB~50 KB each; each HTTP request sent to /update 
> is around 4 MB~10 MB.
> /update requests are issued by multiple processes.
> Some of the updates get a response, but the difference between "QTime" and 
> the HTTP response time is large; in one example, QTime = 66 s while the HTTP 
> response time was 2304 s.
> According to jstack, many threads are in the BLOCKED state.
> A thread dump log is attached.
> Any hint would be appreciated, thanks!






[jira] [Commented] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057912#comment-17057912
 ] 

Aaron Sun commented on SOLR-14314:
--

Update:

After more stability testing, it turned out that big pauses could still happen 
even with a heap size of 25 GB.

 

{noformat}

 

{noformat}

 

 

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores and 265 GB RAM.
> The problem is that from time to time it becomes very slow to update Solr 
> documents, for example timing out after 30 minutes.
> Document size is around 20 KB~50 KB each; each HTTP request sent to /update 
> is around 4 MB~10 MB.
> /update requests are issued by multiple processes.
> Some of the updates get a response, but the difference between "QTime" and 
> the HTTP response time is large; in one example, QTime = 66 s while the HTTP 
> response time was 2304 s.
> According to jstack, many threads are in the BLOCKED state.
> A thread dump log is attached.
> Any hint would be appreciated, thanks!






[jira] [Issue Comment Deleted] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Sun updated SOLR-14314:
-
Comment: was deleted

(was: Update:

After more stability test, it turned out that big pause could still happen even 
with heapsize 25GB .

 

{noformat}

 

{noformat}

 

 )

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores and 265 GB RAM.
> The problem is that from time to time it becomes very slow to update Solr 
> documents, for example timing out after 30 minutes.
> Document size is around 20 KB~50 KB each; each HTTP request sent to /update 
> is around 4 MB~10 MB.
> /update requests are issued by multiple processes.
> Some of the updates get a response, but the difference between "QTime" and 
> the HTTP response time is large; in one example, QTime = 66 s while the HTTP 
> response time was 2304 s.
> According to jstack, many threads are in the BLOCKED state.
> A thread dump log is attached.
> Any hint would be appreciated, thanks!






[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-03-12 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057911#comment-17057911
 ] 

Jim Ferenczi commented on LUCENE-9004:
--

I like this issue a lot and all the discussions around it, thanks all for 
working on this!

I'd like to share some of the findings we had while working on similar 
solutions for Elasticsearch. The known limitations for graph based approach are 
the computational cost of building the graph and the memory needed to store the 
neighbors per node. Regarding the computational cost, inserting a new node is 
equivalent to a query in the solution so we should expect that the number of 
comparisons needed to insert a node in the graph will grow logarithmically with 
the size of the data. We've made some tests to check the number of comparisons 
needed on the million scale and found out that this number doesn't vary too 
much on the dataset present in the ann-benchmark repo. To get good performance 
at search time, the efConstruction parameter need to be set high (from 200 to 
800 in the best results) while M (max numbers of neighbors per node) can can 
remain lower (16 to 64). This led to around 10k comparisons in average for the 
ann-benchmark dataset in the 1-10M ranges.

10K comparisons at query time in the 1-10M range is very compelling. Users can 
also trade some recall for performance and get acceptable results in the 1-10k 
range. However, these trade-offs are more difficult to apply at build time, 
where the quality of the graph is important to maintain. I mainly see this cost 
as static due to its logarithmic growth, which is verified in the papers on 
small-world graph approaches. This is the main trade-off that users need to 
make when using graph-based approaches: building will be slow.

 

Regarding the memory consumption, I have mixed feelings. The fact that we need 
to keep M nearest neighbors per node should not be a problem at search time 
since the graph can be static and accessed through a file. The random reads 
nature of a query in the graph will require disk seeks and reads, but we 
retrieve M neighbors each time, so we're not talking about tiny random reads, 
and the filesystem cache will keep the hot nodes in memory (the upper layers in 
the hierarchical graph?). I am saying this because it seems that we're 
expecting to load the entire graph in RAM at some point. I don't think this is 
needed at query time, hence my comment.

The tricky part in my opinion here is at build time where the graph is updated 
dynamically. This requires more efficient access and the ability to change the 
data. We also need to keep the nearest neighbor distances for each neighbor so 
the total cost is N*M*8 where N is the total number of documents, M the maximum 
number of neighbors per node and 8 the cost associated with keeping a doc id 
and the distance for each neighbor (int+float). The formula is slightly more 
complicated for hierarchical graph but doesn't change the scale. This memory 
requirement seems acceptable for medium-sized graph in the range of 1-10M but 
can become problematic when building large graphs of hundreds of millions 
nodes. Considering the logarithmic growth of the number of operations needed to 
find a local minimum when the dataset grows, building large graphs is 
encouraged at the expense of more memory. I don't know what would be 
acceptable, but requiring tens of gigabytes of heap memory to build such a 
graph doesn't seem compelling to me. Considering that the benefits of using a 
graph are already visible in the 1-10M range, I also wonder if we could make a 
compromise 
and cap the size of the graphs that we build. So instead of having one graph 
per segment, we'd build N depending on how much memory the user is willing to 
allocate for the build and the total number of docs present in the segment. 
Obviously, searching these graphs sequentially would be more costly than having 
a single giant graph. However, this could also have interesting properties when 
merging segments since we wouldn't need to rebuild graphs that reached the 
maximum size allowed (assuming there's no deleted documents).
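To make the N*M*8 estimate above concrete, here is a quick worked computation 
(the specific N and M are my own example numbers, not taken from the comment):

```java
public class GraphMemoryDemo {
    public static void main(String[] args) {
        long n = 100_000_000L; // N: documents (graph nodes)
        int m = 64;            // M: max neighbors kept per node
        // Per neighbor we keep a doc id plus its distance: int + float = 8 bytes.
        int bytesPerNeighbor = Integer.BYTES + Float.BYTES;
        long bytes = n * m * bytesPerNeighbor; // N * M * 8
        System.out.println(bytes / (1024.0 * 1024 * 1024) + " GiB");
    }
}
```

For 100 M nodes with M = 64 this works out to roughly 47.7 GiB, which is 
consistent with the "tens of gigabytes of heap" concern for large graphs.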

The idea here is to avoid limiting ourselves to a single giant graph per 
segment, which could otherwise grow to 2 billion vectors (the maximum allowed 
per index in Lucene).

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search 

[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search

2020-03-12 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057886#comment-17057886
 ] 

Jim Ferenczi commented on LUCENE-9136:
--

Thanks for chiming in, [~tomoko]. I still think it would be valuable to 
discuss the minimal signature needed to share a new codec between both 
approaches. I also think that there is a consensus around the fact that 
multiple strategies could be needed depending on the trade-offs that users are 
willing to take. If we start adding codecs and formats for every strategy that 
we think is valuable, I am afraid that this will block us sooner than we 
expect. If we agree that having a new codec for vectors and ANN is valuable in 
Lucene, my proposal is to have a generic codec that can be used to test 
different strategies (k-means, HNSW, ...). IMO this could also change the goal 
for these approaches, since we don't want to require users to tune tons of 
internal options (numbers of neighbors, numbers of levels, ...) upfront. 

> Introduce IVFFlat to Lucene for ANN similarity search
> -
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
> Attachments: glove-100-angular.png, glove-25-angular.png, 
> image-2020-03-07-01-22-06-132.png, image-2020-03-07-01-25-58-047.png, 
> image-2020-03-07-01-27-12-859.png, sift-128-euclidean.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++ with no plan to support a Java interface, making them hard 
> to integrate into Java projects or for those who are not familiar with C/C++ 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories,
>  # Tree-based algorithms, such as KD-tree;
>  # Hashing methods, such as LSH (Locality Sensitive Hashing);
>  # Product quantization based algorithms, such as IVFFlat;
>  # Graph-based algorithms, such as HNSW, SSG, NSG;
> where IVFFlat and HNSW are the most popular ones among all the VR algorithms.
> IVFFlat is better for high-precision applications such as face recognition, 
> while HNSW performs better in general scenarios including recommendation and 
> personalized advertisement. *The recall ratio of IVFFlat could be gradually 
> increased by adjusting the query parameter (nprobe), while it's hard for HNSW 
> to improve its accuracy*. In theory, IVFFlat could achieve 100% recall ratio. 
> Recently, the implementation of HNSW (Hierarchical Navigable Small World, 
> LUCENE-9004) for Lucene, has made great progress. The issue draws attention 
> of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. 
> As an alternative for solving ANN similarity search problems, IVFFlat is also 
> very popular with many users and supporters. Compared with HNSW, IVFFlat has 
> smaller index size but requires k-means clustering, while HNSW is faster in 
> query (no training required) but requires extra storage for saving graphs 
> [indexing 1M 
> vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]].
>  Another advantage is that IVFFlat can be faster and more accurate when GPU 
> parallel computing is enabled (currently not supported in Java). Both 
> algorithms have their merits and demerits. Since HNSW is now under 
> development, it may be better to provide both implementations (HNSW && 
> IVFFlat) for potential users who are faced with very different scenarios and 
> want more choices.
> The latest branch is 
> [*lucene-9136-ann-ivfflat*]([https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat)|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat]
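The IVFFlat scheme described in this issue (a coarse k-means quantizer over the 
vectors, then an exhaustive scan of the few probed inverted lists) can be 
sketched in a few lines. This is a minimal, self-contained illustration under 
invented names (`kmeans`, `IVFFlat`, `nprobe`), not Lucene or FAISS code:

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[min(range(k), key=lambda c: dist(v, centroids[c]))].append(v)
        for c, members in enumerate(buckets):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class IVFFlat:
    def __init__(self, vectors, n_lists):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists)
        # inverted lists: centroid id -> doc ids assigned to that centroid
        self.lists = [[] for _ in range(n_lists)]
        for doc_id, v in enumerate(vectors):
            nearest = min(range(n_lists), key=lambda c: dist(v, self.centroids[c]))
            self.lists[nearest].append(doc_id)

    def search(self, query, top_k, nprobe):
        # probe only the nprobe inverted lists whose centroids are closest,
        # then rank their members exhaustively ("flat" scan)
        probed = sorted(range(len(self.centroids)),
                        key=lambda c: dist(query, self.centroids[c]))[:nprobe]
        candidates = [d for c in probed for d in self.lists[c]]
        candidates.sort(key=lambda d: dist(query, self.vectors[d]))
        return candidates[:top_k]
```

Raising `nprobe` scans more lists, trading query time for recall; with `nprobe` 
equal to the number of lists the search degenerates to an exact brute-force 
scan, which is why IVFFlat's recall can be tuned all the way up to 100%.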



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, 

[GitHub] [lucene-solr] iverase opened a new pull request #1345: make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries

2020-03-12 Thread GitBox
iverase opened a new pull request #1345: make 
TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries
URL: https://github.com/apache/lucene-solr/pull/1345
 
 
   This test can fail when a circle goes over the pole, as the point-to-line 
distance computation can have quite a bit of error.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9275) TestLatLonMultiPolygonShapeQueries failure

2020-03-12 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-9275:


 Summary: TestLatLonMultiPolygonShapeQueries failure
 Key: LUCENE-9275
 URL: https://issues.apache.org/jira/browse/LUCENE-9275
 Project: Lucene - Core
  Issue Type: Test
Reporter: Ignacio Vera


This test can fail for big circle queries when the circle goes over the pole.  

{code}
Error Message:
wrong hit (first of possibly more):  FAIL: id=128 should match but did not   
relation=CONTAINS   query=LatLonShapeQuery: 
field=shape:[CIRCLE([73.45044631686574,-43.522442537891635] radius = 
1320857.7583952076 meters),] docID=127   shape=[[-43.60599318072272, 
-95.89632190395075] [1.401298464324817E-45, -95.89632190395075] 
[1.401298464324817E-45, 148.0564038690461] [-43.60599318072272, 
-95.89632190395075] , [-8.713707222781277, -137.43977030462523] 
[-8.665986874636296, -136.83720024522643] [-8.605159056677273, 
-135.67900228425023] [-9.022985319342514, -135.7748381870073] 
[-9.57551836995, -135.03944293912676] [-10.486875163146422, 
-133.75932451570236] [-12.667313123772418, -133.7153234402556] 
[-15.400299607273027, -133.5089745815] [-17.28330603483186, 
-134.4554641982157] [-21.607368456646313, -136.29612908889345] 
[-20.932241412751615, -139.63293025024942] [-20.650194586536255, 
-141.13774572688035] [-19.001635084539416, -144.5606838562986] 
[-15.72417778804206, -146.161554433355] [-15.56323460342411, 
-147.13460257950626] [-11.61552273270253, -144.82632867223] 
[-8.302765767406079, -143.5037337366715] [-9.07099844105521, 
-140.49240322673248] [-7.525403752869964, -140.08470342809397] 
[-8.713707222781277, -137.43977030462523] , [0.999403953552, 
-157.66023552014605] [90.0, -157.66023552014605] [90.0, 1.401298464324817E-45] 
[0.999403953552, 1.401298464324817E-45] [0.999403953552, 
-157.66023552014605] , [78.40177762548313, 0.999403953552] [90.0, 
0.999403953552] [90.0, 107.68304478215401] [78.40177762548313, 
0.999403953552] ]   deleted?=false  
distanceQuery=CIRCLE([73.45044631686574,-43.522442537891635] radius = 
1320857.7583952076 meters)
{code}

reproduce with: 

{code}ant test  -Dtestcase=TestLatLonMultiPolygonShapeQueries 
-Dtests.method=testRandomMedium -Dtests.seed=B76D55AB11A1D02A 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=vi 
-Dtests.timezone=Etc/GMT-3 -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search

2020-03-12 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057878#comment-17057878
 ] 

Tomoko Uchida commented on LUCENE-9136:
---

{code}
Do we need a new VectorFormat that can be shared with the graph-based approach ?
{code}

About this point, I think we don't need to consider both approaches at once. 
Please don't wait for or worry about the HNSW issue; concentrate on getting this 
into master. I or someone with more knowledge/experience in this area will find 
a good way to integrate the graph-based approach later.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14324) infra-solr commands are not working in Linux server

2020-03-12 Thread GANESAN.P (Jira)
GANESAN.P created SOLR-14324:


 Summary: infra-solr commands are not working in Linux server
 Key: SOLR-14324
 URL: https://issues.apache.org/jira/browse/SOLR-14324
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 7.3.1
Reporter: GANESAN.P


[root@node03 hduser]# systemctl status solr.service
Unit solr.service could not be found.
[root@node03 hduser]# solr status
bash: solr: command not found...
[root@node03 hduser]#



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jimczi commented on issue #1316: LUCENE-8929 parallel early termination in TopFieldCollector using minmin score

2020-03-12 Thread GitBox
jimczi commented on issue #1316: LUCENE-8929 parallel early termination in 
TopFieldCollector using minmin score
URL: https://github.com/apache/lucene-solr/pull/1316#issuecomment-598150417
 
 
   Thanks for the ping @msokolov .
   
   >  if you can comment on whether the MaxScoreAccumulator still provides 
additional benefit alongside this opto? I haven't tried removing it, but I 
wonder if it might be doing something redundant now - I'm not totally clear 
what impact setMinCompetitiveScore will have.
   
   It's redundant in spirit, but the MaxScoreAccumulator is for queries sorted 
by relevancy. It is used so that queries sorted by relevancy can use 
`setMinCompetitiveScore` even if they have a tiebreaker on another field (using 
TopFieldCollector). The logic is similar to what you added in the 
`MaxScoreTerminator` except that the side effects of changes in the maximum 
score are handled by the top collectors directly. I left a comment in the 
original issue, but I think we should try to merge the optimization you have for 
the sorted index case into the current logic, or create a new top field 
collector dedicated to optimizing the retrieval of large top N on sorted 
indices. With this PR we would have 3 different objects used by concurrent 
requests to speed up search, but I think it would be preferable to specialize at 
this point. What do you think?
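The interplay between per-slice top-N collectors and a shared accumulator can 
be sketched as follows. This is a toy, single-threaded illustration of the idea 
only; the class names (`SharedBound`, `TopNCollector`) are invented, and 
Lucene's actual `MaxScoreAccumulator`/`setMinCompetitiveScore` machinery is 
more involved:

```python
import heapq

class SharedBound:
    """Toy stand-in for a max-score accumulator shared across slices."""
    def __init__(self):
        self.value = float("-inf")

    def update(self, score):
        if score > self.value:
            self.value = score

class TopNCollector:
    """Collects a local top N and skips docs below the shared global bound."""
    def __init__(self, n, shared):
        self.n = n
        self.shared = shared
        self.heap = []      # min-heap of (score, doc_id)
        self.skipped = 0

    def collect(self, doc_id, score):
        # A doc scoring below the shared bound cannot make the global top N:
        # some slice already holds N docs that all score at least this high.
        if score < self.shared.value:
            self.skipped += 1
            return
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, (score, doc_id))
        else:
            heapq.heapreplace(self.heap, (score, doc_id))
        if len(self.heap) == self.n:
            # this slice's bottom score is a valid global lower bound
            self.shared.update(self.heap[0][0])
```

Once any slice has gathered N hits, its bottom score becomes a lower bound for 
the global top N, so sibling slices can skip every doc scoring below it instead 
of fully evaluating it.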
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task

2020-03-12 Thread GitBox
dweiss commented on a change in pull request #1304: LUCENE-9242: generate 
javadocs by calling Ant javadoc task
URL: https://github.com/apache/lucene-solr/pull/1304#discussion_r391552531
 
 

 ##
 File path: gradle/invoke-javadoc.gradle
 ##
 @@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// invoke javadoc tool
+
+allprojects {
+
+  ext {
+javadocRoot = project.path.startsWith(':lucene') ? 
project(':lucene').file("build/docs") : project(':solr').file("build/docs")
+javadocDestDir = "${javadocRoot}/${project.name}"
+  }
+
+  plugins.withType(JavaPlugin) {
+def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
+def title = "${libName} ${project.version} ${project.name} API".toString()
+def srcDirs = sourceSets.main.java.srcDirs.findAll { dir -> dir.exists() }
+
+task invokeJavadoc {
+  description "Generates Javadoc API documentation for the main source 
code. This invokes Ant Javadoc Task."
+  group "documentation"
+
+  dependsOn sourceSets.main.compileClasspath
+
+  inputs.property("linksource", "no")
+  inputs.property("linkJUnit", false)
+  inputs.property("linkHref", [])
+
+  inputs.files sourceSets.main.java.asFileTree
+  outputs.dir project.javadocRoot
+
+  doFirst {
+srcDirs.each { srcDir ->
+  ant.javadoc(
+  overview: file("${srcDir}/overview.html"),
+  packagenames: "org.apache.lucene.*,org.apache.solr.*",
+  destDir: project.javadocDestDir,
+  access: "protected",
+  encoding: "UTF-8",
+  charset: "UTF-8",
+  docencoding: "UTF-8",
+  noindex: "true",
+  includenosourcepackages: "true",
+  author: "true",
+  version: "true",
+  linksource: inputs.properties.linksource,
+  use: "true",
+  failonerror: "true",
+  locale: "en_US",
+  windowtitle: title,
+  doctitle: title,
+  maxmemory: "512m",
+  classpath: sourceSets.main.compileClasspath.asPath,
+  bottom: "Copyright © 2000-${buildYear} Apache Software 
Foundation. All Rights Reserved."
+  ) {
+packageset(dir: srcDir)
+
+tag(name: "lucene.experimental", description: "WARNING: This API 
is experimental and might change in incompatible ways in the next release.")
+tag(name: "lucene.internal", description: "NOTE: This API is for 
internal purposes only and might change in incompatible ways in the next 
release.")
+tag(name: "lucene.spi", description: "SPI Name (Note: This is 
case-insensitive. e.g., if the name is 'htmlStrip', 'htmlstrip' can be used 
when looking up the service):", scope: "types")
+
+// resolve links to JavaSE and JUnit API
+link(offline: "true", href: 
"https://docs.oracle.com/en/java/javase/11/docs/api/", packageListLoc: 
project(":lucene").file("tools/javadoc/java11/").toString())
+if (inputs.properties.get("linkJUnit")) {
+  link(offline: "true", href: 
"https://junit.org/junit4/javadoc/4.12/", packageListLoc: 
project(":lucene").file("tools/javadoc/junit").toString())
+}
+// resolve inter-module links if 'linkHref' property is specified
+inputs.properties.get("linkHref").each { href ->
+  link(href: href)
+}
+
+arg(line: "--release 11")
+arg(line: "-Xdoclint:all,-missing")
+
+// force locale to be "en_US" (fix for: 
https://bugs.openjdk.java.net/browse/JDK-8222793)
+arg(line: "-J-Duser.language=en -J-Duser.country=US")
+  }
+}
+
+// append some special table css, prettify css
+ant.concat(destfile: "${javadocDestDir}/stylesheet.css", append: 
"true", fixlastline: "true", encoding: "UTF-8") {
+  filelist(dir: project(":lucene").file("tools/javadoc"), files: 
"table_padding.css")
+  filelist(dir: project(":lucene").file("tools/prettify"), files: 
"prettify.css")
+}
+// append prettify to scripts
+ant.concat(destfile: 

[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task

2020-03-12 Thread GitBox
mocobeta commented on a change in pull request #1304: LUCENE-9242: generate 
javadocs by calling Ant javadoc task
URL: https://github.com/apache/lucene-solr/pull/1304#discussion_r391547313
 
 

 ##
 File path: gradle/invoke-javadoc.gradle
 ##
 @@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// invoke javadoc tool
+
+allprojects {
+
+  ext {
+javadocRoot = project.path.startsWith(':lucene') ? 
project(':lucene').file("build/docs") : project(':solr').file("build/docs")
+javadocDestDir = "${javadocRoot}/${project.name}"
+  }
+
+  plugins.withType(JavaPlugin) {
+def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
+def title = "${libName} ${project.version} ${project.name} API".toString()
+def srcDirs = sourceSets.main.java.srcDirs.findAll { dir -> dir.exists() }
+
+task invokeJavadoc {
+  description "Generates Javadoc API documentation for the main source 
code. This invokes Ant Javadoc Task."
+  group "documentation"
+
+  dependsOn sourceSets.main.compileClasspath
+
+  inputs.property("linksource", "no")
+  inputs.property("linkJUnit", false)
+  inputs.property("linkHref", [])
+
+  inputs.files sourceSets.main.java.asFileTree
+  outputs.dir project.javadocRoot
+
+  doFirst {
+srcDirs.each { srcDir ->
+  ant.javadoc(
+  overview: file("${srcDir}/overview.html"),
+  packagenames: "org.apache.lucene.*,org.apache.solr.*",
+  destDir: project.javadocDestDir,
+  access: "protected",
+  encoding: "UTF-8",
+  charset: "UTF-8",
+  docencoding: "UTF-8",
+  noindex: "true",
+  includenosourcepackages: "true",
+  author: "true",
+  version: "true",
+  linksource: inputs.properties.linksource,
+  use: "true",
+  failonerror: "true",
+  locale: "en_US",
+  windowtitle: title,
+  doctitle: title,
+  maxmemory: "512m",
+  classpath: sourceSets.main.compileClasspath.asPath,
+  bottom: "Copyright  2000-${buildYear} Apache Software 
Foundation. All Rights Reserved."
+  ) {
+packageset(dir: srcDir)
+
+tag(name: "lucene.experimental", description: "WARNING: This API 
is experimental and might change in incompatible ways in the next release.")
+tag(name: "lucene.internal", description: "NOTE: This API is for 
internal purposes only and might change in incompatible ways in the next 
release.")
+tag(name: "lucene.spi", description: "SPI Name (Note: This is 
case-insensitive. e.g., if the name is 'htmlStrip', 'htmlstrip' can be used 
when looking up the service):", scope: "types")
+
+// resolve links to JavaSE and JUnit API
+link(offline: "true", href: 
"https://docs.oracle.com/en/java/javase/11/docs/api/;, packageListLoc: 
project(":lucene").file("tools/javadoc/java11/").toString())
+if (inputs.properties.get("linkJUnit")) {
+  link(offline: "true", href: 
"https://junit.org/junit4/javadoc/4.12/;, packageListLoc: 
project(":lucene").file("tools/javadoc/junit").toString())
+}
+// resolve inter-module links if 'linkHref' property is specified
+inputs.properties.get("linkHref").each { href ->
+  link(href: href)
+}
+
+arg(line: "--release 11")
+arg(line: "-Xdoclint:all,-missing")
+
+// force locale to be "en_US" (fix for: 
https://bugs.openjdk.java.net/browse/JDK-8222793)
+arg(line: "-J-Duser.language=en -J-Duser.country=US")
+  }
+}
+
+// append some special table css, prettify css
+ant.concat(destfile: "${javadocDestDir}/stylesheet.css", append: 
"true", fixlastline: "true", encoding: "UTF-8") {
+  filelist(dir: project(":lucene").file("tools/javadoc"), files: 
"table_padding.css")
+  filelist(dir: project(":lucene").file("tools/prettify"), files: 
"prettify.css")
+}
+// append prettify to scripts
+ant.concat(destfile: 

[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search

2020-03-12 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057803#comment-17057803
 ] 

Jim Ferenczi commented on LUCENE-9136:
--

> ??I was thinking we could actually reuse the existing `PostingsFormat` and 
>`DocValuesFormat` implementations.??

 That's one of the main reasons why this approach is interesting for Lucene. 
The main operation at query time is a basic inverted-list search, so it would 
be a shame not to reuse the existing formats that were designed for this 
purpose. In general I think that this approach (k-means clustering at index 
time) is very compelling since it's a light layer on top of existing 
functionality. The computational cost is big (running k-means and assigning 
vectors to centroids) but we can ensure that it remains acceptable by capping 
the number of centroids or by using a hybrid approach with a small-world graph 
like Julie suggested. 

Regarding the link with the graph-based approach, I wonder what the new ANN 
Codec should expose. If the goal is to provide approximate nearest neighbors 
capabilities to Lucene I don't think we want to leak any implementation details 
there.

It's difficult to tell now since both efforts are in the design phase, but I 
think we should aim at something very simple that only exposes an approximate 
nearest neighbor search. Something like:
{code:java}
interface VectorFormat {
  TopDocs ann(int topN, int maxDocsToVisit);
  float[] getVector(int docID);
}{code}
should be enough. Most of the formats we have in Lucene have sensible defaults 
or compute parameters based on the shape of the data, so I don't think we should 
expose tons of options here. This is another advantage of this approach in my 
opinion, since we can compute the number of centroids needed for each segment 
automatically. The research in this area is also moving fast, so we need to 
remain open to new approaches without having to add a new format all the 
time. 

> Actually, we need random access to the vector values! For a typical search 
> engine, we are going to retrieve the best-matched documents after obtaining 
> the top-K docIDs. Retrieving vectors via these docIDs requires random access 
> to the vector values.

You can sort the TopK (which should be small) by docIDs and then perform the 
lookup sequentially ? That's how we retrieve stored fields from top documents 
in the normal search. This is again an advantage against the graph based 
approach because it is compliant with the search model in Lucene that requires 
forward iteration.
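The forward-iteration trick described above (sort the top-K hits by docID, then 
read their values in a single forward pass) can be sketched as follows; 
`fetch_values` and `forward_reader` are hypothetical names, with the reader 
standing in for Lucene's forward-only iteration over stored fields/doc values:

```python
def fetch_values(top_docs, forward_reader):
    """Retrieve stored values for hits using only forward iteration.

    top_docs: list of (score, doc_id) hits in score order.
    forward_reader: iterable of (doc_id, value) in ascending doc-id order.
    """
    # sort the (small) top K by docID so a single forward pass suffices
    wanted = sorted(doc_id for _, doc_id in top_docs)
    found = {}
    it = iter(forward_reader)
    for target in wanted:
        # advance the reader forward until the next wanted doc is reached
        for doc_id, value in it:
            if doc_id == target:
                found[target] = value
                break
    # re-associate values with hits in the original score order
    return [(score, doc_id, found[doc_id]) for score, doc_id in top_docs]
```

Because `wanted` is sorted, the reader is never rewound, which is what makes the 
lookup compatible with a forward-only search model.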

To move forward on this issue I'd like to list the things that, in my opinion, 
need clarification:
 * Do we need a new VectorFormat that can be shared with the graph-based 
approach ?
 ** This decision and the design of the VectorFormat is important to ensure 
that both efforts can move independently. Currently it is not clear whether this 
approach can move forward if the graph-based approach is stalled or needs more 
work. 
I tend to think that having a simple format upfront can drive decisions we make 
on both approaches so we should tackle this first.
 *   What is the acceptable state for this approach to be considered ready to 
merge?

 ** Lots of optimizations have been mentioned in both issues but I think we 
should drive for simplicity first.
That's the beauty of the k-means approach: it's simple to understand and reason 
about. We should have
a first version that reuses the internal data formats since they fit perfectly. 
I think that's what Julie's PR brings here, while leaving room for further 
improvements like any Lucene feature.
 ** We should decorrelate the progress here from the one in the other Lucene 
issue. This is linked to question 1 but I think it's key to move forward.

In general I feel like the branch proposed by [~irvingzhang] and the additional 
changes by [~jtibshirani] are moving in the right direction. The QPS 
improvements over a brute-force approach are already compelling, as outlined in 
[https://github.com/apache/lucene-solr/pull/1314], so I don't think it will be 
difficult to reach a consensus on whether this would be useful to add to Lucene.


[GitHub] [lucene-solr] dweiss commented on a change in pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task

2020-03-12 Thread GitBox
dweiss commented on a change in pull request #1304: LUCENE-9242: generate 
javadocs by calling Ant javadoc task
URL: https://github.com/apache/lucene-solr/pull/1304#discussion_r391523979
 
 

 ##
 File path: gradle/invoke-javadoc.gradle
 ##
 @@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// invoke javadoc tool
+
+allprojects {
+
+  ext {
+javadocRoot = project.path.startsWith(':lucene') ? 
project(':lucene').file("build/docs") : project(':solr').file("build/docs")
+javadocDestDir = "${javadocRoot}/${project.name}"
+  }
+
+  plugins.withType(JavaPlugin) {
+def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
+def title = "${libName} ${project.version} ${project.name} API".toString()
+def srcDirs = sourceSets.main.java.srcDirs.findAll { dir -> dir.exists() }
+
+task invokeJavadoc {
+  description "Generates Javadoc API documentation for the main source 
code. This invokes Ant Javadoc Task."
+  group "documentation"
+
+  dependsOn sourceSets.main.compileClasspath
+
+  inputs.property("linksource", "no")
+  inputs.property("linkJUnit", false)
+  inputs.property("linkHref", [])
+
+  inputs.files sourceSets.main.java.asFileTree
+  outputs.dir project.javadocRoot
+
+  doFirst {
+srcDirs.each { srcDir ->
+  ant.javadoc(
+  overview: file("${srcDir}/overview.html"),
+  packagenames: "org.apache.lucene.*,org.apache.solr.*",
+  destDir: project.javadocDestDir,
+  access: "protected",
+  encoding: "UTF-8",
+  charset: "UTF-8",
+  docencoding: "UTF-8",
+  noindex: "true",
+  includenosourcepackages: "true",
+  author: "true",
+  version: "true",
+  linksource: inputs.properties.linksource,
+  use: "true",
+  failonerror: "true",
+  locale: "en_US",
+  windowtitle: title,
+  doctitle: title,
+  maxmemory: "512m",
+  classpath: sourceSets.main.compileClasspath.asPath,
+  bottom: "Copyright  2000-${buildYear} Apache Software 
Foundation. All Rights Reserved."
+  ) {
+packageset(dir: srcDir)
+
+tag(name: "lucene.experimental", description: "WARNING: This API 
is experimental and might change in incompatible ways in the next release.")
+tag(name: "lucene.internal", description: "NOTE: This API is for 
internal purposes only and might change in incompatible ways in the next 
release.")
+tag(name: "lucene.spi", description: "SPI Name (Note: This is 
case-insensitive. e.g., if the name is 'htmlStrip', 'htmlstrip' can be used 
when looking up the service):", scope: "types")
+
+// resolve links to JavaSE and JUnit API
+link(offline: "true", href: 
"https://docs.oracle.com/en/java/javase/11/docs/api/;, packageListLoc: 
project(":lucene").file("tools/javadoc/java11/").toString())
+if (inputs.properties.get("linkJUnit")) {
+  link(offline: "true", href: 
"https://junit.org/junit4/javadoc/4.12/;, packageListLoc: 
project(":lucene").file("tools/javadoc/junit").toString())
+}
+// resolve inter-module links if 'linkHref' property is specified
+inputs.properties.get("linkHref").each { href ->
+  link(href: href)
+}
+
+arg(line: "--release 11")
+arg(line: "-Xdoclint:all,-missing")
+
+// force locale to be "en_US" (fix for: 
https://bugs.openjdk.java.net/browse/JDK-8222793)
+arg(line: "-J-Duser.language=en -J-Duser.country=US")
+  }
+}
+
+// append some special table css, prettify css
+ant.concat(destfile: "${javadocDestDir}/stylesheet.css", append: 
"true", fixlastline: "true", encoding: "UTF-8") {
+  filelist(dir: project(":lucene").file("tools/javadoc"), files: 
"table_padding.css")
+  filelist(dir: project(":lucene").file("tools/prettify"), files: 
"prettify.css")
+}
+// append prettify to scripts
+ant.concat(destfile: 

[jira] [Commented] (SOLR-13944) CollapsingQParserPlugin throws NPE instead of bad request

2020-03-12 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057784#comment-17057784
 ] 

Lucene/Solr QA commented on SOLR-13944:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  2m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  2m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  2m  5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 32m 17s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.search.CurrencyRangeFacetCloudTest |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13944 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12996485/SOLR-13944.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP 
Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 8a940e7 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| unit | 
https://builds.apache.org/job/PreCommit-SOLR-Build/710/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/710/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/710/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> CollapsingQParserPlugin throws NPE instead of bad request
> -
>
> Key: SOLR-13944
> URL: https://issues.apache.org/jira/browse/SOLR-13944
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.3.1
>Reporter: Stefan
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-13944.patch, SOLR-13944.patch, SOLR-13944.patch, 
> SOLR-13944.patch
>
>
>  I noticed the following NPE:
> {code:java}
> java.lang.NullPointerException at 
> org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1021)
>  at 
> org.apache.solr.search.CollapsingQParserPlugin$OrdFieldValueCollector.finish(CollapsingQParserPlugin.java:1081)
>  at 
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:230)
>  at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1602)
>  at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1419)
>  at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584)
> {code}
> If I am correct, the problem was already addressed in SOLR-8807. The fix 
> was not working in this case though, because of a syntax error in the query 
> (I used the local parameter syntax twice instead of combining it). The 
> relevant part of the query is:
> {code:java}
> ={!tag=collapser}{!collapse field=productId sort='merchantOrder asc, price 
> asc, id asc'}
> {code}
> After discussing that on the mailing list, I was asked to open a ticket, 
> because this situation should result in a bad request instead of a 
> NullpointerException (see 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201911.mbox/%3CCAMJgJxTuSb%3D8szO8bvHiAafJOs08O_NMB4pcaHOXME4Jj-GO2A%40mail.gmail.com%3E])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057767#comment-17057767
 ] 

Aaron Sun edited comment on SOLR-14314 at 3/12/20, 10:09 AM:
-

[~ichattopadhyaya]   Thanks for the valuable answer. After changing the JVM heap 
size to 25 GB, things are indeed much better; there are still occasional pauses in 
the log here and there, but they are much shorter, around 1-2 seconds. Is it 
possible to improve this further? I also notice the pauses happen more often around 
"HttpSolrCall Closing out SolrRequest", which does not seem related to GC pauses.

Regarding multiple Solr nodes (JVMs), I guess you are referring to this page: 
[https://lucene.apache.org/solr/guide/7_2/taking-solr-to-production.html#running-multiple-solr-nodes-per-host].
 Does that mean each Solr instance has its own Solr home directory and port? If 
so, how should the data be split: one core per instance? And does the client need 
to manage which Solr instance to talk to?

I couldn't find a good example on the internet; I'd appreciate any guidance you 
could provide.
{noformat}
2020-03-12 10:37:19.804 DEBUG (qtp1668016508-3474) [ x:aggprogram] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:37:20.543 DEBUG (qtp1668016508-4857) [ x:aggprogram] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101608110097976031} 
\{{params(commit=true),defaults(wt=json)}}

2020-03-12 10:39:11.250 DEBUG (qtp1668016508-6123) [ x:aggasset] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:39:11.915 TRACE (qtp1668016508-3376) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2102003090810779924 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002583 
refcount=1} LogPtr(1081326) map=1784607161

2020-03-12 10:40:08.746 DEBUG (qtp1668016508-382) [ x:aggasset] 
o.a.s.s.HttpSolrCall Closing out SolrRequest: 
\{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:40:09.640 DEBUG (qtp1668016508-3239) [ x:aggasset] 
o.a.s.u.TransactionLog New TransactionLog 
file=/data1/solr8/aggasset/data/tlog/tlog.0001116, exists=false, 
size=0, openExisting=false


2020-03-12 10:40:58.182 DEBUG (qtp1668016508-3413) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:41:00.318 TRACE (qtp1668016508-381) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101701290647113224 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002593 
refcount=1} LogPtr(1940077) map=1984880505


2020-03-12 10:41:33.880 DEBUG (qtp1668016508-771) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:41:35.754 TRACE (qtp1668016508-3298) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2102003070806775224 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002598 
refcount=1} LogPtr(4246525) map=1493020555


2020-03-12 10:42:23.140 DEBUG (qtp1668016508-107) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660950824311848960,id=2101702170007764324})
2020-03-12 10:42:23.935 TRACE (qtp1668016508-380) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101806210189104124 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002605 
refcount=1} LogPtr(5096503) map=2041040637

{noformat}
 And a QTime of over 100 seconds still does not sound good.

{noformat}

2020-03-12 11:08:27.586 INFO (qtp1668016508-15663) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory [agglogtrackitem] webapp=/solr path=/update 
params=\{commit=true}{add=[2102001130729569124 (1660952325002362880), 
2102001220746002624 (1660952325018091520), 2102002130766975424 
(1660952325216272385), 2102003020799380624 (1660952325224660992), 
2102001150733370324 (1660952325239341056), 2102003090811568924 
(1660952325280235520), 2102002130766460924 (1660952325295964161), 
2102001220746002024 (1660952325313789954), 2102002200779134024 
(1660952325333712896), 2102002280792794524 (1660952325357830145), ... (200 
adds)],commit=} 0 134457
2020-03-12 11:08:27.586 DEBUG (qtp1668016508-15663) [ x:agglogtrackitem] 
o.a.s.s.HttpSolrCall Closing out SolrRequest: 

[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager

2020-03-12 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057773#comment-17057773
 ] 

Jim Ferenczi commented on LUCENE-8929:
--

Interesting results [~sokolov] and thanks for cleaning the TopFieldCollector.

Regarding the challenge, I wonder if DAY_OF_YEAR is a good candidate. 
Considering the cardinality of the field, it could be more efficient to sort 
the leaves by their max values and number of documents before forking new 
threads? This is not the case here, but for time-based data where the order of 
segments follows the natural order of insertion, sorting the segments prior to 
search can improve performance dramatically even for small top N. This is 
something we added in Elasticsearch to boost the performance of queries sorted 
by timestamp on time-based indices: 
[https://github.com/elastic/elasticsearch/pull/44021]

For sorted queries in general, I think it could be interesting to differentiate 
requests that don't need to follow the natural order of segments. This is 
true for concurrent requests, but it shouldn't be limited to that case. Today 
we try to share a global state between leaves so that concurrent and sequential 
requests can early terminate efficiently. We also handle sorted indices and 
queries sorted by relevancy with a tiebreaker, all in the same 
TopFieldCollector. I know you already made some cleanup, but maybe it is time 
for a clear split? Optimizing queries on sorted indices for large top N 
could be enhanced further if we add a special top field collector for this 
purpose. You could, for instance, remove the leaf priority queue entirely since 
results are already sorted?

I am also not sure that we're comparing the same thing in the benchmark. If I 
understand the last PR correctly, leaves are terminated as soon as they've 
reached the global lower bound, so they don't break ties on doc ids. I'm not 
sure whether that makes a big difference in terms of performance, but it would 
at least make the top N non-deterministic, and that's a problem.

I am supportive of any improvements we want to make on sorted queries but we 
should also keep the TopFieldCollector simple.

Another idea that we discussed with Adrien would be to give the ability to skip 
documents in the LeafFieldComparator. This is similar in spirit to what we 
have in queries with setMinCompetitiveScore:
{code:java}
public interface LeafFieldComparator {
   void setBottom(final int slot) throws IOException;
   
   ...

   default DocIdSetIterator iterator() {
 return null;
   }
}{code}
 If the returned iterator is used in conjunction with the query, it should be 
possible to stop or modify the remaining collection when setBottom is called by 
the top collector. With this mechanism in place it could be much simpler to 
implement the optimization we added in Elasticsearch in 
[https://github.com/elastic/elasticsearch/pull/49732]. I am not sure if this 
would be usable for the optimization you want, but I wanted to share the idea 
since it could have the same impact on sorted queries in Lucene as 
block-max WAND has on queries sorted by score.

 

 

> Early Terminating CollectorManager
> --
>
> Key: LUCENE-8929
> URL: https://issues.apache.org/jira/browse/LUCENE-8929
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Atri Sharma
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> We should have an early terminating collector manager which accurately tracks 
> hits across all of its collectors and determines when there are enough hits, 
> allowing all the collectors to abort.
> The options for the same are:
> 1) Shared total count : Global "scoreboard" where all collectors update their 
> current hit count. At the end of each document's collection, collector checks 
> if N > threshold, and aborts if true
> 2) State Reporting Collectors: Collectors report their total number of counts 
> collected periodically using a callback mechanism, and get a proceed or abort 
> decision.
> 1) has the overhead of synchronization in the hot path, 2) can collect 
> unnecessary hits before aborting.
> I am planning to work on 2), unless objections
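Option 1 above (a shared global "scoreboard" with synchronization on the hot path) can be sketched independently of Lucene's actual Collector API. All class and method names below are illustrative, not Lucene code: per-segment collectors share one AtomicInteger and stop collecting once the global threshold is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCountDemo {
  static class EarlyTerminatingCollector {
    private final AtomicInteger globalHits;
    private final int threshold;
    final List<Integer> collected = new ArrayList<>();

    EarlyTerminatingCollector(AtomicInteger globalHits, int threshold) {
      this.globalHits = globalHits;
      this.threshold = threshold;
    }

    /** Returns false once enough hits were seen globally (the "abort" signal). */
    boolean collect(int doc) {
      if (globalHits.get() >= threshold) {
        return false;                  // another collector already filled the quota
      }
      collected.add(doc);
      globalHits.incrementAndGet();    // the synchronization cost on the hot path
      return true;
    }
  }

  public static void main(String[] args) {
    AtomicInteger shared = new AtomicInteger();
    EarlyTerminatingCollector a = new EarlyTerminatingCollector(shared, 5);
    EarlyTerminatingCollector b = new EarlyTerminatingCollector(shared, 5);
    // Interleave two "segments"; collection stops after 5 total hits.
    for (int doc = 0; doc < 10; doc++) {
      a.collect(doc);
      b.collect(doc);
    }
    System.out.println(shared.get());                            // 5
    System.out.println(a.collected.size() + b.collected.size()); // 5
  }
}
```

This makes the trade-off described above concrete: the check-then-increment on the shared counter happens once per collected document, which is exactly the hot-path synchronization overhead that motivates option 2.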



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14314) Solr does not response most of the update request some times

2020-03-12 Thread Aaron Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057767#comment-17057767
 ] 

Aaron Sun commented on SOLR-14314:
--

[~ichattopadhyaya]   Thanks for the valuable answer. After changing the JVM heap 
size to 25 GB, things are indeed much better; there are still occasional pauses in 
the log here and there, but they are much shorter, around 1-2 seconds. Is it 
possible to improve this further? I also notice the pauses happen more often around 
"HttpSolrCall Closing out SolrRequest", which does not seem related to GC pauses.

Regarding multiple Solr nodes (JVMs), I guess you are referring to this page: 
[https://lucene.apache.org/solr/guide/7_2/taking-solr-to-production.html#running-multiple-solr-nodes-per-host].
 Does that mean each Solr instance has its own Solr home directory and port? If 
so, how should the data be split: one core per instance? And does the client need 
to manage which Solr instance to talk to?

I couldn't find a good example on the internet; I'd appreciate any guidance you 
could provide.
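For reference, the setup described in that guide amounts to giving each node its own Solr home directory and port, with all nodes registering in the same ZooKeeper ensemble. A minimal sketch (the ports, paths, ZooKeeper address, and heap size below are examples, not values from this thread):

```shell
# Each node gets its own home directory (with its own solr.xml and cores)
# and its own port; both register with the same ZooKeeper ensemble.
# SolrCloud then places shards/replicas across the nodes, and clients
# using CloudSolrClient discover nodes via ZooKeeper rather than picking
# a specific instance themselves.
bin/solr start -cloud -p 8983 -s /var/solr/node1 -z zk1:2181 -m 25g
bin/solr start -cloud -p 8984 -s /var/solr/node2 -z zk1:2181 -m 25g
```

Splitting the data is then a matter of how many shards/replicas you create per collection, not of assigning cores to instances by hand.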

{noformat}

2020-03-12 10:37:19.804 DEBUG (qtp1668016508-3474) [ x:aggprogram] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:37:20.543 DEBUG (qtp1668016508-4857) [ x:aggprogram] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101608110097976031} 
\{{params(commit=true),defaults(wt=json)}}

2020-03-12 10:39:11.250 DEBUG (qtp1668016508-6123) [ x:aggasset] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:39:11.915 TRACE (qtp1668016508-3376) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2102003090810779924 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002583 
refcount=1} LogPtr(1081326) map=1784607161

2020-03-12 10:40:08.746 DEBUG (qtp1668016508-382) [ x:aggasset] 
o.a.s.s.HttpSolrCall Closing out SolrRequest: 
\{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:40:09.640 DEBUG (qtp1668016508-3239) [ x:aggasset] 
o.a.s.u.TransactionLog New TransactionLog 
file=/data1/solr8/aggasset/data/tlog/tlog.0001116, exists=false, 
size=0, openExisting=false


2020-03-12 10:40:58.182 DEBUG (qtp1668016508-3413) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:41:00.318 TRACE (qtp1668016508-381) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101701290647113224 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002593 
refcount=1} LogPtr(1940077) map=1984880505


2020-03-12 10:41:33.880 DEBUG (qtp1668016508-771) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-12 10:41:35.754 TRACE (qtp1668016508-3298) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2102003070806775224 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002598 
refcount=1} LogPtr(4246525) map=1493020555


2020-03-12 10:42:23.140 DEBUG (qtp1668016508-107) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660950824311848960,id=2101702170007764324})
2020-03-12 10:42:23.935 TRACE (qtp1668016508-380) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101806210189104124 to 
tlog\{file=/data1/solr8/agglogtrackitem/data/tlog/tlog.0002605 
refcount=1} LogPtr(5096503) map=2041040637

{noformat}

 

> Solr does not response most of the update request some times
> 
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Aaron Sun
>Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr 

[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task

2020-03-12 Thread GitBox
mocobeta commented on a change in pull request #1304: LUCENE-9242: generate 
javadocs by calling Ant javadoc task
URL: https://github.com/apache/lucene-solr/pull/1304#discussion_r391504306
 
 

 ##
 File path: gradle/invoke-javadoc.gradle
 ##
 @@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// invoke javadoc tool
+
+allprojects {
+
+  ext {
+javadocRoot = project.path.startsWith(':lucene') ? 
project(':lucene').file("build/docs") : project(':solr').file("build/docs")
+javadocDestDir = "${javadocRoot}/${project.name}"
+  }
+
+  plugins.withType(JavaPlugin) {
+def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
+def title = "${libName} ${project.version} ${project.name} API".toString()
+def srcDirs = sourceSets.main.java.srcDirs.findAll { dir -> dir.exists() }
+
+task invokeJavadoc {
+  description "Generates Javadoc API documentation for the main source 
code. This invokes Ant Javadoc Task."
+  group "documentation"
+
+  dependsOn sourceSets.main.compileClasspath
+
+  inputs.property("linksource", "no")
+  inputs.property("linkJUnit", false)
+  inputs.property("linkHref", [])
+
+  inputs.files sourceSets.main.java.asFileTree
+  outputs.dir project.javadocRoot
+
+  doFirst {
+srcDirs.each { srcDir ->
+  ant.javadoc(
+  overview: file("${srcDir}/overview.html"),
+  packagenames: "org.apache.lucene.*,org.apache.solr.*",
+  destDir: project.javadocDestDir,
+  access: "protected",
+  encoding: "UTF-8",
+  charset: "UTF-8",
+  docencoding: "UTF-8",
+  noindex: "true",
+  includenosourcepackages: "true",
+  author: "true",
+  version: "true",
+  linksource: inputs.properties.linksource,
+  use: "true",
+  failonerror: "true",
+  locale: "en_US",
+  windowtitle: title,
+  doctitle: title,
+  maxmemory: "512m",
+  classpath: sourceSets.main.compileClasspath.asPath,
+  bottom: "Copyright © 2000-${buildYear} Apache Software 
Foundation. All Rights Reserved."
+  ) {
+packageset(dir: srcDir)
+
+tag(name: "lucene.experimental", description: "WARNING: This API 
is experimental and might change in incompatible ways in the next release.")
+tag(name: "lucene.internal", description: "NOTE: This API is for 
internal purposes only and might change in incompatible ways in the next 
release.")
+tag(name: "lucene.spi", description: "SPI Name (Note: This is 
case-insensitive. e.g., if the name is 'htmlStrip', 'htmlstrip' can be used 
when looking up the service):", scope: "types")
+
+// resolve links to JavaSE and JUnit API
+link(offline: "true", href: 
"https://docs.oracle.com/en/java/javase/11/docs/api/", packageListLoc: 
project(":lucene").file("tools/javadoc/java11/").toString())
+if (inputs.properties.get("linkJUnit")) {
+  link(offline: "true", href: 
"https://junit.org/junit4/javadoc/4.12/", packageListLoc: 
project(":lucene").file("tools/javadoc/junit").toString())
+}
+// resolve inter-module links if 'linkHref' property is specified
+inputs.properties.get("linkHref").each { href ->
+  link(href: href)
+}
+
+arg(line: "--release 11")
+arg(line: "-Xdoclint:all,-missing")
+
+// force locale to be "en_US" (fix for: 
https://bugs.openjdk.java.net/browse/JDK-8222793)
+arg(line: "-J-Duser.language=en -J-Duser.country=US")
+  }
+}
+
+// append some special table css, prettify css
+ant.concat(destfile: "${javadocDestDir}/stylesheet.css", append: 
"true", fixlastline: "true", encoding: "UTF-8") {
+  filelist(dir: project(":lucene").file("tools/javadoc"), files: 
"table_padding.css")
+  filelist(dir: project(":lucene").file("tools/prettify"), files: 
"prettify.css")
+}
+// append prettify to scripts
+ant.concat(destfile: 

[jira] [Comment Edited] (SOLR-13264) unexpected autoscaling set-trigger response

2020-03-12 Thread Christof Lorenz (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057728#comment-17057728
 ] 

Christof Lorenz edited comment on SOLR-13264 at 3/12/20, 9:03 AM:
--

This is where the problem is: it looks like belowOp and aboveOp are not 
being added to validProperties:
{code:java}
public IndexSizeTrigger(String name) {
 super(TriggerEventType.INDEXSIZE, name);
 TriggerUtils.validProperties(validProperties,
 ABOVE_BYTES_PROP, ABOVE_DOCS_PROP, BELOW_BYTES_PROP, BELOW_DOCS_PROP, 
COLLECTIONS_PROP);
 }{code}
Without being able to define the Op the trigger is not usable at all.

I am currently working with 7.4 looking to update to 8.x


was (Author: lochri):
This is where the problem is, it looks like the belowOp and aboveOp are not 
being added to the validProperties:


{code:java}
public IndexSizeTrigger(String name) {
 super(TriggerEventType.INDEXSIZE, name);
 TriggerUtils.validProperties(validProperties,
 ABOVE_BYTES_PROP, ABOVE_DOCS_PROP, BELOW_BYTES_PROP, BELOW_DOCS_PROP, 
COLLECTIONS_PROP);
 }{code}

Without being able to define the Op the trigger is not usable at all.

> unexpected autoscaling set-trigger response
> ---
>
> Key: SOLR-13264
> URL: https://issues.apache.org/jira/browse/SOLR-13264
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-13264.patch, SOLR-13264.patch
>
>
> Steps to reproduce:
> {code}
> ./bin/solr start -cloud -noprompt
> ./bin/solr create -c demo -d _default -shards 1 -replicationFactor 1
> curl "http://localhost:8983/solr/admin/autoscaling" -d'
> {
>   "set-trigger" : {
> "name" : "index_size_trigger",
> "event" : "indexSize",
> "aboveDocs" : 12345,
> "aboveOp" : "SPLITSHARD",
> "enabled" : true,
> "actions" : [
>   {
> "name" : "compute_plan",
> "class": "solr.ComputePlanAction"
>   }
> ]
>   }
> }
> '
> ./bin/solr stop -all
> {code}
> The {{aboveOp}} is documented on 
> https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-triggers.html#index-size-trigger
>  and logically should be accepted (even though it is actually the default) 
> but unexpectedly an error message is returned {{"Error validating trigger 
> config index_size_trigger: 
> TriggerValidationException\{name=index_size_trigger, 
> details='\{aboveOp=unknown property\}'\}"}}.
> From a quick look it seems that in the {{IndexSizeTrigger}} constructor 
> additional values need to be passed to the {{TriggerUtils.validProperties}} 
> method i.e. aboveOp, belowOp and maybe others too i.e. 
> aboveSize/belowSize/etc. Illustrative patch to follow. Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13264) unexpected autoscaling set-trigger response

2020-03-12 Thread Christof Lorenz (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057728#comment-17057728
 ] 

Christof Lorenz commented on SOLR-13264:


This is where the problem is: it looks like belowOp and aboveOp are not 
being added to validProperties:


{code:java}
public IndexSizeTrigger(String name) {
 super(TriggerEventType.INDEXSIZE, name);
 TriggerUtils.validProperties(validProperties,
 ABOVE_BYTES_PROP, ABOVE_DOCS_PROP, BELOW_BYTES_PROP, BELOW_DOCS_PROP, 
COLLECTIONS_PROP);
 }{code}

Without being able to define the Op the trigger is not usable at all.
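The validation pattern at issue can be shown with a self-contained sketch. This is not the actual Solr TriggerUtils/IndexSizeTrigger code; the method names and property keys below mirror the report, and the fix is simply registering the missing keys in the valid set:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal model of trigger-property validation: any config key that is not
// registered in the valid set is reported back as an "unknown property".
public class TriggerValidationDemo {
  static Set<String> validProperties(String... names) {
    return new HashSet<>(Arrays.asList(names));
  }

  static List<String> unknownKeys(Set<String> valid, Map<String, String> config) {
    List<String> unknown = new ArrayList<>();
    for (String key : config.keySet()) {
      if (!valid.contains(key)) unknown.add(key);
    }
    return unknown;
  }

  public static void main(String[] args) {
    // Before the fix: aboveOp/belowOp are missing from the valid set...
    Set<String> before = validProperties(
        "aboveBytes", "aboveDocs", "belowBytes", "belowDocs", "collections");
    // ...after the fix they are registered alongside the size/docs properties.
    Set<String> after = validProperties(
        "aboveBytes", "aboveDocs", "belowBytes", "belowDocs", "collections",
        "aboveOp", "belowOp");

    Map<String, String> config = Map.of("aboveDocs", "12345", "aboveOp", "SPLITSHARD");
    System.out.println(unknownKeys(before, config)); // [aboveOp] -> validation error
    System.out.println(unknownKeys(after, config));  // []
  }
}
```

With the "before" set, the aboveOp key is flagged as unknown, which matches the TriggerValidationException reported in the issue description.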

> unexpected autoscaling set-trigger response
> ---
>
> Key: SOLR-13264
> URL: https://issues.apache.org/jira/browse/SOLR-13264
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-13264.patch, SOLR-13264.patch
>
>
> Steps to reproduce:
> {code}
> ./bin/solr start -cloud -noprompt
> ./bin/solr create -c demo -d _default -shards 1 -replicationFactor 1
> curl "http://localhost:8983/solr/admin/autoscaling" -d'
> {
>   "set-trigger" : {
> "name" : "index_size_trigger",
> "event" : "indexSize",
> "aboveDocs" : 12345,
> "aboveOp" : "SPLITSHARD",
> "enabled" : true,
> "actions" : [
>   {
> "name" : "compute_plan",
> "class": "solr.ComputePlanAction"
>   }
> ]
>   }
> }
> '
> ./bin/solr stop -all
> {code}
> The {{aboveOp}} is documented on 
> https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-triggers.html#index-size-trigger
>  and logically should be accepted (even though it is actually the default) 
> but unexpectedly an error message is returned {{"Error validating trigger 
> config index_size_trigger: 
> TriggerValidationException\{name=index_size_trigger, 
> details='\{aboveOp=unknown property\}'\}"}}.
> From a quick look it seems that in the {{IndexSizeTrigger}} constructor 
> additional values need to be passed to the {{TriggerUtils.validProperties}} 
> method i.e. aboveOp, belowOp and maybe others too i.e. 
> aboveSize/belowSize/etc. Illustrative patch to follow. Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases

2020-03-12 Thread Hongtai Xue (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057682#comment-17057682
 ] 

Hongtai Xue commented on SOLR-14300:


Hi, I attached a patch to fix this issue.
h3. About the bug

The if statement here is wrong.
{code:java}
 for (BooleanClause clause : clauses) {
 ...
 // NOTE, for query "B:1 OR B:2"
 // when parse come to "B:2" , 
 // fieldValues here will not be null, since "B:1" has already been stored in fieldValues
 fieldValues = fmap.get(sfield); 
 ...
 if ((fieldValues == null && useTermsQuery) || !sfield.indexed()) {
 fieldValues = new ArrayList<>(2); // <-- here, if B is not indexed, fieldValues will be overwritten and "B:1" will be lost
 fmap.put(sfield, fieldValues);
 }
 ...
 }
{code}
Please check the comments above:

if sfield is not indexed, fieldValues will always be overwritten, even when 
fieldValues is not null.

Another question is why only "q=A:1 OR B:1 OR A:2 OR B:2" causes a problem, 
while "q=A:1 OR A:2 OR B:1 OR B:2" is OK.

The answer is 
[here|https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L705]:
 the buggy code only runs when the field changes, so if adjacent clauses use 
the same field, nothing happens.
h3. How to fix

It's a very simple bug; only one line needs to change to fix it. 
{code:java}
-if ((fieldValues == null && useTermsQuery) || !sfield.indexed()) {
+if (fieldValues == null && (useTermsQuery || !sfield.indexed())) {
{code}
With this change, fieldValues is only initialized when it is null.
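The one-line fix is an instance of a common map-accumulation rule: a per-key list must only be created when absent, never unconditionally. A minimal self-contained Java sketch (the field/value names and the useTermsQuery flag are illustrative stand-ins, not Solr code) that reproduces both query orders:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldValuesDemo {
  static final boolean USE_TERMS_QUERY = true;

  // fix=false reproduces the buggy condition; fix=true applies the one-line change.
  static Map<String, List<String>> collect(String[][] clauses, boolean fix) {
    Map<String, List<String>> fmap = new LinkedHashMap<>();
    String prev = null;
    for (String[] c : clauses) {            // c = {field, value}
      String field = c[0];
      boolean indexed = field.equals("A");  // pretend only field A is indexed
      if (!field.equals(prev)) {            // the parser runs this block only when the field changes
        List<String> vals = fmap.get(field);
        boolean alloc = fix
            ? vals == null && (USE_TERMS_QUERY || !indexed)   // fixed condition
            : (vals == null && USE_TERMS_QUERY) || !indexed;  // buggy condition
        if (alloc) {
          fmap.put(field, new ArrayList<>()); // buggy path clobbers B's earlier values
        }
      }
      fmap.get(field).add(c[1]);
      prev = field;
    }
    return fmap;
  }

  public static void main(String[] args) {
    String[][] abab = {{"A","1"},{"B","1"},{"A","2"},{"B","2"}};
    System.out.println(collect(abab, false).get("B")); // [2]    -- B:1 silently lost
    System.out.println(collect(abab, true).get("B"));  // [1, 2] -- fixed
    String[][] aabb = {{"A","1"},{"A","2"},{"B","1"},{"B","2"}};
    System.out.println(collect(aabb, false).get("B")); // [1, 2] -- AABB order masks the bug
  }
}
```

The AABB case shows why clause order mattered: the allocation check only runs on a field change, so grouping B's clauses together never re-enters the buggy branch.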
h3. Test

We confirmed the issue is fixed: the following queries now return the same 
results.
 * query1: 
[http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
{code:json}
  "debug":{
    "rawquerystring":" (name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "querystring":" (name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "parsedquery":"cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}
{code}

 * query2: 
[http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)=query]
{code:json}
  "debug":{
    "rawquerystring":" (name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "querystring":" (name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "parsedquery":"cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}}
{code}

 

> Some conditional clauses on unindexed field will be ignored by query parser 
> in some specific cases
> --
>
> Key: SOLR-14300
> URL: https://issues.apache.org/jira/browse/SOLR-14300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
> Environment: Solr 7.3.1 
> centos7.5
>Reporter: Hongtai Xue
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
>
> Attachments: SOLR-14300.patch
>
>
> In some specific cases, conditional clauses on an unindexed field will be 
> ignored by the query parser.
>  * For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>  if field B is not indexed (but docValues="true"), "B:1" will be lost.
>   
>  * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>  it will work perfectly.
> The only difference between the two queries is the order in which they are 
> written: one is *ABAB*, the other is *AABB*.
>  
> *Steps to reproduce*
>  you can easily reproduce this problem on a solr collection with _default 
> configset and exampledocs/books.csv data.
>  # create a _default collection
> {code:java}
> bin/solr create -c books -s 2 -rf 2{code}
>  # post books.csv.
> {code:java}
> bin/post -c books example/exampledocs/books.csv{code}
>  # run the following queries.
>  ** query1: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
>  ** query2: 
> 

[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases

2020-03-12 Thread Hongtai Xue (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongtai Xue updated SOLR-14300:
---
Attachment: SOLR-14300.patch

> Some conditional clauses on unindexed field will be ignored by query parser 
> in some specific cases
> --
>
> Key: SOLR-14300
> URL: https://issues.apache.org/jira/browse/SOLR-14300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
> Environment: Solr 7.3.1 
> centos7.5
>Reporter: Hongtai Xue
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
>
> Attachments: SOLR-14300.patch
>
>
> In some specific cases, conditional clauses on an unindexed field will be 
> ignored by the query parser.
>  * For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>  if field B is not indexed (but docValues="true"), "B:1" will be lost.
>   
>  * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>  it will work perfectly.
> The only difference between the two queries is the order in which they are 
> written: one is *ABAB*, the other is *AABB*.
>  
> *Steps to reproduce*
>  you can easily reproduce this problem on a solr collection with _default 
> configset and exampledocs/books.csv data.
>  # create a _default collection
> {code:java}
> bin/solr create -c books -s 2 -rf 2{code}
>  # post books.csv.
> {code:java}
> bin/post -c books example/exampledocs/books.csv{code}
>  # run the following queries.
>  ** query1: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
>  ** query2: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)=query]
>  ** then you can find the parsedqueries are different.
>  *** query1.  ("name_str:Foundation" is lost.)
> {code:json}
>  "debug":{
>      "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 
> 68 65 72 65 67]]))",
>      "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] 
> TO [4a 68 65 72 65 67]])",
>      "QParser":"LuceneQParser"}}{code}
>  *** query2.  ("name_str:Foundation" isn't lost.)
> {code:json}
>    "debug":{
>      "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
> 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO 
> [4a 68 65 72 65 67]])))",
>      "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 
> 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 
> 67] TO [4a 68 65 72 65 67]]))",
>      "QParser":"LuceneQParser"}{code}






[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases

2020-03-12 Thread Hongtai Xue (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongtai Xue updated SOLR-14300:
---
Labels: newbie patch  (was: patch)

> Some conditional clauses on unindexed field will be ignored by query parser 
> in some specific cases
> --
>
> Key: SOLR-14300
> URL: https://issues.apache.org/jira/browse/SOLR-14300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
> Environment: Solr 7.3.1 
> centos7.5
>Reporter: Hongtai Xue
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
>
>
> In some specific cases, conditional clauses on an unindexed field will be 
> ignored by the query parser.
>  * For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>  if field B is not indexed (but docValues="true"), "B:1" will be lost.
>   
>  * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>  it will work perfectly.
> The only difference between the two queries is the order in which they are 
> written: one is *ABAB*, the other is *AABB*.
>  
> *Steps to reproduce*
>  you can easily reproduce this problem on a solr collection with _default 
> configset and exampledocs/books.csv data.
>  # create a _default collection
> {code:java}
> bin/solr create -c books -s 2 -rf 2{code}
>  # post books.csv.
> {code:java}
> bin/post -c books example/exampledocs/books.csv{code}
>  # run the following queries.
>  ** query1: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
>  ** query2: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)=query]
>  ** then you can find the parsedqueries are different.
>  *** query1.  ("name_str:Foundation" is lost.)
> {code:json}
>  "debug":{
>      "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 
> 68 65 72 65 67]]))",
>      "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] 
> TO [4a 68 65 72 65 67]])",
>      "QParser":"LuceneQParser"}}{code}
>  *** query2.  ("name_str:Foundation" isn't lost.)
> {code:json}
>    "debug":{
>      "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
> 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO 
> [4a 68 65 72 65 67]])))",
>      "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 
> 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 
> 67] TO [4a 68 65 72 65 67]]))",
>      "QParser":"LuceneQParser"}{code}






[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases

2020-03-12 Thread Hongtai Xue (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongtai Xue updated SOLR-14300:
---
Labels: patch  (was: )

> Some conditional clauses on unindexed field will be ignored by query parser 
> in some specific cases
> --
>
> Key: SOLR-14300
> URL: https://issues.apache.org/jira/browse/SOLR-14300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
> Environment: Solr 7.3.1 
> centos7.5
>Reporter: Hongtai Xue
>Priority: Minor
>  Labels: patch
> Fix For: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
>
>
> In some specific cases, conditional clauses on an unindexed field will be 
> ignored by the query parser.
>  * For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>  if field B is not indexed (but docValues="true"), "B:1" will be lost.
>   
>  * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>  it will work perfectly.
> The only difference between the two queries is the order in which they are 
> written: one is *ABAB*, the other is *AABB*.
>  
> *Steps to reproduce*
>  you can easily reproduce this problem on a solr collection with _default 
> configset and exampledocs/books.csv data.
>  # create a _default collection
> {code:java}
> bin/solr create -c books -s 2 -rf 2{code}
>  # post books.csv.
> {code:java}
> bin/post -c books example/exampledocs/books.csv{code}
>  # run the following queries.
>  ** query1: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
>  ** query2: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)=query]
>  ** then you can find the parsedqueries are different.
>  *** query1.  ("name_str:Foundation" is lost.)
> {code:json}
>  "debug":{
>      "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 
> 68 65 72 65 67]]))",
>      "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] 
> TO [4a 68 65 72 65 67]])",
>      "QParser":"LuceneQParser"}}{code}
>  *** query2.  ("name_str:Foundation" isn't lost.)
> {code:json}
>    "debug":{
>      "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
> 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO 
> [4a 68 65 72 65 67]])))",
>      "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 
> 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 
> 67] TO [4a 68 65 72 65 67]]))",
>      "QParser":"LuceneQParser"}{code}






[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases

2020-03-12 Thread Hongtai Xue (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongtai Xue updated SOLR-14300:
---
Fix Version/s: 7.3
   7.4
   7.5
   7.6
   7.7
   8.0
   8.1
   8.2
   8.3
   8.4
Affects Version/s: (was: 7.3.1)
   7.3
   7.4
   7.5
   7.6
   7.7
   8.0
   8.1
   8.2
   8.3
   8.4

> Some conditional clauses on unindexed field will be ignored by query parser 
> in some specific cases
> --
>
> Key: SOLR-14300
> URL: https://issues.apache.org/jira/browse/SOLR-14300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
> Environment: Solr 7.3.1 
> centos7.5
>Reporter: Hongtai Xue
>Priority: Minor
> Fix For: 7.3, 7.4, 7.5, 7.6, 7.7, 8.0, 8.1, 8.2, 8.3, 8.4
>
>
> In some specific cases, conditional clauses on an unindexed field will be 
> ignored by the query parser.
>  * For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>  if field B is not indexed (but docValues="true"), "B:1" will be lost.
>   
>  * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>  it will work perfectly.
> The only difference between the two queries is the order in which they are 
> written: one is *ABAB*, the other is *AABB*.
>  
> *Steps to reproduce*
>  you can easily reproduce this problem on a solr collection with _default 
> configset and exampledocs/books.csv data.
>  # create a _default collection
> {code:java}
> bin/solr create -c books -s 2 -rf 2{code}
>  # post books.csv.
> {code:java}
> bin/post -c books example/exampledocs/books.csv{code}
>  # run the following queries.
>  ** query1: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)=query]
>  ** query2: 
> [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)=query]
>  ** then you can find the parsedqueries are different.
>  *** query1.  ("name_str:Foundation" is lost.)
> {code:json}
>  "debug":{
>      "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 
> 68 65 72 65 67]]))",
>      "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] 
> TO [4a 68 65 72 65 67]])",
>      "QParser":"LuceneQParser"}}{code}
>  *** query2.  ("name_str:Foundation" isn't lost.)
> {code:json}
>    "debug":{
>      "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book 
> OR cat:cd)",
>      "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
> cat:cd)",
>      "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
> 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO 
> [4a 68 65 72 65 67]])))",
>      "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 
> 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 
> 67] TO [4a 68 65 72 65 67]]))",
>      "QParser":"LuceneQParser"}{code}






[jira] [Updated] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-12 Thread Lyle (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lyle updated SOLR-14317:

Attachment: SOLR-14317.patch

> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-14317.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first url in the 
> solrUrls list is invalid or the server is down, it will throw an exception 
> directly rather than try the remaining urls.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits any IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception will be caught and a SolrServerException thrown to the upper 
> caller, while no IOException will be caught in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> The SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, we can try the 
> second one to fetch live nodes.
>  
>  
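The failover described above can be sketched as follows. This is a hypothetical illustration of the intended retry loop, not the actual SolrJ code: the {{fetcher}} function stands in for the HTTP live-nodes fetch, and {{NoLiveNodesException}} models the "no nodes reachable" failure, so only the try-next-url logic is shown.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class LiveNodesFailover {
    /** Raised when every provided url fails; stands in for SolrJ's final error. */
    public static class NoLiveNodesException extends RuntimeException {
        public NoLiveNodesException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Try each solrUrl in turn, catching the per-node failure (modeled here as
    // RuntimeException) instead of letting the first dead node abort the loop.
    public static Set<String> fetchLiveNodes(List<String> solrUrls,
                                             Function<String, Set<String>> fetcher) {
        Exception last = null;
        for (String url : solrUrls) {
            try {
                return fetcher.apply(url); // e.g. an HTTP call to this node
            } catch (RuntimeException e) {
                last = e; // node down or url invalid: fall through to the next url
            }
        }
        throw new NoLiveNodesException("no url in " + solrUrls + " responded", last);
    }

    public static void main(String[] args) {
        // First "node" is down; the second answers.
        Set<String> nodes = fetchLiveNodes(
                List.of("http://down:8983/solr", "http://up:8983/solr"),
                url -> {
                    if (url.contains("down")) throw new RuntimeException("connect refused");
                    return Set.of("up:8983_solr");
                });
        System.out.println(nodes); // prints [up:8983_solr]
    }
}
```

With this shape, an exception only surfaces to the caller when every url in the list has failed.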






[jira] [Updated] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-12 Thread Lyle (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lyle updated SOLR-14317:

Attachment: (was: SOLR-14317)

> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-14317.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first url in the 
> solrUrls list is invalid or the server is down, it will throw an exception 
> directly rather than try the remaining urls.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits any IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception will be caught and a SolrServerException thrown to the upper 
> caller, while no IOException will be caught in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> The SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, we can try the 
> second one to fetch live nodes.
>  
>  






[jira] [Updated] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-12 Thread Lyle (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lyle updated SOLR-14317:

Attachment: SOLR-14317

> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-14317
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first url in the 
> solrUrls list is invalid or the server is down, it will throw an exception 
> directly rather than try the remaining urls.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits any IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception will be caught and a SolrServerException thrown to the upper 
> caller, while no IOException will be caught in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> The SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, we can try the 
> second one to fetch live nodes.
>  
>  


