[GitHub] [lucene] rmuir commented on pull request #816: LUCENE-10519: ThreadLocal.remove under G1GC takes 100% CPU

2022-04-25 Thread GitBox


rmuir commented on PR #816:
URL: https://github.com/apache/lucene/pull/816#issuecomment-1109349516

   See, this is my problem: applications using thread pools of absurd sizes 
(unbounded, 1, etc.), and then blaming Lucene for their GC woes. At this 
point the DSM-IV is needed, because developers are in denial.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jaisonbi commented on pull request #816: LUCENE-10519: ThreadLocal.remove under G1GC takes 100% CPU

2022-04-25 Thread GitBox


jaisonbi commented on PR #816:
URL: https://github.com/apache/lucene/pull/816#issuecomment-1109236031

   This issue is not related to "creating and destroying too many 
threads"; it is caused by **too many "ThreadLocal" objects being used from the 
same threads**.
   
   Based on the implementation of ThreadLocal:
   1. Every "ThreadLocal" value set from a given thread is stored in that 
thread's single ThreadLocalMap.
   2. Each thread has its own ThreadLocalMap.
   
   Here is what the investigation found in each "ThreadLocalMap":
   
   "**management**" thread: most entries (each ThreadLocal object is wrapped 
in an Entry) are "CompressingStoredFieldsReader".
   
   ```
   final CloseableThreadLocal<StoredFieldsReader> fieldsReaderLocal =
       new CloseableThreadLocal<StoredFieldsReader>() {
         @Override
         protected StoredFieldsReader initialValue() {
           return fieldsReaderOrig.clone();
         }
       };
   ```
   
   "**write**" thread: Most entries are "PerThreadIDVersionAndSeqNoLookup"
   
   ```
 static final ConcurrentMap> lookupStates =
ConcurrentCollections.newConcurrentMapWithAggressiveConcurrency();
   ```
   
   Both "CompressingStoredFieldsReader" and "PerThreadIDVersionAndSeqNoLookup" 
are created through "CloseableThreadLocal". 
   
   ReentrantReadWriteLock creates its own ThreadLocal object. So when a 
ReentrantReadWriteLock is acquired from a "write" thread, its ThreadLocal shares 
the same ThreadLocalMap with "PerThreadIDVersionAndSeqNoLookup". Too many 
entries are stored in that ThreadLocalMap, which is why "ThreadLocal#remove" 
runs with high CPU usage.
   
   So I think the heavy usage of "ThreadLocal" in Lucene's current mechanism 
is one major cause of this issue.
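
   The per-thread storage described above can be seen with plain `java.lang.ThreadLocal`. This is only a sketch: `ThreadLocalEntriesDemo` and its methods are hypothetical names, it does not use Lucene's `CloseableThreadLocal`, and it does not inspect the JDK's internal `ThreadLocalMap` directly. It shows that values set through many `ThreadLocal` objects on one thread are all visible on that thread and on no other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Lucene or Elasticsearch.
final class ThreadLocalEntriesDemo {

  /** Creates n distinct ThreadLocal instances with no value set. */
  static List<ThreadLocal<Integer>> createLocals(int n) {
    List<ThreadLocal<Integer>> locals = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      locals.add(new ThreadLocal<>());
    }
    return locals;
  }

  /** Counts the locals that hold a non-null value for the calling thread. */
  static int countSetOnCurrentThread(List<ThreadLocal<Integer>> locals) {
    int count = 0;
    for (ThreadLocal<Integer> local : locals) {
      if (local.get() != null) {
        count++;
      }
    }
    return count;
  }

  /** Runs the same count on a fresh thread, which has its own per-thread storage. */
  static int countSetOnOtherThread(List<ThreadLocal<Integer>> locals) {
    int[] result = new int[1];
    Thread t = new Thread(() -> result[0] = countSetOnCurrentThread(locals));
    t.start();
    try {
      t.join(); // join() makes the write to result[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return result[0];
  }

  public static void main(String[] args) {
    List<ThreadLocal<Integer>> locals = createLocals(1000);
    for (int i = 0; i < locals.size(); i++) {
      locals.get(i).set(i); // each set() stores one value for the current thread
    }
    System.out.println(countSetOnCurrentThread(locals));
    System.out.println(countSetOnOtherThread(locals));
    for (ThreadLocal<Integer> local : locals) {
      local.remove(); // drops the current thread's value for this ThreadLocal
    }
  }
}
```

   The more `ThreadLocal` objects a long-lived thread touches, the more values that one thread carries, which is the situation the comment attributes to the "management" and "write" threads.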
 





[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527778#comment-17527778
 ] 

Chris M. Hostetter commented on LUCENE-10534:
-

Maybe a different approach to speeding up these types of impls would be a new 
alternative to {{MultiFunction.anyExists}} that callers could use to eliminate 
their own need to check exists on the sub-values? ...

{code}
  public static boolean someExists(int doc, FunctionValues[] values, boolean[] whoExists)
      throws IOException {
    boolean someoneExists = false;
    for (int i = 0; i < values.length; i++) {
      whoExists[i] = false;
      if (values[i].exists(doc)) {
        someoneExists = true;
        whoExists[i] = true;
      }
    }
    return someoneExists;
  }
{code}

...and then methods like {{MinFloatFunction.func}} can call {{someExists(...)}} 
(instead of {{this.exists(...)}}) to restrict which (if any) of the {{valsArr}} 
are candidates for being the min value (w/o needing to redundantly call 
{{vals.exists(doc)}} on each of them again)

?
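
A rough sketch of how a min function might consume the {{whoExists}} array from the suggestion above. {{SimpleValues}} is a hypothetical stand-in for {{FunctionValues}} (without the {{IOException}}), and {{SomeExistsSketch}}, {{min}} and {{fixed}} are illustrative names, not Lucene code. Returning 0.0f when nothing exists is an assumption made for the sketch.

```java
// Hypothetical stand-in for Lucene's FunctionValues, reduced to two methods.
interface SimpleValues {
  boolean exists(int doc);

  float floatVal(int doc);
}

final class SomeExistsSketch {

  /** Same shape as the proposed someExists: exactly one exists() call per source. */
  static boolean someExists(int doc, SimpleValues[] values, boolean[] whoExists) {
    boolean someoneExists = false;
    for (int i = 0; i < values.length; i++) {
      whoExists[i] = values[i].exists(doc);
      someoneExists |= whoExists[i];
    }
    return someoneExists;
  }

  /** Minimum over the sources that exist for doc, or 0.0f if none exist (assumed default). */
  static float min(int doc, SimpleValues[] valsArr) {
    boolean[] whoExists = new boolean[valsArr.length];
    if (!someExists(doc, valsArr, whoExists)) {
      return 0.0f;
    }
    float val = Float.POSITIVE_INFINITY;
    for (int i = 0; i < valsArr.length; i++) {
      if (whoExists[i]) { // reuse the recorded answer instead of calling exists() again
        val = Math.min(val, valsArr[i].floatVal(doc));
      }
    }
    return val;
  }

  /** Helper for experiments: a source with a fixed value and a fixed exists() answer. */
  static SimpleValues fixed(final float value, final boolean exists) {
    return new SimpleValues() {
      @Override
      public boolean exists(int doc) {
        return exists;
      }

      @Override
      public float floatVal(int doc) {
        return value;
      }
    };
  }
}
```

The sketch allocates a new {{boolean[]}} per call for simplicity; a real implementation would probably want to reuse that array.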

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527773#comment-17527773
 ] 

Chris M. Hostetter commented on LUCENE-10534:
-

{quote}This is only needed when 0.0f is returned and need to determine if it is 
a valid value or the not found case.
{quote}
I'm almost certain this statement is not true? ... I don't believe there is 
anything in the {{FunctionValues}} API that implies/guarantees that {{floatVal}} 
(or {{doubleVal}} or {{intVal}} etc...) will have a specific value if 
{{exists}} is false.

Consider something like (shorthand) 
{{sum(field("fieled_name_that_does_not_exist"),const(42))}} ... I'm almost 
certain that {{intVal()}} is going to return 42 for every doc, but 
{{exists()}} will return {{false}} for every doc.  (and if you use a field that 
exists for some docs but not others, you'll get the expected {{intVal()}} + 
{{exists()}} value for each.)

We could potentially harden the {{FunctionValues}} API so that the values 
methods *MUST* return "0" if {{exists()}} returns false -- but that would shift 
complexity from methods like the {{exists()}} impl of min/max to the {{func()}} 
methods of all MultiFloatFunction subclasses
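
The {{sum(field(...),const(42))}} case above can be modelled in a few lines. This is a sketch under stated assumptions, not Lucene code: {{IntValuesLike}} is a hypothetical stand-in for {{FunctionValues}}, a missing field is assumed to read as 0, and the sum is assumed to exist only when all of its sources exist.

```java
// Hypothetical stand-in for Lucene's FunctionValues; only the two methods used here.
interface IntValuesLike {
  boolean exists(int doc);

  int intVal(int doc);
}

final class SumExistsSketch {

  /** Models a field that no document has: assumed to read as 0 and never exist. */
  static IntValuesLike missingField() {
    return new IntValuesLike() {
      @Override
      public boolean exists(int doc) {
        return false;
      }

      @Override
      public int intVal(int doc) {
        return 0;
      }
    };
  }

  /** Models const(c): the same value for every doc, and it always exists. */
  static IntValuesLike constant(final int c) {
    return new IntValuesLike() {
      @Override
      public boolean exists(int doc) {
        return true;
      }

      @Override
      public int intVal(int doc) {
        return c;
      }
    };
  }

  /** Assumed sum semantics: the value adds every source, exists() requires all sources. */
  static IntValuesLike sum(final IntValuesLike... sources) {
    return new IntValuesLike() {
      @Override
      public boolean exists(int doc) {
        for (IntValuesLike s : sources) {
          if (!s.exists(doc)) {
            return false;
          }
        }
        return true;
      }

      @Override
      public int intVal(int doc) {
        int total = 0;
        for (IntValuesLike s : sources) {
          total += s.intVal(doc);
        }
        return total;
      }
    };
  }
}
```

Under those assumptions, {{sum(missingField(), constant(42))}} reports a non-zero value while {{exists()}} is false, so a caller cannot treat "value is not 0" as proof that the value exists.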




[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Fix Version/s: 9.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> SpellChecker.formGrams is missing bounds check
> --
>
> Key: LUCENE-10533
> URL: https://issues.apache.org/jira/browse/LUCENE-10533
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If using Solr IndexBasedSpellChecker and spellcheck.q is empty the following 
> exception occurs (found in SOLR-16169). There is an argument that the caller 
> should not be invalid, but a simple bounds check would prevent this in Lucene.
> {code:java}
> null:java.lang.NegativeArraySizeException: -1
>   at 
> org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
>   at 
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
>   at 
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:505)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>   at 
> 

[jira] [Commented] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527740#comment-17527740
 ] 

ASF subversion and git services commented on LUCENE-10533:
--

Commit 4ddcd8a98da0d608ff5ddbeab5564f2c84a715d1 in lucene's branch 
refs/heads/branch_9x from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4ddcd8a98da ]

LUCENE-10533: SpellChecker.formGrams is missing bounds check (#836)
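
For illustration only, a bounds check of the kind the commit title names might look like the sketch below. It is not the committed change. It assumes {{formGrams}} allocates an array of {{text.length() - ng + 1}} entries, which would be consistent with the {{NegativeArraySizeException: -1}} reported for an empty {{spellcheck.q}} when {{ng}} is 2. {{FormGramsSketch}} is a hypothetical class name.

```java
// Hypothetical sketch; not the code committed for LUCENE-10533.
final class FormGramsSketch {

  /** Returns all substrings of length ng, or an empty array if text is shorter than ng. */
  static String[] formGrams(String text, int ng) {
    int len = text.length();
    if (len < ng) {
      // Without this guard, len - ng + 1 is negative for short input
      // (for example "" with ng = 2 gives -1) and the allocation below throws.
      return new String[0];
    }
    String[] res = new String[len - ng + 1];
    for (int i = 0; i < len - ng + 1; i++) {
      res[i] = text.substring(i, i + ng);
    }
    return res;
  }
}
```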




[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Fix Version/s: 9.2
   (was: 9.1)


[jira] [Commented] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527738#comment-17527738
 ] 

ASF subversion and git services commented on LUCENE-10533:
--

Commit 223a74fcb530f9554a3fc590aa8aacca42e40409 in lucene's branch 
refs/heads/main from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=223a74fcb53 ]

LUCENE-10533: SpellChecker.formGrams is missing bounds check (#836)




[GitHub] [lucene] risdenk merged pull request #836: LUCENE-10533: SpellChecker.formGrams is missing bounds check

2022-04-25 Thread GitBox


risdenk merged PR #836:
URL: https://github.com/apache/lucene/pull/836





[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527735#comment-17527735
 ] 

Kevin Risden commented on LUCENE-10534:
---

FWIW I am not 100% sure how to performance test this from within Lucene or 
luceneutil benchmarking. I didn't see any function query related performance 
stuff. I have some async-java-profiler flamegraphs from testing and see that 
the exists() call is very hot and slows down for min/max queries.




[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Status: Patch Available  (was: Open)




[GitHub] [lucene] risdenk opened a new pull request, #837: LUCENE-10534: MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread GitBox


risdenk opened a new pull request, #837:
URL: https://github.com/apache/lucene/pull/837

   # Description
   
   `MinFloatFunction` and `MaxFloatFunction` currently check whether values 
exist, and in the worst case that check happens twice.
   
   # Solution
   
   This rearranges the logic to check whether values exist only when `0.0f` is 
returned, to determine whether it is a real value from the document or the 
document has no value.
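
A minimal sketch of that idea, not the code in this PR: `FloatSource`, `floatVal`, `exists`, and `min` are hypothetical names chosen for illustration, and skipping sources that have no value is an assumption of the sketch. The only point is that the potentially expensive `exists` call is made when `0.0f` comes back and at no other time.

```java
final class MinFloatSketch {
  /** Hypothetical per-document source with the "0.0f means value or missing" ambiguity. */
  interface FloatSource {
    float floatVal(int doc); // returns 0.0f when the document has no value

    boolean exists(int doc); // potentially expensive: may go down to the raw data
  }

  /** Minimum over all sources, consulting exists() only for ambiguous 0.0f results. */
  static float min(FloatSource[] sources, int doc) {
    boolean found = false;
    float min = Float.POSITIVE_INFINITY;
    for (FloatSource source : sources) {
      float v = source.floatVal(doc);
      // A non-zero result can only be a real value, so exists() is skipped for it.
      if (v == 0.0f && !source.exists(doc)) {
        continue; // 0.0f stood for "no value" here (assumption: such sources are skipped)
      }
      found = true;
      min = Math.min(min, v);
    }
    // Keep the 0.0f convention when no source has a value for this document.
    return found ? min : 0.0f;
  }
}
```

With one source returning `3.0f` and one source missing, this makes a single `exists` call (for the missing source) instead of one per source.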
   
   # Tests
   
   The existing tests covered the logic cases for this.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check is slow

2022-04-25 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10534:
-

 Summary: MinFloatFunction / MaxFloatFunction exists check is slow
 Key: LUCENE-10534
 URL: https://issues.apache.org/jira/browse/LUCENE-10534
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Kevin Risden
Assignee: Kevin Risden


MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist. This is needed since the underlying valuesource 
returns 0.0f as either a valid value or as a value when the document doesn't 
have a value.

Even though this was changed to anyExists, which short-circuits as soon as a 
value is found in any document, the worst case is that no value is found, 
which requires checking all the way through to the raw data. The check is only 
needed when 0.0f is returned and we need to determine whether it is a valid 
value or the not-found case.






[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Summary: MinFloatFunction / MaxFloatFunction exists check can be slow  
(was: MinFloatFunction / MaxFloatFunction exists check is slow)

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this was changed to anyExists, which short-circuits as soon as a 
> value is found in any document, the worst case is that no value is found, 
> which requires checking all the way through to the raw data. The check is only 
> needed when 0.0f is returned and we need to determine whether it is a valid 
> value or the not-found case.






[GitHub] [lucene] rmuir commented on pull request #836: LUCENE-10533: SpellChecker.formGrams is missing bounds check

2022-04-25 Thread GitBox


rmuir commented on PR #836:
URL: https://github.com/apache/lucene/pull/836#issuecomment-1108916100

   OK, I looked into some surrounding behavior in the suggest module:
   
   Looks like returning an empty list is what DirectSpellChecker does:
   
   
https://github.com/apache/lucene/blob/main/lucene/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java#L317-L320
   
   And it seems to be behavior of suggesters too?
   
   
https://github.com/apache/lucene/blob/main/lucene/suggest/src/test/org/apache/lucene/search/suggest/fst/TestFSTCompletion.java#L208-L211
   
   So given the choice, I think we should go with consistency and return an 
empty list. Sorry for the noise...





[GitHub] [lucene] risdenk commented on pull request #836: LUCENE-10533: SpellChecker.formGrams is missing bounds check

2022-04-25 Thread GitBox


risdenk commented on PR #836:
URL: https://github.com/apache/lucene/pull/836#issuecomment-1108904174

   Yeah, that's a 100% valid question, and one where I'm not sure what the right 
answer is. I tried to find a definition of what is expected here. I'm not tied 
to this solution; it's just the least intrusive since there is no behavior change.





[GitHub] [lucene] rmuir commented on pull request #836: LUCENE-10533: SpellChecker.formGrams is missing bounds check

2022-04-25 Thread GitBox


rmuir commented on PR #836:
URL: https://github.com/apache/lucene/pull/836#issuecomment-1108899794

   Should we really return no-results if a user asks to spellcheck the empty 
string? Or maybe just a better, more obvious exception?





[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Status: Patch Available  (was: Open)

> SpellChecker.formGrams is missing bounds check
> --
>
> Key: LUCENE-10533
> URL: https://issues.apache.org/jira/browse/LUCENE-10533
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If using Solr IndexBasedSpellChecker and spellcheck.q is empty, the following 
> exception occurs (found in SOLR-16169). There is an argument that the caller 
> should not pass invalid input, but a simple bounds check would prevent this in 
> Lucene.
> {code:java}
> null:java.lang.NegativeArraySizeException: -1
>   at 
> org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
>   at 
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
>   at 
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:505)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>   at 
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>   at 
> 

[GitHub] [lucene] risdenk opened a new pull request, #836: LUCENE-10533: SpellChecker.formGrams is missing bounds check

2022-04-25 Thread GitBox


risdenk opened a new pull request, #836:
URL: https://github.com/apache/lucene/pull/836

   # Description
   
   SpellChecker.formGrams is missing a bounds check that results in 
NegativeArraySizeException in some cases.
   
   # Solution
   
   This adds a bounds check to SpellChecker.formGrams and ensures that 
SpellChecker has a proper upper bound on the size of the word being matched 
against. Also adds tests to check for this case.
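
For illustration only, a self-contained sketch of the kind of bounds check described above. This is not the Lucene source: the class name `FormGramsSketch` is hypothetical, the method body is a simplified n-gram splitter, and returning an empty array for short input is the behavior assumed here.

```java
final class FormGramsSketch {
  /**
   * Splits text into overlapping n-grams of length ng.
   * Returns an empty array when the text is shorter than ng.
   */
  static String[] formGrams(String text, int ng) {
    int len = text.length();
    // Without this guard, an empty string with ng = 2 gives len - ng + 1 == -1,
    // and new String[-1] throws NegativeArraySizeException.
    if (ng <= 0 || len < ng) {
      return new String[0];
    }
    String[] res = new String[len - ng + 1];
    for (int i = 0; i < res.length; i++) {
      res[i] = text.substring(i, i + ng);
    }
    return res;
  }
}
```

Calling `FormGramsSketch.formGrams("", 2)` then yields an empty array rather than an exception, so a caller looping over the grams simply does no work.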
   
   # Tests
   
   `./gradlew check`
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-8836) Optimize DocValues TermsDict to continue scanning from the last position when possible

2022-04-25 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527666#comment-17527666
 ] 

Bruno Roustant commented on LUCENE-8836:


Thanks [~jpountz] for this simplified improvement!

I agree to mark this issue as resolved.

> Optimize DocValues TermsDict to continue scanning from the last position when 
> possible
> --
>
> Key: LUCENE-8836
> URL: https://issues.apache.org/jira/browse/LUCENE-8836
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>  Labels: docValues, optimization
> Fix For: 9.2
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Lucene80DocValuesProducer.TermsDict is used to look up either a term or a 
> term ordinal.
> Currently it does not have the optimization the FSTEnum has: to be able to 
> continue a sequential scan from where the last lookup was in the IndexInput. 
> For sparse lookups (when searching only a few terms or ordinals) it is not an 
> issue. But for multiple lookups in a row this optimization could save 
> re-scanning all the terms from the block start (since they are delta encoded).
> This patch proposes the optimization.
> To estimate the gain, we ran 3 Lucene tests while counting the seeks and the 
> term reads in the IndexInput, with and without the optimization:
> TestLucene70DocValuesFormat - the optimization saves 24% seeks and 15% term 
> reads.
> TestDocValuesQueries - the optimization adds 0.7% seeks and 0.003% term reads.
> TestDocValuesRewriteMethod.testRegexps - the optimization saves 71% seeks and 
> 82% term reads.
> In some cases, when scanning many terms in lexicographical order, the 
> optimization saves a lot. In other cases, when only looking up a few sparse 
> terms, the optimization brings no improvement but does not hurt either. It 
> seems worth always having it.
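
A simplified sketch of the "continue from the last position" idea, assuming a single delta-encoded block held in memory. The class `DeltaBlockScanner`, its fields, and the `decodes` counter are hypothetical and only illustrate the technique; the real TermsDict reads from an IndexInput and handles multiple blocks.

```java
final class DeltaBlockScanner {
  // Entry i reuses prefixLengths[i] chars of the previous term, then appends suffixes[i].
  private final int[] prefixLengths;
  private final String[] suffixes;

  private int currentOrd = -1; // ordinal of the last decoded term, -1 before the first
  private String currentTerm = "";
  int decodes = 0; // how many entries were decoded so far, for comparison

  DeltaBlockScanner(int[] prefixLengths, String[] suffixes) {
    this.prefixLengths = prefixLengths;
    this.suffixes = suffixes;
  }

  /** Returns the term with the given ordinal, resuming from the last position when possible. */
  String seekExact(int ord) {
    if (ord < 0 || ord >= suffixes.length) {
      throw new IndexOutOfBoundsException("ord=" + ord);
    }
    if (ord < currentOrd) {
      // Target is behind the current position: delta decoding must restart at the block start.
      currentOrd = -1;
      currentTerm = "";
    }
    // Otherwise keep decoding forward from where the previous lookup stopped.
    while (currentOrd < ord) {
      currentOrd++;
      currentTerm = currentTerm.substring(0, prefixLengths[currentOrd]) + suffixes[currentOrd];
      decodes++;
    }
    return currentTerm;
  }
}
```

For the block apple, apply, apricot, banana, looking up ordinal 1 and then ordinal 3 decodes four entries in total; restarting from the block start for each lookup would decode six.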






[jira] [Created] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10533:
-

 Summary: SpellChecker.formGrams is missing bounds check
 Key: LUCENE-10533
 URL: https://issues.apache.org/jira/browse/LUCENE-10533
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Kevin Risden
Assignee: Kevin Risden


If using Solr IndexBasedSpellChecker and spellcheck.q is empty, the following 
exception occurs (found in SOLR-16169). There is an argument that the caller 
should not pass invalid input, but a simple bounds check would prevent this in 
Lucene.

{code:java}
null:java.lang.NegativeArraySizeException: -1
  at org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
  at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
  at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
  at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
  at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
  at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
  at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
  at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
  at org.eclipse.jetty.server.Server.handle(Server.java:505)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
  at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
  at java.base/java.lang.Thread.run(Unknown Source)
{code}





[GitHub] [lucene] yixunx commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-25 Thread GitBox


yixunx commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1108799875

   Sounds good, thank you for looking into this @iverase. I think it makes 
sense to merge this PR as you suggested, and I'll open separate tickets for the 
latest failing shape and for my questions about self intersections.





[GitHub] [lucene] rmuir commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


rmuir commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108562973

   Thank you for patching spotless and integrating the fix @dweiss !





[GitHub] [lucene] rmuir commented on pull request #816: LUCENE-10519: ThreadLocal.remove under G1GC takes 100% CPU

2022-04-25 Thread GitBox


rmuir commented on PR #816:
URL: https://github.com/apache/lucene/pull/816#issuecomment-1108554479

   Thanks for getting the tests passing.
   
   I'm honestly torn on this issue, here are my thoughts:
   1. If you have 100% cpu in `java.lang.ThreadLocal.remove()`, you are 
creating and destroying **far too many** threads. It is not lucene's fault.  
The application needs to be fixed to not churn out so many threads. This 
problem is unique to the java ecosystem, I don't see it with any other 
programming languages. Again, there are two problems: a) developers using far 
too many threads (e.g. solr defaulting to `1`!!!), and b) developers using 
"resizable thread pools" (min != max), which doesn't help anything and only 
magnifies these kinds of issues. The resizable ones just cause constant "churn" 
and GC pressure, and I know tomcat, jetty, etc love to default to that, but it's 
so inefficient. Java developers should use reasonable fixed-sized threadpools 
just like everyone else does. Lucene can't fix these apps, they need to be 
fixed themselves.
   2. I don't like that lucene now has basically a reimplementation of 
ThreadLocal in its codebase to cater to the problems of such bad applications. 
IMO, we are a search engine library, we should use `java.lang.ThreadLocal` and 
if apps use threads in an insane way, they should get the OOM that they deserve.
   3. CloseableThreadLocal is confusing because it does two things that Java's 
ThreadLocal does not. At least let's consider updating the 
javadocs. The major differences are a) explicit close() for all threads, 
basically this clears the map when the storage is not needed anymore, and b) 
faster purging of values for threads that are "dead but not yet GC'd" via a 
periodic liveness check.
   4. That being said, I don't have real opposition to this patch, but I want 
us to be careful about correctness. I am also concerned about not hurting the 
performance of well-behaved apps that don't do bad things. I'm not the best one 
to review the concurrency/performance implications, as I only have a small 
2-core laptop and I can barely remember how Java language works. But let's not 
punish apps that use threads reasonably. I'm not concerned about performance of 
badly behaved apps, because they need to fix how they use threads, and we can't 
help them do that.
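
The two differences listed in point 3 can be sketched roughly as follows. This is a hedged illustration, not Lucene's CloseableThreadLocal: the class `ClosableLocalSketch` and its methods `purgeDeadThreads` and `size` are hypothetical, and coarse `synchronized` locking is used only to keep the sketch short.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

final class ClosableLocalSketch<T> implements AutoCloseable {
  private final Supplier<T> initial;
  // Keyed weakly by Thread so a collected thread cannot pin its value forever.
  private Map<Thread, T> values = new WeakHashMap<>();

  ClosableLocalSketch(Supplier<T> initial) {
    this.initial = initial;
  }

  /** Returns the calling thread's value, creating it on first use. */
  synchronized T get() {
    if (values == null) {
      throw new IllegalStateException("already closed");
    }
    return values.computeIfAbsent(Thread.currentThread(), t -> initial.get());
  }

  /** (b) Drops values owned by threads that have finished but are not yet GC'd. */
  synchronized void purgeDeadThreads() {
    if (values != null) {
      values.keySet().removeIf(t -> !t.isAlive());
    }
  }

  /** Number of per-thread values currently held. */
  synchronized int size() {
    return values == null ? 0 : values.size();
  }

  /** (a) Releases the values of all threads at once when the storage is no longer needed. */
  @Override
  public synchronized void close() {
    values = null;
  }
}
```

In this sketch a caller would invoke `purgeDeadThreads()` periodically (for example every N calls to `get()`), which is the cost a well-behaved application with a small fixed pool would pay for nothing.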
   





[GitHub] [lucene] dweiss commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108529546

   Thanks @uschindler 





[GitHub] [lucene] dweiss merged pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss merged PR #834:
URL: https://github.com/apache/lucene/pull/834





[GitHub] [lucene] dweiss commented on a diff in pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on code in PR #834:
URL: https://github.com/apache/lucene/pull/834#discussion_r857583962


##
gradle/validation/spotless.gradle:
##
@@ -110,10 +110,8 @@ configure(project(":lucene").subprojects) { prj ->
 check.dependsOn v
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
-}
 
-gradle.taskGraph.afterTask { Task task, TaskState state ->
-  if (task.name == 'spotlessJavaCheck' && state.failure) {
-throw new GradleException("\n\n*PLEASE RUN ./gradlew tidy!*\n");
+  tasks.matching { task -> task.name == "spotlessJavaCheck" }.configureEach {

Review Comment:
   Those tasks.matching and withType are often required - they're dynamic 
collections and invoke the configuration block for objects that satisfy the 
criteria but are created later (or elsewhere) during configuration. It's a way 
to break dependency cycles between script evaluation, plugins, task creation, 
etc. I don't think there is a nicer way to do many of the things these blocks 
are currently used for.






[GitHub] [lucene] dweiss commented on a diff in pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on code in PR #834:
URL: https://github.com/apache/lucene/pull/834#discussion_r857581902


##
gradle/validation/spotless.gradle:
##
@@ -110,10 +110,8 @@ configure(project(":lucene").subprojects) { prj ->
 check.dependsOn v
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
-}
 
-gradle.taskGraph.afterTask { Task task, TaskState state ->
-  if (task.name == 'spotlessJavaCheck' && state.failure) {
-throw new GradleException("\n\n*PLEASE RUN ./gradlew tidy!*\n");
+  tasks.matching { task -> task.name == "spotlessJavaCheck" }.configureEach {

Review Comment:
   Ah... I see what you mean. No, this won't work. This property is defined on 
a concrete task type and not on the DSL extension (from which the various 
spotless tasks are dynamically created).






[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-04-25 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527459#comment-17527459
 ] 

Dawid Weiss commented on LUCENE-10386:
--

Hi Petr. I saw the PR but I'm not following all the changes happening there. I 
honestly just prefer dead simple verbosity... Will take another look in a spare 
minute though, unless somebody beats me to it.

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: general/build
>Affects Versions: 9.0, 8.4, 8.11.1
>Reporter: Petr Portnov
>Priority: Trivial
>  Labels: BOM, Dependencies
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to 
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects are 
> providing BOMs in order to simplify dependency management. This allows 
> dependent projects to specify only the version of the BOM module while 
> declaring the dependencies without versions (as they will be provided by the 
> BOM).
> For example:
> {code:groovy}
> dependencies {
>     // Only specify the version of the BOM
>     implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
>     // Don't specify dependency versions as they are provided by the BOM
>     implementation "com.fasterxml.jackson.core:jackson-annotations"
>     implementation "com.fasterxml.jackson.core:jackson-core"
>     implementation "com.fasterxml.jackson.core:jackson-databind"
>     implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
>     implementation "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }{code}
>  
> Not only is this approach "popular" but it also has the following pros:
>  * there is no need to declare a variable (via Maven properties or Gradle 
> ext) to hold the version
>  * this is more automation-friendly because tools like Dependabot only have 
> to update the single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that the 
> projects which currently rely on older Lucene versions (such as 8.4) can 
> migrate to the BOM approach without migrating to Lucene 9.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


uschindler commented on code in PR #834:
URL: https://github.com/apache/lucene/pull/834#discussion_r857577924


##
gradle/validation/spotless.gradle:
##
@@ -110,10 +110,8 @@ configure(project(":lucene").subprojects) { prj ->
 check.dependsOn v
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
-}
 
-gradle.taskGraph.afterTask { Task task, TaskState state ->
-  if (task.name == 'spotlessJavaCheck' && state.failure) {
-throw new GradleException("\n\n*PLEASE RUN ./gradlew tidy!*\n");
+  tasks.matching { task -> task.name == "spotlessJavaCheck" }.configureEach {

Review Comment:
   That's also fine. I meant to move it up to the configuration logic next to 
the "spotlessJava" task (or is this the extensions?). No worries - all fine. I am 
just a bit against those "tasks.matching", "tasks.withType" everywhere and 
would rather be explicit with Gradle DSL syntax.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #835: Fix JVM error branch logic.

2022-04-25 Thread GitBox


dweiss commented on PR #835:
URL: https://github.com/apache/lucene/pull/835#issuecomment-1108513010

   I think I made a logic mistake when I looked at what Robert pushed to the sh 
one, my fault.





[GitHub] [lucene] dweiss merged pull request #835: Fix JVM error branch logic.

2022-04-25 Thread GitBox


dweiss merged PR #835:
URL: https://github.com/apache/lucene/pull/835





[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-04-25 Thread Petr Portnov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527451#comment-17527451
 ] 

Petr Portnov commented on LUCENE-10386:
---

Hi there!

I've opened a PR with the implementation of this proposal: 
[https://github.com/apache/lucene/pull/830|https://github.com/apache/lucene/pull/830].







[GitHub] [lucene] dweiss commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108498260

   This fixes the script logic for windows: 
https://github.com/apache/lucene/pull/835





[GitHub] [lucene] dweiss commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108492891

   Yeah, I have a fix already.





[GitHub] [lucene] rmuir commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


rmuir commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108490605

   I took a look, and at a glance it seems to me that:
   * any error from WrapperDownloader will go to the `:fail` section via an 
explicit GOTO: `IF %ERRORLEVEL% NEQ 0 goto fail`
   * any error from gradlew itself will fall through to the `:fail` section simply 
because the gradlew call is located right above it?





[GitHub] [lucene] dweiss commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108489133

   Will take a look





[GitHub] [lucene] rmuir commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


rmuir commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108487568

   > I just noticed that any gradle error now returns the 'something went 
wrong' message though.
   
   This only happens on Windows. Something is off with the `%ERRORLEVEL%` 
handling in the .bat file, I think.





[GitHub] [lucene] dweiss commented on pull request #832: LUCENE-10523: remove @Slow annotation

2022-04-25 Thread GitBox


dweiss commented on PR #832:
URL: https://github.com/apache/lucene/pull/832#issuecomment-1108486770

   I'm fine with this. Reasons for Slow (and other test groups) are various. I 
use Slow in projects where certain tests are indeed slow by nature - have to 
unpack the distribution/ fork processes or start networking layers. These are 
typically integration tests. They run on the CI but they're not mandatory for 
local developer runs. I don't think this is needed in Lucene either.





[GitHub] [lucene] dweiss commented on a diff in pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on code in PR #834:
URL: https://github.com/apache/lucene/pull/834#discussion_r857552731


##
gradle/validation/spotless.gradle:
##
@@ -110,10 +110,8 @@ configure(project(":lucene").subprojects) { prj ->
 check.dependsOn v
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
-}
 
-gradle.taskGraph.afterTask { Task task, TaskState state ->
-  if (task.name == 'spotlessJavaCheck' && state.failure) {
-throw new GradleException("\n\n*PLEASE RUN ./gradlew tidy!*\n");
+  tasks.matching { task -> task.name == "spotlessJavaCheck" }.configureEach {

Review Comment:
   I moved it up.






[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-25 Thread GitBox


iverase commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1108472913

   Hi @yixunx,
   
   I had a look at the polygon and it seems to be a totally different type of 
error, so I would prefer to handle it as a separate issue. In the meantime I 
propose to push this change as is, since it contains good fixes for the other 
cases. What do you think?
   
   >By the way I have a tangential question: the Tessellator javadoc says the 
polygon cannot have self-intersections, but it seems that the tessellator runs 
fine on shapes with self intersections unless I set checkSelfIntersections = 
true
   
   I was not the original contributor for this class so I cannot answer that 
question on the javadocs. I have always found it strange that the javadocs say 
that polygons should not have self-intersections and then we have a method 
called `cureLocalIntersections` which seems to expect self-intersections 
(currently this method never gets exercised by the tests and I keep 
considering whether it should be removed). I wonder if this method is what 
makes your polygons succeed.
   
   I would propose opening another issue on this subject so we can capture 
the intention of the javadocs and see if we should take any action.
   
 
   






[GitHub] [lucene] uschindler commented on a diff in pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


uschindler commented on code in PR #834:
URL: https://github.com/apache/lucene/pull/834#discussion_r857519990


##
gradle/validation/spotless.gradle:
##
@@ -110,10 +110,8 @@ configure(project(":lucene").subprojects) { prj ->
 check.dependsOn v
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
-}
 
-gradle.taskGraph.afterTask { Task task, TaskState state ->
-  if (task.name == 'spotlessJavaCheck' && state.failure) {
-throw new GradleException("\n\n*PLEASE RUN ./gradlew tidy!*\n");
+  tasks.matching { task -> task.name == "spotlessJavaCheck" }.configureEach {

Review Comment:
   Should we not move this above where the other tasks are configured?






[GitHub] [lucene] dweiss commented on pull request #834: Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint.

2022-04-25 Thread GitBox


dweiss commented on PR #834:
URL: https://github.com/apache/lucene/pull/834#issuecomment-1108438352

   I removed the hack and updated spotless. Works fine for me. I just noticed 
that any gradle error now returns the 'something went wrong' message though. 
   
   
![image](https://user-images.githubusercontent.com/199470/165078694-d8a83afe-6ff1-43d8-be3a-d0a371c69667.png)
   





[jira] [Commented] (LUCENE-10493) Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?

2022-04-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527428#comment-17527428
 ] 

ASF subversion and git services commented on LUCENE-10493:
--

Commit c89f8a7ea1e7dfa64ab6d85c22dcbb977f8e09d0 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c89f8a7ea1e ]

LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and 
nori (#805)



> Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?
> -
>
> Key: LUCENE-10493
> URL: https://issues.apache.org/jira/browse/LUCENE-10493
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We now have common dictionary interfaces for kuromoji and nori 
> ([LUCENE-10393]). A natural question would be: is it possible to unify the 
> Japanese/Korean tokenizers? 
> The core methods of the two tokenizers are `parse()` and `backtrace()` to 
> calculate the minimum cost path by Viterbi search. I'd set the goal of this 
> issue to factoring out them into a separate class (in analysis-common) that 
> is shared between JapaneseTokenizer and KoreanTokenizer. 
> The algorithm to solve the minimum cost path itself is of course 
> language-agnostic, so I think it should be theoretically possible; the most 
> difficult part here might be the N-best path calculation - which is supported 
> only by JapaneseTokenizer and not by KoreanTokenizer.






[GitHub] [lucene] mocobeta merged pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-25 Thread GitBox


mocobeta merged PR #805:
URL: https://github.com/apache/lucene/pull/805





[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-25 Thread GitBox


mocobeta commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1108428861

   I'm merging this and will observe the nightly benchmark (this shouldn't 
affect the analysis performance).





[jira] [Commented] (LUCENE-10529) TestTaxonomyFacetAssociations may have floating point issues

2022-04-25 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527419#comment-17527419
 ] 

Tomoko Uchida commented on LUCENE-10529:


I saw the same test fail, but with a different cause (NPE). Command to 
reproduce:

{code}
./gradlew :lucene:facet:test --tests "org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations.testFloatAssociationRandom" -Ptests.jvms=1 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=DB1825817AA4CAF3

org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations > test suite's output saved to /mnt/hdd/repo/lucene/lucene/facet/build/test-results/test/outputs/OUTPUT-org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations.txt, copied below:
   > java.lang.NullPointerException: Cannot read field "dim" because "facetResult" is null
   > at __randomizedtesting.SeedInfo.seed([DB1825817AA4CAF3:8821D130A8C55B45]:0)
   > at org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations.validateFloats(TestTaxonomyFacetAssociations.java:454)
{code}

This might be another issue.

> TestTaxonomyFacetAssociations may have floating point issues
> 
>
> Key: LUCENE-10529
> URL: https://issues.apache.org/jira/browse/LUCENE-10529
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> Hit this in a jenkins CI build while testing something else:
> {noformat}
> gradlew test --tests TestTaxonomyFacetAssociations.testFloatAssociationRandom 
> -Dtests.seed=B39C450F4870F7F1 -Dtests.locale=ar-IQ 
> -Dtests.timezone=America/Rankin_Inlet -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> ...
> org.apache.lucene.facet.taxonomy.TestTaxonomyFacetAssociations > 
> testFloatAssociationRandom FAILED
> java.lang.AssertionError: expected:<2605996.5> but was:<2605995.2>
> {noformat}
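
The thread does not confirm what causes the mismatch above. If it does come from the order in which float values are summed (an assumption), the effect is easy to reproduce in isolation. The sketch below is not taken from the Lucene test; `FloatSumOrder`, its method names, and the tolerance helper are made up for illustration.

```java
/** Illustrative only: shows that float addition is not associative. */
final class FloatSumOrder {

  /** Sums the values from first to last. */
  static float sumForward(float[] values) {
    float sum = 0f;
    for (int i = 0; i < values.length; i++) {
      sum += values[i];
    }
    return sum;
  }

  /** Sums the same values from last to first. */
  static float sumBackward(float[] values) {
    float sum = 0f;
    for (int i = values.length - 1; i >= 0; i--) {
      sum += values[i];
    }
    return sum;
  }

  /**
   * Hypothetical helper: compares two floats with a relative tolerance
   * instead of requiring exact equality.
   */
  static boolean nearlyEqual(float expected, float actual, float relativeTolerance) {
    float scale = Math.max(Math.abs(expected), Math.abs(actual));
    return Math.abs(expected - actual) <= relativeTolerance * scale;
  }
}
```

With the input `{1e8f, -1e8f, 1f}`, `sumForward` returns 1.0 while `sumBackward` returns 0.0, because adding 1 to -1e8 is lost to rounding at that magnitude. A test that aggregates floats in a nondeterministic order would therefore need a tolerance when comparing, or a fixed summation order.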






[jira] [Resolved] (LUCENE-8836) Optimize DocValues TermsDict to continue scanning from the last position when possible

2022-04-25 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8836.
--
Fix Version/s: 9.2
   Resolution: Fixed

I merged a change that only improves lookupOrd, and not seekCeil like the 
previous patch did, but I'm still inclined to mark this issue as resolved. 
Let's improve seekCeil in a follow-up if there's appetite for it?

> Optimize DocValues TermsDict to continue scanning from the last position when 
> possible
> --
>
> Key: LUCENE-8836
> URL: https://issues.apache.org/jira/browse/LUCENE-8836
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>  Labels: docValues, optimization
> Fix For: 9.2
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Lucene80DocValuesProducer.TermsDict is used to look up either a term or a 
> term ordinal.
> Currently it does not have the optimization the FSTEnum has: to be able to 
> continue a sequential scan from where the last lookup was in the IndexInput. 
> For sparse lookups (when searching only a few terms or ordinals) it is not an 
> issue. But for multiple lookups in a row this optimization could save 
> re-scanning all the terms from the block start (since they are delta encoded).
> This patch proposes the optimization.
> To estimate the gain, we ran 3 Lucene tests while counting the seeks and the 
> term reads in the IndexInput, with and without the optimization:
> TestLucene70DocValuesFormat - the optimization saves 24% seeks and 15% term 
> reads.
> TestDocValuesQueries - the optimization adds 0.7% seeks and 0.003% term reads.
> TestDocValuesRewriteMethod.testRegexps - the optimization saves 71% seeks and 
> 82% term reads.
> In some cases, when scanning many terms in lexicographical order, the 
> optimization saves a lot. In other cases, when only looking for some sparse 
> terms, the optimization does not bring improvement, but does not penalize 
> either. It seems worthwhile to always have it.
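
The idea in the description above can be pictured with a toy model. The sketch below is not Lucene's TermsDict and does not use its on-disk format: `DeltaEncodedTerms`, the block size of 4, and the `decodedTerms` counter are all assumptions made for illustration. Terms are prefix-compressed against the previous term within a block, and a lookup by ordinal continues from the last decoded position when the target lies ahead of it in the same block.

```java
import java.util.List;

/** Toy model of a delta-encoded (prefix-compressed) sorted term dictionary. */
final class DeltaEncodedTerms {
  private static final int BLOCK_SIZE = 4; // hypothetical block size

  private final int[] prefixLengths; // bytes shared with the previous term
  private final String[] suffixes;   // remainder of each term

  private int currentOrd = -1;       // ordinal of the last decoded term
  private String currentTerm = "";   // last decoded term
  int decodedTerms;                  // counts decode steps, to compare scans

  DeltaEncodedTerms(List<String> sortedTerms) {
    int n = sortedTerms.size();
    prefixLengths = new int[n];
    suffixes = new String[n];
    String previous = "";
    for (int i = 0; i < n; i++) {
      String term = sortedTerms.get(i);
      // The first term of each block is stored in full.
      int prefix = (i % BLOCK_SIZE == 0) ? 0 : commonPrefixLength(previous, term);
      prefixLengths[i] = prefix;
      suffixes[i] = term.substring(prefix);
      previous = term;
    }
  }

  private static int commonPrefixLength(String a, String b) {
    int max = Math.min(a.length(), b.length());
    int i = 0;
    while (i < max && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  /** Returns the term with the given ordinal, reusing the last position when possible. */
  String lookupOrd(int ord) {
    if (ord < 0 || ord >= suffixes.length) {
      throw new IndexOutOfBoundsException("ord=" + ord);
    }
    int blockStart = ord - (ord % BLOCK_SIZE);
    // Rewind to the block start only if the last position is in another
    // block or already past the target.
    if (currentOrd < blockStart || currentOrd > ord) {
      currentOrd = blockStart - 1;
      currentTerm = "";
    }
    while (currentOrd < ord) {
      currentOrd++;
      currentTerm = currentTerm.substring(0, prefixLengths[currentOrd]) + suffixes[currentOrd];
      decodedTerms++;
    }
    return currentTerm;
  }
}
```

In this model, looking up ordinals 0 through 7 in increasing order decodes each term exactly once, whereas a lookup that jumps backwards has to rewind to the block start and decode again from there.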






[GitHub] [lucene] boicehuang commented on pull request #816: LUCENE-10519: ThreadLocal.remove under G1GC takes 100% CPU

2022-04-25 Thread GitBox


boicehuang commented on PR #816:
URL: https://github.com/apache/lucene/pull/816#issuecomment-1108216412

   > `./gradlew check` still fails.
   
I have successfully run `gradle :lucene:core:spotlessApply` and `gradle 
:lucene:check` locally.  Please run the CI again, thanks.
BTW, sorry for bothering you, can you give me the right to start workflows? 





[jira] [Commented] (LUCENE-8836) Optimize DocValues TermsDict to continue scanning from the last position when possible

2022-04-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527350#comment-17527350
 ] 

ASF subversion and git services commented on LUCENE-8836:
-

Commit 975f2392c63c7b9b6b5ecbede1dcd4c87e96cc79 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=975f2392c63 ]

LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. 
(#827)









[jira] [Commented] (LUCENE-8836) Optimize DocValues TermsDict to continue scanning from the last position when possible

2022-04-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527349#comment-17527349
 ] 

ASF subversion and git services commented on LUCENE-8836:
-

Commit 2a4c21bb586c2c5afb8550b88cbfd9dd15d433c5 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2a4c21bb586 ]

LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. 
(#827)









[GitHub] [lucene] jpountz merged pull request #827: LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords.

2022-04-25 Thread GitBox


jpountz merged PR #827:
URL: https://github.com/apache/lucene/pull/827





[GitHub] [lucene] jpountz commented on a diff in pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-04-25 Thread GitBox


jpountz commented on code in PR #833:
URL: https://github.com/apache/lucene/pull/833#discussion_r857318802


##
lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java:
##
@@ -323,6 +325,35 @@ public int nextDoc() throws IOException {
   : sortedSetDocValues;
 }
 
+@Override
+public VectorValues getVectorValues(String field) throws IOException {
+  final VectorValues vectorValues = in.getVectorValues(field);
+  if (vectorValues == null) {
+return null;
+  }
+  return (queryTimeout.isTimeoutEnabled())
+  ? new ExitableVectorValues(vectorValues)
+  : vectorValues;
+}
+
+@Override
+public TopDocs searchNearestVectors(
+String field, float[] target, int k, Bits acceptDocs, int visitedLimit) throws IOException {
+  // nocommit - sampling needed?
+  if (queryTimeout.shouldExit()) {
+throw new ExitingReaderException(
+"The request took too long to search nearest vectors. Timeout: "
++ queryTimeout.toString()
++ ", Reader="
++ in);
+  } else if (Thread.interrupted()) {
+throw new ExitingReaderException(
+"Interrupted while searching nearest vectors. Reader=" + in);
+  }
+
+  return in.searchNearestVectors(field, target, k, acceptDocs, visitedLimit);

Review Comment:
   Maybe we should wrap `acceptDocs` and check the query timeout on every N 
calls to `Bits#get`?
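
   A rough sketch of what such a wrapper might look like, kept independent of Lucene so that it compiles on its own. `SimpleBits` stands in for Lucene's `Bits`, `TimeoutExceededException` stands in for `ExitingReaderException`, and the `BooleanSupplier` plays the role of `queryTimeout.shouldExit()`; these names and the interval parameter are assumptions for illustration, not code from the PR.

```java
import java.util.function.BooleanSupplier;

/** Minimal stand-in for Lucene's Bits interface (assumption, for self-containment). */
interface SimpleBits {
  boolean get(int index);

  int length();
}

/** Hypothetical stand-in for ExitingReaderException. */
final class TimeoutExceededException extends RuntimeException {
  TimeoutExceededException(String message) {
    super(message);
  }
}

/** Delegates to another SimpleBits and polls the timeout once every `interval` calls to get(). */
final class TimeoutCheckingBits implements SimpleBits {
  private final SimpleBits in;
  private final BooleanSupplier shouldExit;
  private final int interval;
  private int calls; // calls to get() since the last timeout check

  TimeoutCheckingBits(SimpleBits in, BooleanSupplier shouldExit, int interval) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    this.in = in;
    this.shouldExit = shouldExit;
    this.interval = interval;
  }

  @Override
  public boolean get(int index) {
    // Sampling: only consult the (possibly expensive) timeout every N calls.
    if (++calls >= interval) {
      calls = 0;
      if (shouldExit.getAsBoolean()) {
        throw new TimeoutExceededException("The request took too long to search nearest vectors.");
      }
    }
    return in.get(index);
  }

  @Override
  public int length() {
    return in.length();
  }
}
```

   The counter is not thread-safe, which seems acceptable if a wrapper is created per search call. A real implementation would also have to decide what to do when `acceptDocs` is null, since there is then nothing to wrap.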


