[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2022-06-08 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551745#comment-17551745
 ] 

Kevin Risden commented on LUCENE-4574:
--

Ugh, actually just after I sent that I found a reproducing case in Solr - I 
don't know if this is Solr-specific. Still digging. It looks like it needs 
bf=functionquery with sort=SOME_FIELD and fl=score. TopFieldCollector.populateScores 
is the secondary score calculation, after MaxScore is already trying to calculate 
the scores.

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, 
> LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.
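
The work-around described in the quoted report above - caching the last value with its corresponding docid inside the FunctionValues implementation - might look roughly like the following minimal, version-agnostic sketch. LastDocValueCache and expensiveValue are hypothetical names for illustration, not actual Lucene API:

{code:java}
// Remember the last (docID, value) pair so that a second value request for the
// same document does not recompute the expensive function.
class LastDocValueCache {
  private int lastDoc = -1;  // docID of the most recently computed value
  private float lastValue;   // cached result for lastDoc

  float floatVal(int doc) {
    if (doc != lastDoc) {    // recompute only when the docID changes
      lastValue = expensiveValue(doc);
      lastDoc = doc;
    }
    return lastValue;
  }

  private float expensiveValue(int doc) {
    // placeholder for the costly per-document computation
    return doc * 0.5f;
  }
}
{code}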



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2022-06-08 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551732#comment-17551732
 ] 

Kevin Risden commented on LUCENE-4574:
--

[~dsmiley] / [~jpountz] - is it possible this was addressed by LUCENE-6263 (and 
the somewhat related LUCENE-8405)? I don't see this behavior currently. Even when 
requesting the score, the score is now cached, so the function isn't computed twice.

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, 
> LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.






[jira] [Commented] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-17 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538164#comment-17538164
 ] 

Kevin Risden commented on LUCENE-10576:
---

This is marked as Won't Fix since some reasonable concerns were brought up on the 
PR - https://github.com/apache/lucene/pull/895

> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"
>  ** 
> https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147
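
To make the difference concrete, here is a small standalone comparison of the current formula and the proposed rewrite (illustrative only, not Lucene code):

{code:java}
// The current formula is capped at 4 by the Math.min; the proposed
// Math.max(4, coreCount / 2) scales up on machines with many cores.
public class MaxThreadCountDemo {
  public static void main(String[] args) {
    for (int coreCount : new int[] {2, 4, 8, 16, 32, 64}) {
      int current = Math.max(1, Math.min(4, coreCount / 2));
      int proposed = Math.max(4, coreCount / 2);
      System.out.println("coreCount=" + coreCount
          + " current=" + current + " proposed=" + proposed);
    }
  }
}
{code}

With the current code, anything from 8 cores up is still limited to 4 merge threads; the proposed form would use coreCount / 2 once that exceeds 4.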






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-17 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"
>  ** 
> https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147






[jira] [Assigned] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned LUCENE-10576:
-

Assignee: Kevin Risden

> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"
>  ** 
> https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Status: Patch Available  (was: Open)

> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"
>  ** 
> https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
[https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}
This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:
{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}
which then simplifies to
{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}
So that you have a minimum of 4 maxThreadCount and max of coreCount/2.

Based on the history I could find, this has been this way forever.
 * LUCENE-6437
 * LUCENE-6119
 * LUCENE-5951
 ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
Runtime.getRuntime().availableProcessors()/2));"
 ** 
https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147

  was:
[https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}
This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:
{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}
which then simplifies to
{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}
So that you have a minimum of 4 maxThreadCount and max of coreCount/2.

Based on the history I could find, this has been this way forever.
 * LUCENE-6437
 * LUCENE-6119
 * LUCENE-5951
 ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
Runtime.getRuntime().availableProcessors()/2));"


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"
>  ** 
> https://github.com/apache/lucene/commit/33410e30c1af7105a6b8b922255af047d13be626#diff-ceb8ec6fe5807682cfb691a8ec52bcc672fb7c5eeb6922c80da4c075f7f003c8R147






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
[https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}
This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:
{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}
which then simplifies to
{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}
So that you have a minimum of 4 maxThreadCount and max of coreCount/2.

Based on the history I could find, this has been this way forever.
 * LUCENE-6437
 * LUCENE-6119
 * LUCENE-5951
 ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
Runtime.getRuntime().availableProcessors()/2));"

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119
* LUCENE-5951
* Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
Runtime.getRuntime().availableProcessors()/2));"


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> [https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177]
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
>  * LUCENE-6437
>  * LUCENE-6119
>  * LUCENE-5951
>  ** Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119
* LUCENE-5951

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
> * LUCENE-6437
> * LUCENE-6119
> * LUCENE-5951






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119
* LUCENE-5951
* Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
Runtime.getRuntime().availableProcessors()/2));"

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119
* LUCENE-5951


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
> * LUCENE-6437
> * LUCENE-6119
> * LUCENE-5951
> * Introduced as "maxThreadCount = Math.max(1, Math.min(3, 
> Runtime.getRuntime().availableProcessors()/2));"






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* LUCENE-6119

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* 


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
> * LUCENE-6437
> * LUCENE-6119






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* LUCENE-6437
* 

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* TODO


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
> * LUCENE-6437
> * 






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Priority: Minor  (was: Major)

> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the 






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the history I could find, this has been this way forever.
* TODO

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the 


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Minor
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the history I could find, this has been this way forever.
> * TODO






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.





Based on the 

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.


> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Major
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.
> 
> Based on the 






[jira] [Updated] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10576:
--
Description: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.

If I understand it looks like 1 and 4 are mixed up and should instead be:

{code:java}
maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
{code}

which then simplifies to

{code:java}
maxThreadCount = Math.max(4, coreCount / 2);
{code}

So that you have a minimum of 4 maxThreadCount and max of coreCount/2.

  was:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.




> ConcurrentMergeScheduler maxThreadCount calculation is artificially low
> ---
>
> Key: LUCENE-10576
> URL: https://issues.apache.org/jira/browse/LUCENE-10576
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Priority: Major
>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177
> {code:java}
> maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
> {code}
> This has a practical limit of max of 4 threads due to the Math.min. This 
> doesn't take into account higher coreCount.
> I can't seem to tell if this is by design or this is just a mix up of logic 
> during the calculation.
> If I understand it looks like 1 and 4 are mixed up and should instead be:
> {code:java}
> maxThreadCount = Math.max(4, Math.min(1, coreCount / 2));
> {code}
> which then simplifies to
> {code:java}
> maxThreadCount = Math.max(4, coreCount / 2);
> {code}
> So that you have a minimum of 4 maxThreadCount and max of coreCount/2.






[jira] [Created] (LUCENE-10576) ConcurrentMergeScheduler maxThreadCount calculation is artificially low

2022-05-16 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10576:
-

 Summary: ConcurrentMergeScheduler maxThreadCount calculation is 
artificially low
 Key: LUCENE-10576
 URL: https://issues.apache.org/jira/browse/LUCENE-10576
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Kevin Risden


https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L177

{code:java}
maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
{code}

This has a practical limit of max of 4 threads due to the Math.min. This 
doesn't take into account higher coreCount.

I can't seem to tell if this is by design or this is just a mix up of logic 
during the calculation.








[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-05-02 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Fix Version/s: 9.2

> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 9.2
>
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.
> Tested with JMH here: https://github.com/risdenk/lucene-jmh
> | Benchmark | Mode | Cnt | Score and Error | Units |
> |---|---|---|---|---|
> | MyBenchmark.testMaxFloatFunction | thrpt | 25 | 64.159 ± 2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25 | 94.997 ± 2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField | thrpt | 25 | 244.921 ± 6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField | thrpt | 25 | 239.288 ± 5.136 | ops/s |
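
A rough sketch of the shape of that change, assuming the usual FunctionValues API (exists(int) and floatVal(int)); maxValue and missingValue are hypothetical names for illustration, not the actual MaxFloatFunction code:

{code:java}
import java.io.IOException;

import org.apache.lucene.queries.function.FunctionValues;

class MaxValueSketch {
  // Check exists(doc) once per source and reuse the answer, rather than
  // checking existence a second time when reading the value.
  static float maxValue(FunctionValues[] sources, int doc, float missingValue)
      throws IOException {
    float max = Float.NEGATIVE_INFINITY;
    boolean anyExists = false;
    for (FunctionValues vals : sources) {
      if (vals.exists(doc)) {  // single exists check per source
        max = Math.max(max, vals.floatVal(doc));
        anyExists = true;
      }
    }
    return anyExists ? max : missingValue;
  }
}
{code}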






[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-05-02 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~hossman] appreciate the reviews and comments.

> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.
> Tested with JMH here: https://github.com/risdenk/lucene-jmh
> | Benchmark | Mode | Cnt | Score and Error | Units |
> |---|---|---|---|---|
> | MyBenchmark.testMaxFloatFunction | thrpt | 25 | 64.159 ± 2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25 | 94.997 ± 2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField | thrpt | 25 | 244.921 ± 6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField | thrpt | 25 | 239.288 ± 5.136 | ops/s |
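
For context on how numbers like these are produced, here is a minimal JMH throughput benchmark skeleton (illustrative only; the real benchmarks live in the linked lucene-jmh repo). With recent JMH defaults, Cnt = 25 comes from 5 forks x 5 measurement iterations:

{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;

public class MyBenchmark {
  @Benchmark
  @BenchmarkMode(Mode.Throughput) // reported as "thrpt" in ops/s, as in the table above
  public float testMaxFloatFunction() {
    // placeholder body: the real benchmark would execute a FunctionQuery that
    // uses MaxFloatFunction against a test index and return its top score
    return 0f;
  }
}
{code}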






[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-05-02 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530809#comment-17530809
 ] 

Kevin Risden commented on LUCENE-10534:
---

Thanks - I just pushed changes to simplify the tests based on the sum example. I 
forgot about that example and it definitely is a valid test.

{quote}I assume these test changes pass w/o your code changes? ({quote}

Yes - checked that tests pass before and after. Also checked that my invalid 
assumption around returning zero on exists FAILS the new tests.

{quote}you could also include a specific test of the "real world" example{quote}

Thanks that is much better - fixed and removed the contrived example. 

{quote}that's just you "re-formatting" the order of the checks to be consistent (so that exists is always checked before hits) correct?{quote}

correct. Tried to make it consistent.

> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.
> Tested with JMH here: https://github.com/risdenk/lucene-jmh
> | Benchmark | Mode | Cnt | Score and Error | Units |
> |---|---|---|---|---|
> | MyBenchmark.testMaxFloatFunction | thrpt | 25 | 64.159 ± 2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25 | 94.997 ± 2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField | thrpt | 25 | 244.921 ± 6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField | thrpt | 25 | 239.288 ± 5.136 | ops/s |






[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-05-02 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530743#comment-17530743
 ] 

Kevin Risden commented on LUCENE-10534:
---

[~hossman] I added some edge case tests to 
https://github.com/apache/lucene/pull/837

> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.
> Tested with JMH here: https://github.com/risdenk/lucene-jmh
> | Benchmark | Mode | Cnt | Score and Error | Units |
> |---|---|---|---|---|
> | MyBenchmark.testMaxFloatFunction | thrpt | 25 | 64.159 ± 2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25 | 94.997 ± 2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField | thrpt | 25 | 244.921 ± 6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField | thrpt | 25 | 239.288 ± 5.136 | ops/s |






[jira] [Updated] (LUCENE-10542) FieldSource exists implementations can avoid value retrieval

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Fix Version/s: 9.2
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> FieldSource exists implementations can avoid value retrieval
> 
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 9.2
>
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, I found that the *FieldSource exists implementations 
> after LUCENE-7407 can avoid the value lookup when just checking for existence.
> Flamegraphs - x axis = time spent as a percentage of the time being profiled, 
> y axis = stack trace, with the bottom being the first call and the top being the last call
> Looking only at the left-most getValueForDoc highlight (it helps to make it 
> bigger or download the original):
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> | Benchmark   | Mode  | Cnt | Score and Error  | Units |
> |-|---|-|--|---|
> | MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 123.191 ±  9.291 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.817 ±  6.191 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 271.521 ±  3.870 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 279.334 ± 10.511 | ops/s |
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (LUCENE-10542) FieldSource exists implementations can avoid value retrieval

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Summary: FieldSource exists implementations can avoid value retrieval  
(was: FieldSource exists implementation can avoid value retrieval)

> FieldSource exists implementations can avoid value retrieval
> 
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being first call top being last call
> Looking only at the left most getValueForDoc highlight only (and it helps to 
> make it bigger or download the original)
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> | Benchmark   | Mode  | Cnt | Score and Error  | Units |
> |-|---|-|--|---|
> | MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 123.191 ±  9.291 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.817 ±  6.191 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 271.521 ±  3.870 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 279.334 ± 10.511 | ops/s |
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Description: 
MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist twice. This change prevents the duplicate exists 
check.
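
As a rough illustration of the pattern (hypothetical helper names, not the exact MaxFloatFunction source), the duplicate check comes from probing exists() once up front and then again inside the value loop; a single pass that remembers whether any source had a value avoids the second walk over the doc values:

{code:java}
// Illustrative sketch only - names and structure are assumptions, not the actual patch.
// Before: exists() is consulted twice per source for the same doc.
static float maxBefore(int doc, FunctionValues[] valsArr) throws IOException {
  boolean anyExists = false;
  for (FunctionValues vals : valsArr) { // first pass over exists()
    anyExists |= vals.exists(doc);
  }
  if (!anyExists) {
    return 0.0f;
  }
  float result = Float.NEGATIVE_INFINITY;
  for (FunctionValues vals : valsArr) { // second pass repeats the exists() work
    if (vals.exists(doc)) {
      result = Math.max(result, vals.floatVal(doc));
    }
  }
  return result;
}

// After: one pass that tracks whether any source had a value.
static float maxAfter(int doc, FunctionValues[] valsArr) throws IOException {
  boolean anyExists = false;
  float result = Float.NEGATIVE_INFINITY;
  for (FunctionValues vals : valsArr) {
    if (vals.exists(doc)) {
      anyExists = true;
      result = Math.max(result, vals.floatVal(doc));
    }
  }
  return anyExists ? result : 0.0f;
}
{code}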

Tested with JMH here: https://github.com/risdenk/lucene-jmh

| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |

  was:MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist twice. This change prevents the duplicate exists 
check.


> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.
> Tested with JMH here: https://github.com/risdenk/lucene-jmh
> | Benchmark   | Mode  | Cnt | Score and Error  | Units |
> |-|---|-|--|---|
> | MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction calls exists twice

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Summary: MinFloatFunction / MaxFloatFunction calls exists twice  (was: 
MinFloatFunction / MaxFloatFunction exists check can be slow)

> MinFloatFunction / MaxFloatFunction calls exists twice
> --
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Description: MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist twice. This change prevents the duplicate exists 
check.  (was: MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist. This is needed since the underlying valuesource 
returns 0.0f as either a valid value or as a value when the document doesn't 
have a value.

Even though this is changed to anyExists and short circuits in the case a value 
is found in any document, the worst case is that there is no value found and 
requires checking all the way through to the raw data. This is only needed when 
0.0f is returned and need to determine if it is a valid value or the not found 
case.)
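
A minimal sketch of the anyExists short-circuit described in the old description above (illustrative only; the helper name and placement are assumptions, not the committed code): the best case returns as soon as one source has a value, while the worst case still has to consult every source.

{code:java}
// Hypothetical helper: true as soon as any wrapped source has a value for this doc.
static boolean anyExists(int doc, FunctionValues[] valsArr) throws IOException {
  for (FunctionValues vals : valsArr) {
    if (vals.exists(doc)) {
      return true; // short circuit on the first hit
    }
  }
  return false; // worst case: every source checked, none had a value
}
{code}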

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist twice. This change prevents the duplicate exists 
> check.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-29 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530141#comment-17530141
 ] 

Kevin Risden commented on LUCENE-10542:
---

Thanks [~rcmuir], that's a good idea.

> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being first call top being last call
> Looking only at the left most getValueForDoc highlight only (and it helps to 
> make it bigger or download the original)
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> | Benchmark   | Mode  | Cnt | Score and Error  | Units |
> |-|---|-|--|---|
> | MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
> | MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 123.191 ±  9.291 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.817 ±  6.191 | ops/s |
> | MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |
> | MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 271.521 ±  3.870 | ops/s |
> | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 279.334 ± 10.511 | ops/s |
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-29 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529572#comment-17529572
 ] 

Kevin Risden edited comment on LUCENE-10534 at 4/29/22 5:15 PM:


Updated metrics - there is a benefit to the new maxfloatfunction logic avoiding the 
duplicate exists() check, even when using the new fieldsource changes from LUCENE-10542. 

| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 123.191 ±  9.291 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.817 ±  6.191 | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 271.521 ±  3.870 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 279.334 ± 10.511 | ops/s |


was (Author: risdenk):
Updated metrics - there is a benefit to the new maxfloatfunction logic avoiding the 
duplicate exists() check, even when using the new fieldsource changes from LUCENE-10542. 

| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 69.949 ± 4.043   | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 112.326 ± 3.228  | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 93.216 ± 2.757   | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.364 ± 7.861  | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 257.339 ± 33.849 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 287.175 ± 22.840 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 235.268 ± 4.103  | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 272.397 ± 8.406  | ops/s |

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-29 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=410,width=1000!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
getValueForDoc(doc);
return arr.docID() == doc;
  }
{code}

So putting this all together for exists calling getValueForDoc, we spent ~50% 
of the time trying to get the long value when we don't need it in exists. We 
can save that 50% of time making exists not care about the actual value and 
just return if doc == curDocID basically.
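
A minimal sketch of that idea for LongFieldSource (an assumption about shape, not necessarily the exact change in PR #847): exists() advances the doc values iterator the same way getValueForDoc does, but only reports whether it landed on the target doc, without ever calling longValue().

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
    if (doc < lastDocID) {
      throw new IllegalArgumentException(
          "docs were sent out-of-order: lastDocID=" + lastDocID + " vs docID=" + doc);
    }
    lastDocID = doc;
    int curDocID = arr.docID();
    if (doc > curDocID) {
      curDocID = arr.advance(doc);
    }
    return doc == curDocID; // no arr.longValue() needed just to check existence
  }
{code}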

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:



Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 64.159  ±  2.031 | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 94.997  ±  2.365 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 123.191 ±  9.291 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.817 ±  6.191 | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 244.921 ±  6.439 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 239.288 ±  5.136 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 271.521 ±  3.870 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 279.334 ± 10.511 | ops/s |

Source: https://github.com/risdenk/lucene-jmh
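
For context, a self-contained sketch of what such a JMH harness could look like (an assumption for illustration - the field name, doc count, and sparsity are made up, and this is not the code in the linked lucene-jmh repo): index a float doc-values field once, then measure a scan that calls exists()/floatVal() through a FloatFieldSource.

{code:java}
import java.io.IOException;
import java.util.HashMap;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.valuesource.FloatFieldSource;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
public class FloatFieldSourceBenchmark {
  Directory dir;
  DirectoryReader reader;

  @Setup
  public void setup() throws IOException {
    dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < 100_000; i++) {
        Document doc = new Document();
        // Only every 10th doc gets a value, loosely mimicking a "RareField" case.
        if (i % 10 == 0) {
          doc.add(new FloatDocValuesField("price", i));
        }
        writer.addDocument(doc);
      }
    }
    reader = DirectoryReader.open(dir);
  }

  @TearDown
  public void tearDown() throws IOException {
    reader.close();
    dir.close();
  }

  @Benchmark
  public float scanWithExists() throws IOException {
    FloatFieldSource source = new FloatFieldSource("price");
    float total = 0f;
    for (LeafReaderContext leaf : reader.leaves()) {
      FunctionValues values = source.getValues(new HashMap<>(), leaf);
      int maxDoc = leaf.reader().maxDoc();
      for (int doc = 0; doc < maxDoc; doc++) {
        if (values.exists(doc)) { // the exists() call whose cost is being compared
          total += values.floatVal(doc);
        }
      }
    }
    return total;
  }
}
{code}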

  was:
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=410,width=1000!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 

[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-28 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=410,width=1000!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
getValueForDoc(doc);
return arr.docID() == doc;
  }
{code}

So putting this all together for exists calling getValueForDoc, we spent ~50% 
of the time trying to get the long value when we don't need it in exists. We 
can save that 50% of time making exists not care about the actual value and 
just return if doc == curDocID basically.

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:



Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 69.949 ± 4.043   | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 112.326 ± 3.228  | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 93.216 ± 2.757   | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.364 ± 7.861  | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 257.339 ± 33.849 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 287.175 ± 22.840 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 235.268 ± 4.103  | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 272.397 ± 8.406  | ops/s |

Source: https://github.com/risdenk/lucene-jmh

  was:
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=410,width=1000!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 

[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-28 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529572#comment-17529572
 ] 

Kevin Risden commented on LUCENE-10534:
---

Updated metrics - there is a benefit to the new maxfloatfunction logic avoiding the 
duplicate exists() check, even when using the new fieldsource changes from LUCENE-10542. 

| Benchmark   | Mode  | Cnt | Score and Error  | Units |
|-|---|-|--|---|
| MyBenchmark.testMaxFloatFunction| thrpt | 25  | 69.949 ± 4.043   | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | 25  | 112.326 ± 3.228  | ops/s |
| MyBenchmark.testNewMaxFloatFunction | thrpt | 25  | 93.216 ± 2.757   | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource  | thrpt | 25  | 123.364 ± 7.861  | ops/s |
| MyBenchmark.testMaxFloatFunctionRareField   | thrpt | 25  | 257.339 ± 33.849 | ops/s |
| MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField| thrpt | 25  | 287.175 ± 22.840 | ops/s |
| MyBenchmark.testNewMaxFloatFunctionRareField| thrpt | 25  | 235.268 ± 4.103  | ops/s |
| MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | 25  | 272.397 ± 8.406  | ops/s |

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-28 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529513#comment-17529513
 ] 

Kevin Risden commented on LUCENE-10534:
---

I reviewed the jmh tests again and realized I didn't test the new 
maxfloatfunction logic - I had copied the original code over :( Rerunning the 
benchmarks to see if there is an improvement.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529009#comment-17529009
 ] 

Kevin Risden commented on LUCENE-10542:
---

[~hossman] I think I addressed your comments from LUCENE-10534. The PR I pushed 
updated BytesRefFieldSource and I did another check for the method name 
getValueForDoc.

> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being first call top being last call
> Looking only at the left most getValueForDoc highlight only (and it helps to 
> make it bigger or download the original)
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> ||Benchmark||Mode||Cnt||Score and Error||Units||
> |MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
> |MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=410,width=1000!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
getValueForDoc(doc);
return arr.docID() == doc;
  }
{code}

So putting this all together for exists calling getValueForDoc, we spent ~50% 
of the time trying to get the long value when we don't need it in exists. We 
can save that 50% of time making exists not care about the actual value and 
just return if doc == curDocID basically.

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:



Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|

Source: https://github.com/risdenk/lucene-jmh

  was:
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=500,width=500!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.

[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Status: Patch Available  (was: Open)

> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being first call top being last call
> Looking only at the left most getValueForDoc highlight only (and it helps to 
> make it bigger or download the original)
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> ||Benchmark||Mode||Cnt||Score and Error||Units||
> |MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
> |MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=500,width=500!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
getValueForDoc(doc);
return arr.docID() == doc;
  }
{code}

So putting this all together for exists calling getValueForDoc, we spent ~50% 
of the time trying to get the long value when we don't need it in exists. We 
can save that 50% of time making exists not care about the actual value and 
just return if doc == curDocID basically.

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:



Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|

Source: https://github.com/risdenk/lucene-jmh

  was:
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
axis = stack trace bottom being first call top being last call

Looking only at the left most getValueForDoc highlight only (and it helps to 
make it bigger or download the original)

!flamegraph_getValueForDoc.png|height=250,width=250!

LongFieldSource#exists spends MOST of its time doing a 
LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its time 
doing two things primarily:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
if (doc < lastDocID) {
  throw new IllegalArgumentException(
  "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
docID=" + doc);
}
lastDocID = doc;
int curDocID = arr.docID();
if (doc > curDocID) {
  curDocID = arr.advance(doc);
}
if (doc == curDocID) {
  return arr.longValue();
} else {
  return 0;
}
  }
{code}

LongFieldSource#exists - doesn't care about the actual longValue. Just that 
there was a value found when iterating through the doc values.

[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Attachment: flamegraph_getValueForDoc.png

> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph_getValueForDoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being first call top being last call
> Looking only at the left most getValueForDoc highlight only (and it helps to 
> make it bigger or download the original)
> !flamegraph_getValueForDoc.png|height=250,width=250!
> LongFieldSource#exists spends MOST of its time doing a 
> LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its 
> time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on looking at the code (copied below to make it easier 
> to see at once) 
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
> if (doc < lastDocID) {
>   throw new IllegalArgumentException(
>   "docs were sent out-of-order: lastDocID=" + lastDocID + " vs 
> docID=" + doc);
> }
> lastDocID = doc;
> int curDocID = arr.docID();
> if (doc > curDocID) {
>   curDocID = arr.advance(doc);
> }
> if (doc == curDocID) {
>   return arr.longValue();
> } else {
>   return 0;
> }
>   }
> {code}
> LongFieldSource#exists - doesn't care about the actual longValue. Just that 
> there was a value found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
> getValueForDoc(doc);
> return arr.docID() == doc;
>   }
> {code}
> So putting this all together for exists calling getValueForDoc, we spent ~50% 
> of the time trying to get the long value when we don't need it in exists. We 
> can save that 50% of time making exists not care about the actual value and 
> just return if doc == curDocID basically.
> This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
> exists() is being called a bunch. Eventually the value will be needed from 
> longVal(), but if we call exists() say 3 times for every longVal(), we are 
> spending a lot of time computing the value when we only need to check for 
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource, 
> FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
> showing what this would look like:
> 
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> ||Benchmark||Mode||Cnt||Score and Error||Units||
> |MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 
> 8.229|ops/s|
> |MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997
>  ± 27.575|ops/s|
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, I found that the *FieldSource exists implementations after LUCENE-7407 can avoid the value lookup when just checking for existence.

Flamegraphs - x axis = time spent as a percentage of the time being profiled; y axis = stack trace, bottom being the first call, top being the last call.

Looking only at the left-most getValueForDoc highlight (it helps to enlarge the image or download the original):

!flamegraph_getValueForDoc.png|height=250,width=250!

LongFieldSource#exists spends MOST of its time in LongFieldSource#getValueForDoc, which in turn spends its time primarily on two things:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
    if (doc < lastDocID) {
      throw new IllegalArgumentException(
          "docs were sent out-of-order: lastDocID=" + lastDocID + " vs docID=" + doc);
    }
    lastDocID = doc;
    int curDocID = arr.docID();
    if (doc > curDocID) {
      curDocID = arr.advance(doc);
    }
    if (doc == curDocID) {
      return arr.longValue();
    } else {
      return 0;
    }
  }
{code}

LongFieldSource#exists doesn't care about the actual longValue - just that a value was found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
    getValueForDoc(doc);
    return arr.docID() == doc;
  }
{code}

So putting this all together for exists() calling getValueForDoc(), we spend ~50% of the time retrieving the long value even though exists() doesn't need it. We can save that 50% by making exists() ignore the actual value and essentially just return whether doc == curDocID.

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.
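
As a minimal illustration of that calling pattern (hypothetical code, not the actual MaxFloatFunction source), existence gets checked per source before any value is read, so exists() runs several times for each value that is eventually used:

{code:java}
  // Hypothetical caller: with the current exists(), every check below also
  // pays for advance() plus the value retrieval inside getValueForDoc().
  boolean anyExists = false;
  for (FunctionValues vals : valsArr) {
    anyExists |= vals.exists(doc);              // exists() once per source
  }
  if (!anyExists) {
    return 0.0f;                                // no source has a value
  }
  float max = Float.NEGATIVE_INFINITY;
  for (FunctionValues vals : valsArr) {
    if (vals.exists(doc)) {                     // exists() again per source
      max = Math.max(max, vals.floatVal(doc));  // value finally read here
    }
  }
  return max;
{code}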

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:



Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|

Source: https://github.com/risdenk/lucene-jmh

  was:
While looking at LUCENE-10534, found that *FieldSource exists implementation 
after LUCENE-7407 can avoid value lookup when just checking for exists.

Detailed analysis here: 
https://issues.apache.org/jira/browse/LUCENE-10534?focusedCommentId=17528369=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17528369

Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|

Source: https://github.com/risdenk/lucene-jmh


> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Flamegraphs - x axis = time spent as a percentage of time being profiled, y 
> axis = stack trace bottom being 

[jira] [Updated] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10542:
--
Description: 
While looking at LUCENE-10534, I found that the *FieldSource exists implementations after LUCENE-7407 can avoid the value lookup when just checking for existence.

Detailed analysis here: 
https://issues.apache.org/jira/browse/LUCENE-10534?focusedCommentId=17528369=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17528369

Simple JMH performance tests comparing the original FloatFieldSource to the new 
ones from PR #847.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|

Source: https://github.com/risdenk/lucene-jmh

  was:While looking at LUCENE-10534, found that *FieldSource exists 
implementation after LUCENE-7407 can avoid value lookup when just checking for 
exists.


> FieldSource exists implementation can avoid value retrieval
> ---
>
> Key: LUCENE-10542
> URL: https://issues.apache.org/jira/browse/LUCENE-10542
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, found that *FieldSource exists implementation 
> after LUCENE-7407 can avoid value lookup when just checking for exists.
> Detailed analysis here: 
> https://issues.apache.org/jira/browse/LUCENE-10534?focusedCommentId=17528369=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17528369
> Simple JMH performance tests comparing the original FloatFieldSource to the 
> new ones from PR #847.
>  
> ||Benchmark||Mode||Cnt||Score and Error||Units||
> |MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 
> 8.229|ops/s|
> |MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997
>  ± 27.575|ops/s|
> Source: https://github.com/risdenk/lucene-jmh



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528936#comment-17528936
 ] 

Kevin Risden commented on LUCENE-10534:
---

I moved the FieldSource exists stuff here: LUCENE-10542 since it is separate 
from the min/max conversation.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10542) FieldSource exists implementation can avoid value retrieval

2022-04-27 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10542:
-

 Summary: FieldSource exists implementation can avoid value 
retrieval
 Key: LUCENE-10542
 URL: https://issues.apache.org/jira/browse/LUCENE-10542
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Kevin Risden
Assignee: Kevin Risden


While looking at LUCENE-10534, I found that the *FieldSource exists implementations after LUCENE-7407 can avoid the value lookup when just checking for existence.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528931#comment-17528931
 ] 

Kevin Risden commented on LUCENE-10534:
---

[~romseygeek] I'm not sure I understand your question. I don't know the ValueSource code well enough to follow what you are trying to suggest.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528798#comment-17528798
 ] 

Kevin Risden edited comment on LUCENE-10534 at 4/27/22 1:48 PM:


Simple JMH performance tests comparing the original MaxFloatFunction and 
original FloatFieldSource to the new ones from PR #837 and #840. Interestingly 
this shows that the new MaxFloatFunction doesn't help performance at all and 
makes it slightly worse. All the performance gains are from the new 
FloatFieldSource impl.
 
||Benchmark||Mode||Cnt||Score and Error||Units||
|MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
|MyBenchmark.testNewMaxFloatFunction|thrpt|25|64.588 ± 1.154|ops/s|
|MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource|thrpt|25|115.084 ± 12.421|ops/s|
|MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
|MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|
|MyBenchmark.testNewMaxFloatFunctionRareField|thrpt|25|236.144 ± 5.528|ops/s|
|MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|269.662 ± 8.247|ops/s|

Source: https://github.com/risdenk/lucene-jmh


was (Author: risdenk):
[https://github.com/risdenk/lucene-jmh]

Simple JMH performance tests comparing the original MaxFloatFunction and 
original FloatFieldSource to the new ones from PR #837 and #840. Interestingly 
this shows that the new MaxFloatFunction doesn't help performance at all and 
makes it slightly worse. All the performance gains are from the new 
FloatFieldSource impl.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528798#comment-17528798
 ] 

Kevin Risden commented on LUCENE-10534:
---

[https://github.com/risdenk/lucene-jmh]

Simple JMH performance tests comparing the original MaxFloatFunction and 
original FloatFieldSource to the new ones from PR #837 and #840. Interestingly 
this shows that the new MaxFloatFunction doesn't help performance at all and 
makes it slightly worse. All the performance gains are from the new 
FloatFieldSource impl.
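
For reference, a minimal JMH skeleton for this kind of comparison could be structured as below (class layout and setup are assumptions on my part; the actual benchmarks live in the linked lucene-jmh repo):

{code:java}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class MyBenchmark {

  @Setup(Level.Trial)
  public void setUp() {
    // Placeholder: build an index with the float field and open an IndexSearcher.
  }

  @Benchmark
  public Object testMaxFloatFunction() {
    // Placeholder: run a max() function query backed by the original
    // FloatFieldSource and return the result so JMH can't dead-code-eliminate it.
    return null;
  }

  @Benchmark
  public Object testMaxFloatFunctionNewFloatFieldSource() {
    // Placeholder: same query, backed by the patched FloatFieldSource.
    return null;
  }
}
{code}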

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-27 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528791#comment-17528791
 ] 

Kevin Risden commented on LUCENE-10534:
---

{quote}although  PR840 should probably have it's own Jira for 
tracking/CHANGES.txt purposes.{quote}

Yea agreed, will separate it out into its own Jira.

{quote}I don't see any new tests in either of your PRs ... did you forget to 
'git add' ?{quote}

I didn't add any new correctness tests to the Min/Max PR yet. I've been working on a few JMH tests to see if there is really a performance improvement; I'll push those up to GitHub shortly and then go back and add the correctness tests.

{quote}Looks like BytesRefFieldSource has the exact same structure (but with a 
null check instead of a 0.0f check) {quote}

Thanks, I'll take a look at BytesRefFieldSource.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528369#comment-17528369
 ] 

Kevin Risden commented on LUCENE-10534:
---

I'm still getting used to flamegraphs so I'll explain my understanding and see 
if we are on the same page. I'm going to put up a PR with suggested changes as 
well in case that helps.

Flamegraphs - x axis = time spent as a percentage of the time being profiled; y axis = stack trace, bottom being the first call, top being the last call.

Looking only at the left-most getValueForDoc highlight (it helps to enlarge the image or download the original):

!flamegraph_getValueForDoc.png|height=250,width=250!

LongFieldSource#exists spends MOST of its time in LongFieldSource#getValueForDoc, which in turn spends its time primarily on two things:
* FilterNumericDocValues#longValue()
* advance()

This makes sense based on looking at the code (copied below to make it easier 
to see at once) 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72

{code:java}
  private long getValueForDoc(int doc) throws IOException {
    if (doc < lastDocID) {
      throw new IllegalArgumentException(
          "docs were sent out-of-order: lastDocID=" + lastDocID + " vs docID=" + doc);
    }
    lastDocID = doc;
    int curDocID = arr.docID();
    if (doc > curDocID) {
      curDocID = arr.advance(doc);
    }
    if (doc == curDocID) {
      return arr.longValue();
    } else {
      return 0;
    }
  }
{code}

LongFieldSource#exists doesn't care about the actual longValue - just that a value was found when iterating through the doc values.
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95

{code:java}
  @Override
  public boolean exists(int doc) throws IOException {
    getValueForDoc(doc);
    return arr.docID() == doc;
  }
{code}

So putting this all together for exists() calling getValueForDoc(), we spend ~50% of the time retrieving the long value even though exists() doesn't need it. We can save that 50% by making exists() ignore the actual value and essentially just return whether doc == curDocID.

This 50% extra is exaggerated in MaxFloatFunction (and other places) since 
exists() is being called a bunch. Eventually the value will be needed from 
longVal(), but if we call exists() say 3 times for every longVal(), we are 
spending a lot of time computing the value when we only need to check for 
existence.

I found the same pattern in DoubleFieldSource, EnumFieldSource, 
FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change 
showing what this would look like:

https://github.com/apache/lucene/pull/840



I also fixed the Min/Max function PR to remove the duplicate exists checks and 
remove the special 0.0f casing.

https://github.com/apache/lucene/pull/837
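
Roughly, the simplified shape is something like the sketch below (my approximation, not the actual PR #837 diff), assuming the MultiFloatFunction-style func(doc, valsArr) hook:

{code:java}
  @Override
  protected float func(int doc, FunctionValues[] valsArr) throws IOException {
    float max = Float.NEGATIVE_INFINITY;
    boolean anyExists = false;
    for (FunctionValues vals : valsArr) {
      if (vals.exists(doc)) {            // single existence check per source
        max = Math.max(max, vals.floatVal(doc));
        anyExists = true;
      }
    }
    return anyExists ? max : 0.0f;       // no separate 0.0f special casing
  }
{code}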

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528331#comment-17528331
 ] 

Kevin Risden edited comment on LUCENE-10534 at 4/26/22 5:45 PM:


Here is a small part of the flamegraph highlighting the exists() call from the Max float function:

 !flamegraph.png|height=250,width=250!

I think part of the issue might actually be the implementation of exists for 
FloatFieldSource and LongFieldSource after LUCENE-7407:
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatFieldSource.java#L84
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L96

Both of these actually return the value through getValueForDoc even though we 
really only need to check if the doc ids match. The exists method doesn't even 
check the actual value at all. This pops up in the flamegraph under exists as 
well.

 !flamegraph_getValueForDoc.png|height=250,width=250!


was (Author: risdenk):
Here is a small part of the flamegraph highlighting the exists from the Max 
float function

 !flamegraph.png! 

I think part of the issue might actually be the implementation of exists for 
FloatFieldSource and LongFieldSource after LUCENE-7407:
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatFieldSource.java#L84
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L96

Both of these actually return the value through getValueForDoc even though we 
really only need to check if the doc ids match. The exists method doesn't even 
check the actual value at all. This pops up in the flamegraph under exists as 
well.

 !flamegraph_getValueForDoc.png! 

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528331#comment-17528331
 ] 

Kevin Risden commented on LUCENE-10534:
---

Here is a small part of the flamegraph highlighting the exists from the Max 
float function

 !flamegraph.png! 

I think part of the issue might actually be the implementation of exists for 
FloatFieldSource and LongFieldSource after LUCENE-7407:
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatFieldSource.java#L84
* 
https://github.com/apache/lucene/blame/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L96

Both of these actually return the value through getValueForDoc even though we 
really only need to check if the doc ids match. The exists method doesn't even 
check the actual value at all. This pops up in the flamegraph under exists as 
well.

 !flamegraph_getValueForDoc.png! 

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Attachment: flamegraph_getValueForDoc.png

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png, flamegraph_getValueForDoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528329#comment-17528329
 ] 

Kevin Risden commented on LUCENE-10534:
---

So my thoughts about 0.0f were based on 
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatFieldSource.java#L73.
 It looks like my understanding was wrong. I'll go back and clean up that wrong optimization. I'll most likely need a test case as well, since the existing tests didn't find any issue with the assumption.

LUCENE-5961 and LUCENE-7407 interact in a bad way. I think there are ways to 
fix this.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Attachment: flamegraph.png

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: flamegraph.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-26 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528159#comment-17528159
 ] 

Kevin Risden commented on LUCENE-10534:
---

Thanks [~hossman] for the comments and examples. I'll go back and dig into why 
I thought 0.0f was always the return value in the exception case. It could be 
that I misunderstood somewhere along the way.

I do think there is room for improvement like you suggested with anyExists. I 
was playing around with a few ideas, but haven't made progress yet.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Fix Version/s: 9.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> SpellChecker.formGrams is missing bounds check
> --
>
> Key: LUCENE-10533
> URL: https://issues.apache.org/jira/browse/LUCENE-10533
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If using Solr IndexBasedSpellChecker and spellcheck.q is empty the following 
> exception occurs (found in SOLR-16169). There is an argument that the caller 
> should not be invalid, but a simple bounds check would prevent this in Lucene.
> {code:java}
> null:java.lang.NegativeArraySizeException: -1
>   at 
> org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
>   at 
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
>   at 
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:505)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>   at 
> 

[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Fix Version/s: 9.2
   (was: 9.1)

> SpellChecker.formGrams is missing bounds check
> --
>
> Key: LUCENE-10533
> URL: https://issues.apache.org/jira/browse/LUCENE-10533
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If using Solr IndexBasedSpellChecker and spellcheck.q is empty the following 
> exception occurs (found in SOLR-16169). There is an argument that the caller 
> should not be invalid, but a simple bounds check would prevent this in Lucene.
> {code:java}
> null:java.lang.NegativeArraySizeException: -1
>   at 
> org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
>   at 
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
>   at 
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:505)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>   at 
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>   at 
> 

[jira] [Commented] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527735#comment-17527735
 ] 

Kevin Risden commented on LUCENE-10534:
---

FWIW I am not 100% sure how to performance test this from within Lucene or the luceneutil benchmarks. I didn't see any function query related performance tests. I have some async-java-profiler flamegraphs from testing and can see that the exists() call is very hot and slows down min/max queries.

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Status: Patch Available  (was: Open)

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check is slow

2022-04-25 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10534:
-

 Summary: MinFloatFunction / MaxFloatFunction exists check is slow
 Key: LUCENE-10534
 URL: https://issues.apache.org/jira/browse/LUCENE-10534
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Kevin Risden
Assignee: Kevin Risden


MinFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
 and MaxFloatFunction 
(https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
 both check if values exist. This is needed since the underlying valuesource returns 0.0f both as a valid value and as the value when the document doesn't have one.

Even though this is changed to anyExists and short-circuits once a value is found in any document, the worst case is that no value is found, which requires checking all the way through to the raw data. The check is only needed when 0.0f is returned and we need to determine whether it is a valid value or the not-found case.
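
As a small illustration of that ambiguity (hypothetical snippet, not actual Lucene source):

{code:java}
  // floatVal() yields 0.0f both for a stored 0.0f and for a missing value,
  // so exists() is the only way to tell the two cases apart.
  float v = values.floatVal(doc);
  if (v == 0.0f && !values.exists(doc)) {
    // The document has no value for this field; 0.0f here means "not found".
  }
{code}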



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10534) MinFloatFunction / MaxFloatFunction exists check can be slow

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10534:
--
Summary: MinFloatFunction / MaxFloatFunction exists check can be slow  
(was: MinFloatFunction / MaxFloatFunction exists check is slow)

> MinFloatFunction / MaxFloatFunction exists check can be slow
> 
>
> Key: LUCENE-10534
> URL: https://issues.apache.org/jira/browse/LUCENE-10534
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>
> MinFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MinFloatFunction.java)
>  and MaxFloatFunction 
> (https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java)
>  both check if values exist. This is needed since the underlying valuesource 
> returns 0.0f as either a valid value or as a value when the document doesn't 
> have a value.
> Even though this is changed to anyExists and short circuits in the case a 
> value is found in any document, the worst case is that there is no value 
> found and requires checking all the way through to the raw data. This is only 
> needed when 0.0f is returned and need to determine if it is a valid value or 
> the not found case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated LUCENE-10533:
--
Status: Patch Available  (was: Open)

> SpellChecker.formGrams is missing bounds check
> --
>
> Key: LUCENE-10533
> URL: https://issues.apache.org/jira/browse/LUCENE-10533
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If using Solr IndexBasedSpellChecker and spellcheck.q is empty the following 
> exception occurs (found in SOLR-16169). There is an argument that the caller 
> should not be invalid, but a simple bounds check would prevent this in Lucene.
> {code:java}
> null:java.lang.NegativeArraySizeException: -1
>   at 
> org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
>   at 
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
>   at 
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:505)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>   at 
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>   at 
> 

[jira] [Created] (LUCENE-10533) SpellChecker.formGrams is missing bounds check

2022-04-25 Thread Kevin Risden (Jira)
Kevin Risden created LUCENE-10533:
-

 Summary: SpellChecker.formGrams is missing bounds check
 Key: LUCENE-10533
 URL: https://issues.apache.org/jira/browse/LUCENE-10533
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Kevin Risden
Assignee: Kevin Risden


If using Solr IndexBasedSpellChecker and spellcheck.q is empty, the following 
exception occurs (found in SOLR-16169). There is an argument that the caller 
should not pass invalid input, but a simple bounds check would prevent this in 
Lucene.

{code:java}
null:java.lang.NegativeArraySizeException: -1
at 
org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:438)
at 
org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:345)
at 
org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:147)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
at java.base/java.lang.Thread.run(Unknown Source)
{code}
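
For context, the guard being suggested is small. A hedged sketch of what such a 
bounds check could look like (the real SpellChecker.formGrams may differ in 
details not reproduced here):

{code:java}
// Hedged sketch of a bounds check for forming character n-grams; this is an
// illustration of the fix being proposed, not necessarily the committed code.
private static String[] formGrams(String text, int ng) {
  int len = text.length();
  if (len < ng) {
    // An empty or too-short input (e.g. an empty spellcheck.q) would otherwise
    // make "len - ng + 1" negative and throw NegativeArraySizeException.
    return new String[0];
  }
  String[] res = new String[len - ng + 1];
  for (int i = 0; i < len - ng + 1; i++) {
    res[i] = text.substring(i, i + ng);
  }
  return res;
}
{code}

With the len < ng case handled up front, an empty query simply yields no grams 
instead of an exception, independent of whether the caller is also tightened up 
on the Solr side.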




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2021-01-04 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258466#comment-17258466
 ] 

Kevin Risden commented on SOLR-15051:
-

{quote}"Hadoop filesystem interface" is an ideal choice{quote}

Not sure I would go so far as an "ideal choice" - it sadly brings in a lot of 
dependencies. The nice part is that it has implementations for at least local, 
hdfs, s3, adls, and gcp, so you get multi-cloud for free. What I do not know is 
how efficient each of those implementations is - especially the "local" one for 
file:// type paths.

Just wanted to point out that the Hadoop filesystem connector is separate from 
running a full HDFS cluster. Here are a few references (a small usage sketch 
follows the list):

* General overview: 
http://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-common/filesystem/index.html
* AWS: 
http://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html
* Azure: http://hadoop.apache.org/docs/current3/hadoop-azure/index.html
* Google: https://github.com/GoogleCloudDataproc/hadoop-connectors
* 
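
As a rough illustration of the multi-cloud point above, the same Hadoop 
FileSystem API call works against different backends purely based on the URI 
scheme. The paths below are hypothetical, and the cloud schemes would need the 
matching connector jars (hadoop-aws, hadoop-azure, the GCS connector) plus 
credentials configured:

{code:java}
// Hedged sketch: one API, implementation chosen by the URI scheme. No HDFS
// cluster is required for the file:// case; cloud schemes need their connector
// jars on the classpath. Paths here are made up.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same code path for file://, hdfs://, s3a://, abfs://, gs:// ...
    FileSystem fs = FileSystem.get(URI.create("file:///tmp"), conf);
    for (FileStatus status : fs.listStatus(new Path("/tmp"))) {
      System.out.println(status.getPath() + " " + status.getLen());
    }
  }
}
{code}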

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-12-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14951:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-22 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253596#comment-17253596
 ] 

Kevin Risden commented on SOLR-15051:
-

I can't add a comment to the design doc, but wanted to address potentially 
misleading statements around the Solr HDFS integration.

{quote}Has an unfortunate search performance penalty. TODO ___ %.  Some 
indexing penalty too: ___ %.{quote}
There will be a performance penalty here coming from remote storage; I don't 
think that is completely avoidable. The biggest issue is on the indexing side, 
where we need to ensure that documents are reliably written, and that isn't 
exactly fast on remote storage.

{quote}The implementation relies on a “BlockCache”, which means running Solr 
with large Java heaps.{quote}

The BlockCache is typically off heap, backed by Java direct memory, so it 
shouldn't require a large Java heap.
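
A tiny illustration of why that is: direct ByteBuffers are allocated outside 
the Java heap and are bounded by -XX:MaxDirectMemorySize rather than -Xmx. The 
slab size and count below are made-up numbers, not the actual BlockCache 
defaults:

{code:java}
// Hedged sketch: off-heap slabs via direct ByteBuffers, the general mechanism
// behind an off-heap block cache. Sizes here are illustrative only.
import java.nio.ByteBuffer;

public class DirectSlabSketch {
  public static void main(String[] args) {
    final int slabSize = 64 * 1024 * 1024; // 64 MB per slab (made up)
    final int slabCount = 4;
    ByteBuffer[] slabs = new ByteBuffer[slabCount];
    for (int i = 0; i < slabCount; i++) {
      slabs[i] = ByteBuffer.allocateDirect(slabSize); // lives outside the heap
    }
    System.out.println("Allocated " + ((long) slabCount * slabSize)
        + " bytes off heap; heap -Xmx is unaffected");
  }
}
{code}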

{quote}
It’s not a generalized shared storage scheme; it’s HDFS specific.  It’s 
possible to plug in S3 and Alluxio to this but there is overhead.  HDFS is 
rather complex to operate, whereas say S3 is provided by cloud hosting 
providers natively.
{quote}

I'm not sure I understand this statement. There are a few parts to Hadoop. HDFS 
is the storage layer, and it can be complex to operate. The more interesting 
part is the Hadoop filesystem interface, a semi-generic adapter between the 
HDFS-style API and other storage backends (S3, ABFS, GCS, etc). The two pieces 
are separate and don't require each other to operate. The Hadoop filesystem 
interface provides the abstraction necessary to go from the local filesystem to 
a lot of other cloud provider storage mechanisms.

There may be some overhead there, but there has been a lot of work over the 
past 1-2 years to improve performance, driven by the push for cloud storage 
support.

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14088) Tika and commons-compress dependency in solr core causes classloader issue

2020-12-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14088.
-
Resolution: Cannot Reproduce

This hasn't come up again and I'm not planning to track it down any further 
right now. Marking as "Cannot Reproduce"

> Tika and commons-compress dependency in solr core causes classloader issue
> --
>
> Key: SOLR-14088
> URL: https://issues.apache.org/jira/browse/SOLR-14088
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Kevin Risden
>Priority: Major
>
> SOLR-14086 found that if commons-compress is in core ivy.xml as a compile 
> dependency, it messes up the classloader for any commons-compress 
> dependencies. It causes issues with items like xz being loaded. 
> This is problematic since dependencies shouldn't matter based on classloader. 
> This jira is to determine whether there is something wrong with Solr's 
> classloader or whether it is a commons-compress issue only.
> Error message from SOLR-14086 copied below:
> {code:java}
> 
> 
> 
> Error 500 java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.commons.compress.archivers.sevenz.Coders
> 
> HTTP ERROR 500 java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
> 
> URI:/solr/tika-integration-example/update/extract
> STATUS:500
> MESSAGE:java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
> SERVLET:default
> CAUSED BY:java.lang.NoClassDefFoundError: Could not 
> initialize class org.apache.commons.compress.archivers.sevenz.Coders
> 
> Caused by:java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:437)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:355)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.init(SevenZFile.java:241)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.init(SevenZFile.java:108)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.init(SevenZFile.java:262)
>   at 
> org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:257)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2582)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
>   at 
> 

[jira] [Resolved] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14973.
-
Resolution: Fixed

Marking as fixed in 8.7. Thanks [~schuch] for finding the fix. Thanks 
[~shuremov] for confirming fixed in 8.7. Thanks [~tallison] for jumping in :D

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Assignee: Tim Allison
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14973:

Fix Version/s: 8.7

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14973:
---

Assignee: Tim Allison

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Assignee: Tim Allison
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-18 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234688#comment-17234688
 ] 

Kevin Risden commented on SOLR-14973:
-

[~tallison] If it's not a lot of work, then sure, backport it. Unless there are 
plans to do an 8.6.x release, though, I'm not sure it makes sense to backport 
the changes, since they would never get into a release.

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-09 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228769#comment-17228769
 ] 

Kevin Risden commented on SOLR-14973:
-

FYI [~tallison] - not sure who updated the Tika libraries last :D I think I can 
help look at this.

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-11-09 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14951:

Fix Version/s: 8.8

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-11-09 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228695#comment-17228695
 ] 

Kevin Risden commented on SOLR-14951:
-

Sigh I missed merging it. I just saw the review and will get it merged soon.

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13506) Upgrade carrot2-guava-*.jar

2020-10-22 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219200#comment-17219200
 ] 

Kevin Risden commented on SOLR-13506:
-

FYI [~dweiss] - [~sourabhsparkala] filed SOLR-14960 which looks to be a 
duplicate of this.

> Upgrade carrot2-guava-*.jar 
> 
>
> Key: SOLR-13506
> URL: https://issues.apache.org/jira/browse/SOLR-13506
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 7.7.1, 8.0, 8.1
>Reporter: DW
>Assignee: Dawid Weiss
>Priority: Major
>
> The Solr package contains /contrib/clustering/lib/carrot2-guava-18.0.jar.
> [cpe:/a:google:guava:18.0|https://web.nvd.nist.gov/view/vuln/search-results?adv_search=true=on_version=cpe%3A%2Fa%3Agoogle%3Aguava%3A18.0]
>  has know security vulnerabilities. 
> Can you please upgrade the library or remove if not needed.
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14960) Solr-clustering is bringing in CVE-2018-10237 vulnerable guava

2020-10-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14960.
-
Resolution: Duplicate

Duplicate of SOLR-13506

> Solr-clustering is bringing in CVE-2018-10237 vulnerable guava
> --
>
> Key: SOLR-14960
> URL: https://issues.apache.org/jira/browse/SOLR-14960
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3
>Reporter: Sourabh Sarvotham Parkala
>Priority: Major
>
> Hello Team, we find that Solr-Clustering module is bringing in a Vulnerable 
> library `org.carrot2.shaded:carrot2-guava:18.0`. 
> The vulnerability is 
> [CVE-2018-10237|https://nvd.nist.gov/vuln/detail/CVE-2018-10237] 
>  Severity: Medium
>  CVSS Score 5.9
> [INFO] +- org.apache.solr:solr-clustering:jar:8.6.3:compile
>  [INFO] | +- com.carrotsearch.thirdparty:simple-xml-safe:jar:2.7.1:compile
>  [INFO] | +- org.carrot2:carrot2-mini:jar:3.16.0:compile
>  [INFO] | +- org.carrot2.attributes:attributes-binder:jar:1.3.3:compile
>  [INFO] | - org.carrot2.shaded:carrot2-guava:jar:18.0:compile
> Hence, creating this bug to request removing the Carrot2 dependency from the 
> Solr module, as the last update to the 
> [carrot2|https://mvnrepository.com/artifact/org.carrot2.shaded/carrot2-guava] 
> library appears to have been in 2015 and we cannot be sure they will release 
> a new version with the updated guava fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-10-19 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217181#comment-17217181
 ] 

Kevin Risden commented on SOLR-14951:
-

PR: https://github.com/apache/lucene-solr/pull/2008

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-10-19 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14951:

Status: Patch Available  (was: Open)

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-10-19 Thread Kevin Risden (Jira)
Kevin Risden created SOLR-14951:
---

 Summary: Upgrade Angular JS 1.7.9 to 1.8.0
 Key: SOLR-14951
 URL: https://issues.apache.org/jira/browse/SOLR-14951
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Admin UI
Reporter: Kevin Risden
Assignee: Kevin Risden


Angular JS released 1.8.0 to fix some security vulnerabilities. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-19 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14549:

Fix Version/s: (was: 8.8)
   8.7

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.6, 8.5.1, 8.5.2, 8.6.1, 8.6.2, 8.6.3
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0), 8.7
>
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-19 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14549:

Fix Version/s: master (9.0)

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.6, 8.5.1, 8.5.2, 8.6.1, 8.6.2, 8.6.3
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-15 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214987#comment-17214987
 ] 

Kevin Risden commented on SOLR-14549:
-

PR https://github.com/apache/lucene-solr/pull/1989

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.6, 8.5.1, 8.5.2, 8.6.1, 8.6.2, 8.6.3
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-15 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14549:

Status: Patch Available  (was: Open)

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-15 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14549:

Affects Version/s: 8.6
   8.6.1
   8.6.2
   8.6.3

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.6, 8.5.1, 8.5.2, 8.6.1, 8.6.2, 8.6.3
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-13 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213551#comment-17213551
 ] 

Kevin Risden commented on SOLR-14549:
-

So I'm about 80% sure that it has something to do with the Files.list and 
recursively trying to build up the children for jstree. The children array is 
being modified in the recursive process call and somehow this gets all out of 
order. I'll need to stare at it again another day.

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-10-13 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213513#comment-17213513
 ] 

Kevin Risden commented on SOLR-14549:
-

Ok, it took a bit more time to get back to this. I can definitely see this is 
happening AND I can see roughly what is going on. Somehow the data isn't loaded 
"in time", it looks like. If I hardcode a set of children in 
https://github.com/apache/lucene-solr/blob/master/solr/webapp/web/js/angular/controllers/files.js#L62
 then the tree displays properly. It is almost like things are happening out of 
order. If I add console.log statements, I can see them happening in an order 
that doesn't make sense to me.

So once I fix the ordering or loading of the changed data, it should be good 
to go.

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14887.
-
Resolution: Fixed

Thanks [~gezapeti]!

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0), 8.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14887:

Fix Version/s: 8.7

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0), 8.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14851) Http2SolrClient doesn't handle keystore type correctly

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14851:

Fix Version/s: (was: 8.7)
   (was: master (9.0))

> Http2SolrClient doesn't handle keystore type correctly
> --
>
> Key: SOLR-14851
> URL: https://issues.apache.org/jira/browse/SOLR-14851
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Andras Salamon
>Assignee: Kevin Risden
>Priority: Major
> Attachments: SOLR-14851-01.patch
>
>
> I wanted to use Solr SSL using bcfks keystore type. Even after specifying the 
> following JVM properties, Solr was not able to start: 
> {{-Djavax.net.ssl.keyStoreType=bcfks -Djavax.net.ssl.trustStoreType=bcfks 
> -Dsolr.jetty.truststore.type=bcfks -Dsolr.jetty.keystore.type=bcfks}}
> The error message in the log:
> {noformat}2020-09-07 14:42:29.429 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:org.apache.solr.common.SolrException: Error instantiating 
> shardHandlerFactory class [HttpShardHandlerFactory]: java.io.IOException: 
> Invalid keystore format
> at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:660)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:262)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:182)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
> at 
> org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
> at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
> at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:513)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:154)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:173)
> at 
> org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:447)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:66)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:784)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:753)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:641)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:540)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:146)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:599)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:249)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at org.eclipse.jetty.server.Server.start(Server.java:407)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at 
> 

[jira] [Updated] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14887:

Fix Version/s: (was: 8.7)

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14887:
---

Assignee: Kevin Risden

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14887:

Fix Version/s: 8.7
   master (9.0)

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0), 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14851) Http2SolrClient doesn't handle keystore type correctly

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14851:
---

Assignee: Kevin Risden

> Http2SolrClient doesn't handle keystore type correctly
> --
>
> Key: SOLR-14851
> URL: https://issues.apache.org/jira/browse/SOLR-14851
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Andras Salamon
>Assignee: Kevin Risden
>Priority: Major
> Attachments: SOLR-14851-01.patch
>
>
> I wanted to use Solr SSL using bcfks keystore type. Even after specifying the 
> following JVM properties, Solr was not able to start: 
> {{-Djavax.net.ssl.keyStoreType=bcfks -Djavax.net.ssl.trustStoreType=bcfks 
> -Dsolr.jetty.truststore.type=bcfks -Dsolr.jetty.keystore.type=bcfks}}
> The error message in the log:
> {noformat}2020-09-07 14:42:29.429 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:org.apache.solr.common.SolrException: Error instantiating 
> shardHandlerFactory class [HttpShardHandlerFactory]: java.io.IOException: 
> Invalid keystore format
> at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:660)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:262)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:182)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
> at 
> org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
> at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
> at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:513)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:154)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:173)
> at 
> org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:447)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:66)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:784)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:753)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:641)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:540)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:146)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:599)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:249)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at org.eclipse.jetty.server.Server.start(Server.java:407)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
> at 
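
A minimal, hypothetical sketch of the kind of fix the report points at (this is not 
the attached SOLR-14851-01.patch; the class name is illustrative, and it assumes 
Jetty's org.eclipse.jetty.util.ssl.SslContextFactory API and that the standard 
javax.net.ssl.* properties are set as in the report): propagate the keystore and 
truststore type to the SSL context instead of letting it default to JKS.

{noformat}
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Illustrative only: build a client-side SslContextFactory that honors the
// javax.net.ssl.*Type system properties rather than assuming the JKS default.
public class KeystoreTypeSketch {
  public static SslContextFactory.Client newClientSslContextFactory() {
    SslContextFactory.Client factory = new SslContextFactory.Client();

    factory.setKeyStorePath(System.getProperty("javax.net.ssl.keyStore"));
    factory.setKeyStorePassword(System.getProperty("javax.net.ssl.keyStorePassword"));
    // Without this line a non-JKS store (e.g. bcfks) fails with
    // "Invalid keystore format", as in the log above.
    factory.setKeyStoreType(System.getProperty("javax.net.ssl.keyStoreType", "JKS"));

    factory.setTrustStorePath(System.getProperty("javax.net.ssl.trustStore"));
    factory.setTrustStorePassword(System.getProperty("javax.net.ssl.trustStorePassword"));
    factory.setTrustStoreType(System.getProperty("javax.net.ssl.trustStoreType", "JKS"));

    return factory;
  }
}
{noformat}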

[jira] [Updated] (SOLR-14851) Http2SolrClient doesn't handle keystore type correctly

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14851:

Fix Version/s: 8.7
   master (9.0)

> Http2SolrClient doesn't handle keystore type correctly
> --
>
> Key: SOLR-14851
> URL: https://issues.apache.org/jira/browse/SOLR-14851
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Andras Salamon
>Assignee: Kevin Risden
>Priority: Major
> Fix For: master (9.0), 8.7
>
> Attachments: SOLR-14851-01.patch
>
>
> I wanted to use Solr SSL with the bcfks keystore type. Even after specifying the 
> following JVM properties, Solr was not able to start: 
> {{-Djavax.net.ssl.keyStoreType=bcfks -Djavax.net.ssl.trustStoreType=bcfks 
> -Dsolr.jetty.truststore.type=bcfks -Dsolr.jetty.keystore.type=bcfks}}
> The error message in the log:
> {noformat}2020-09-07 14:42:29.429 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:org.apache.solr.common.SolrException: Error instantiating 
> shardHandlerFactory class [HttpShardHandlerFactory]: java.io.IOException: 
> Invalid keystore format
> at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:660)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:262)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:182)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
> at 
> org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
> at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
> at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:513)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:154)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:173)
> at 
> org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:447)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:66)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:784)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:753)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:641)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:540)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:146)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:599)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:249)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at org.eclipse.jetty.server.Server.start(Server.java:407)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at 
> 

[jira] [Updated] (SOLR-14851) Http2SolrClient doesn't handle keystore type correctly

2020-10-13 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14851:

Component/s: Server

> Http2SolrClient doesn't handle keystore type correctly
> --
>
> Key: SOLR-14851
> URL: https://issues.apache.org/jira/browse/SOLR-14851
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Andras Salamon
>Priority: Major
> Attachments: SOLR-14851-01.patch
>
>
> I wanted to use Solr SSL with the bcfks keystore type. Even after specifying the 
> following JVM properties, Solr was not able to start: 
> {{-Djavax.net.ssl.keyStoreType=bcfks -Djavax.net.ssl.trustStoreType=bcfks 
> -Dsolr.jetty.truststore.type=bcfks -Dsolr.jetty.keystore.type=bcfks}}
> The error message in the log:
> {noformat}2020-09-07 14:42:29.429 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:org.apache.solr.common.SolrException: Error instantiating 
> shardHandlerFactory class [HttpShardHandlerFactory]: java.io.IOException: 
> Invalid keystore format
> at 
> org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:660)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:262)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:182)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
> at 
> org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
> at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
> at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:513)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:154)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:173)
> at 
> org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:447)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:66)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:784)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:753)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:641)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:540)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:146)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:599)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:249)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at org.eclipse.jetty.server.Server.start(Server.java:407)
> at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
> at org.eclipse.jetty.server.Server.doStart(Server.java:371)
>

[jira] [Created] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-09-22 Thread Kevin Risden (Jira)
Kevin Risden created SOLR-14887:
---

 Summary: Upgrade JQuery to 3.5.1
 Key: SOLR-14887
 URL: https://issues.apache.org/jira/browse/SOLR-14887
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Admin UI
Reporter: Kevin Risden


The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is out 
and addresses some security vulnerabilities. It would be good to upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-09-22 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200116#comment-17200116
 ] 

Kevin Risden commented on SOLR-14549:
-

Sorry [~epugh], I missed your ping here. I saw what looks to be a similar report 
filed just today. I may look at JQuery 3.5.1 for Solr this week or next, so I 
can probably look at this as well.

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>
> The Admin interface for showing files only lets you see the top-level files, 
> not nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during the JQuery upgrade in SOLR-14209.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-09-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14549:
---

Assignee: Kevin Risden

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>
> The Admin interface for showing files only lets you see the top-level files, 
> not nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during the JQuery upgrade in SOLR-14209.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14885) Solr GUI not displaying folders, directories in Files section

2020-09-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14885.
-
Resolution: Duplicate

As far as I can tell this duplicates SOLR-14549.

> Solr GUI not displaying folders, directories in Files section
> -
>
> Key: SOLR-14885
> URL: https://issues.apache.org/jira/browse/SOLR-14885
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.6.2
>Reporter: A
>Priority: Minor
>  Labels: interface
> Attachments: solr1.PNG, solr2.png
>
>
> The Solr GUI Files section is not displaying directory or folder content; please 
> see the attached pictures.
> solr2.png shows the folder structure on my local server.
> solr1.png shows the Solr GUI.
> To reproduce, just create a new core (./solr create -c db).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14422) Solr 8.5 Admin UI shows Angular placeholders on first load / refresh

2020-06-30 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148923#comment-17148923
 ] 

Kevin Risden commented on SOLR-14422:
-

Looks good [~epugh]. Sorry for dropping the ball here.

> Solr 8.5 Admin UI shows Angular placeholders on first load / refresh
> 
>
> Key: SOLR-14422
> URL: https://issues.apache.org/jira/browse/SOLR-14422
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: 8.5, 8.5.1, 8.5.2
>Reporter: Colvin Cowie
>Priority: Minor
> Attachments: SOLR-14422.patch, image-2020-04-21-14-51-18-923.png
>
>
> When loading / refreshing the Admin UI in 8.5.1, it briefly but _visibly_ 
> shows a placeholder for the "SolrCore Initialization Failures" error message, 
> with a lot of redness. It looks like there is a real problem. Obviously the 
> message then disappears, and it can be ignored.
> However, if I was a first time user, it would not give me confidence that 
> everything is okay. In a way, an error message that appears briefly then 
> disappears before I can finish reading it is worse than one which just stays 
> there.
>  
> Here's a screenshot of what I mean  !image-2020-04-21-14-51-18-923.png!
>  
> I suspect that SOLR-14132 will have caused this
>  
> From a (very) brief googling it seems like using the ng-cloak attribute is 
> the right way to fix this, and it certainly seems to work for me. 
> https://docs.angularjs.org/api/ng/directive/ngCloak
> I will attach a patch with it, but if someone who actually knows Angular etc 
> has a better approach then please go for it



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14449) Two reproducing failures hdfs tests, CheckHdfsIndexTest and StressHdfsTest

2020-04-30 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096779#comment-17096779
 ] 

Kevin Risden commented on SOLR-14449:
-

[~erickerickson] I'm not sure if this is fixed by an updated commit for SOLR-14237.

> Two reproducing failures hdfs tests, CheckHdfsIndexTest and StressHdfsTest
> --
>
> Key: SOLR-14449
> URL: https://issues.apache.org/jira/browse/SOLR-14449
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Priority: Major
>
> ant test  -Dtestcase=CheckHdfsIndexTest -Dtests.method=doTest 
> -Dtests.seed=3364EB8B88BAC12F -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.badapples=true -Dtests.locale=he -Dtests.timezone=Europe/London 
> -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> ant test  -Dtestcase=StressHdfsTest -Dtests.method=test 
> -Dtests.seed=36356390EB3AB8E5 -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true 
> -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt
>  -Dtests.locale=mk -Dtests.timezone=Europe/Monaco -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1
> Here's the jenkins links for as long as they'll be around.
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-8.x/428/consoleFull
> https://builds.apache.org/job/Lucene-Solr-BadApples-Tests-8.x/434/consoleFull
> They both fail on the same line, so probably the same cause: Here's the stack 
> trace from the second:
> ERROR   50.4s | StressHdfsTest.test <<<
>[junit4]> Throwable #1: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://127.0.0.1:50018/z/u/delete_data_dir: 
> java.lang.NullPointerException
>[junit4]>  at 
> org.apache.solr.handler.admin.SystemInfoHandler.getSecurityInfo(SystemInfoHandler.java:326)
>[junit4]>  at 
> org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:146)
>[junit4]>  at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>[junit4]>  at 
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2600)
>[junit4]>  at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:803)
>[junit4]>  at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:582)
>[junit4]>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:432)
>[junit4]>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>[junit4]>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
>[junit4]>  at 
> org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:166)
>[junit4]>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
>[junit4]>  at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>[junit4]>  at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>[junit4]>  at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>[junit4]>  at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>[junit4]>  at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:767)
>[junit4]>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>[junit4]>  at 
> org.eclipse.jetty.server.Server.handle(Server.java:500)
>[junit4]>  at 

[jira] [Commented] (SOLR-13886) HDFSSyncSliceTest and SyncSliceTest started failing frequently

2020-04-24 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091773#comment-17091773
 ] 

Kevin Risden commented on SOLR-13886:
-

Thanks [~erickerickson], yeah, I've been checking about once a day too and haven't 
seen any failures. It also fixed the local Jenkins run failures I had seen for this 
test.

> HDFSSyncSliceTest and SyncSliceTest started failing frequently
> --
>
> Key: SOLR-13886
> URL: https://issues.apache.org/jira/browse/SOLR-13886
> Project: Solr
>  Issue Type: Bug
>  Components: Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-13886.patch, SOLR-13886_jenkins_log.txt.gz
>
>
> While I can see some failures of this test in the past, they weren't frequent 
> and were usually things like port bindings (maybe SOLR-13871) or timeouts. 
> I've started this failure in Jenkins (and locally) frequently:
> {noformat}
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5410/
> Java: 64bit/jdk-13 -XX:-UseCompressedOops -XX:+UseParallelGC
> 2 tests failed.
> FAILED:  org.apache.solr.cloud.SyncSliceTest.test
> Error Message:
> expected:<5> but was:<4>
> Stack Trace:
> java.lang.AssertionError: expected:<5> but was:<4>
> at 
> __randomizedtesting.SeedInfo.seed([F8E3B768E16E848D:70B788B24F92E975]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at org.apache.solr.cloud.SyncSliceTest.test(SyncSliceTest.java:150)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1082)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1054)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> 

[jira] [Comment Edited] (SOLR-14422) Solr 8.5 Admin UI shows Angular placeholders on first load / refresh

2020-04-21 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088734#comment-17088734
 ] 

Kevin Risden edited comment on SOLR-14422 at 4/21/20, 2:24 PM:
---

Thanks [~cjcowie]! Sorry if I caused this upgrading Angular :( The patch looks 
reasonable at first glance. I'm not sure if ngCloak can be put on a smaller DOM 
element than the body like the docs say. I'm not super familiar with it. I'll 
try to take a look soon.


was (Author: risdenk):
Thanks [~cjcowie]! Sorry if I caused this upgrading Angular :( The patch looks 
reasonable at first glane. I'm not sure if ngCloak can be put on a smaller dom 
element than the body like the docs say. I'm not super familiar with it. I'll 
try to take look soon.

> Solr 8.5 Admin UI shows Angular placeholders on first load / refresh
> 
>
> Key: SOLR-14422
> URL: https://issues.apache.org/jira/browse/SOLR-14422
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Priority: Minor
> Attachments: SOLR-14422.patch, image-2020-04-21-14-51-18-923.png
>
>
> When loading / refreshing the Admin UI in 8.5.1, it briefly but _visibly_ 
> shows a placeholder for the "SolrCore Initialization Failures" error message, 
> with a lot of redness. It looks like there is a real problem. Obviously the 
> message then disappears, and it can be ignored.
> However, if I was a first time user, it would not give me confidence that 
> everything is okay. In a way, an error message that appears briefly then 
> disappears before I can finish reading it is worse than one which just stays 
> there.
>  
> Here's a screenshot of what I mean  !image-2020-04-21-14-51-18-923.png!
>  
> I suspect that SOLR-14132 will have caused this
>  
> From a (very) brief googling it seems like using the ng-cloak attribute is 
> the right way to fix this, and it certainly seems to work for me. 
> https://docs.angularjs.org/api/ng/directive/ngCloak
> I will attach a patch with it, but if someone who actually knows Angular etc 
> has a better approach then please go for it



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


