[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2022-06-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551750#comment-17551750
 ] 

David Smiley commented on LUCENE-4574:
--

Maybe, but more recently see LUCENE-10252, which Solr does not yet have because it 
went into 9.1 (Solr is still on Lucene 9.0).

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, 
> LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap, so this is a big problem.  I was able to 
> work around this problem trivially on my end by caching the last value with 
> the corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.
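
A minimal sketch of the caching work-around mentioned in the description above; the 
class, field, and computation names are hypothetical, not the reporter's actual code:

{code:java}
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

// Hypothetical illustration: remember the last docid/value pair so that
// back-to-back calls for the same document don't redo the expensive work.
class CachedFunctionValues extends DoubleDocValues {
  private int lastDocId = -1;
  private double lastValue;

  CachedFunctionValues(ValueSource vs) {
    super(vs);
  }

  @Override
  public double doubleVal(int doc) {
    if (doc != lastDocId) {
      lastDocId = doc;
      lastValue = expensiveComputation(doc);
    }
    return lastValue;
  }

  private double expensiveComputation(int doc) {
    return doc * 0.5; // placeholder for the real per-document computation
  }
}
{code}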



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10236) CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring

2022-06-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545527#comment-17545527
 ] 

David Smiley commented on LUCENE-10236:
---

If this is a "improvement", then I think it doesn't go to 8.11.x unless it's a 
bug.  I also observe it's "minor".

> CombinedFieldsQuery to use fieldAndWeights.values() when constructing 
> MultiNormsLeafSimScorer for scoring
> -
>
> Key: LUCENE-10236
> URL: https://issues.apache.org/jira/browse/LUCENE-10236
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Zach Chen
>Assignee: Zach Chen
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> This is a spin-off issue from discussion in 
> [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a 
> quick fix in CombinedFieldsQuery scoring.
> Currently CombinedFieldsQuery would use a constructed 
> [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421]
>  object to create a MultiNormsLeafSimScorer for scoring, but the fields 
> object may contain duplicated field-weight pairs as it is [built from looping 
> over 
> fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414],
> resulting in duplicated norms being added during the scoring calculation in 
> MultiNormsLeafSimScorer. 
> E.g. for CombinedFieldsQuery with two fields and two values matching a 
> particular doc:
> {code:java}
> CombinedFieldQuery query =
> new CombinedFieldQuery.Builder()
> .addField("field1", (float) 1.0)
> .addField("field2", (float) 1.0)
> .addTerm(new BytesRef("foo"))
> .addTerm(new BytesRef("zoo"))
> .build(); {code}
> I would imagine the scoring to be based on the following:
>  # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + 
> freq(field1:zoo) + freq(field2:zoo)
>  # Sum of norms on doc = norm(field1) + norm(field2)
> but the current logic would use the following for scoring:
>  # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + 
> freq(field1:zoo) + freq(field2:zoo)
>  # Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + 
> norm(field2)
>  
> In addition, this differs from how MultiNormsLeafSimScorer is constructed 
> from CombinedFieldsQuery explain function, which [uses 
> fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389]
>  and does not contain duplicated field-weight pairs. 
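
An illustrative, self-contained sketch of the de-duplication idea described above (not 
the actual CombinedFieldQuery code; the record and field names are made up): each term 
contributes one (field, weight) pair, so keying by field name, as fieldAndWeights.values() 
effectively does, keeps each field's norm from being summed more than once.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupFieldWeights {
  // Stand-in for the query's internal field/weight pair.
  record FieldAndWeight(String field, float weight) {}

  public static void main(String[] args) {
    // One pair per (field, term): field1/field2 each appear twice, for "foo" and "zoo".
    List<FieldAndWeight> perTerm = List.of(
        new FieldAndWeight("field1", 1.0f),
        new FieldAndWeight("field2", 1.0f),
        new FieldAndWeight("field1", 1.0f),
        new FieldAndWeight("field2", 1.0f));

    // De-duplicate by field name so each field contributes its norm only once.
    Map<String, FieldAndWeight> unique = new LinkedHashMap<>();
    for (FieldAndWeight fw : perTerm) {
      unique.putIfAbsent(fw.field(), fw);
    }
    System.out.println(unique.values()); // field1 and field2, once each
  }
}
{code}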



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8519) MultiDocValues.getNormValues should not call getMergedFieldInfos

2022-05-21 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8519.
--
Fix Version/s: 9.3
   Resolution: Fixed

Thanks for contributing Rushabh!

> MultiDocValues.getNormValues should not call getMergedFieldInfos
> 
>
> Key: LUCENE-8519
> URL: https://issues.apache.org/jira/browse/LUCENE-8519
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{MultiDocValues.getNormValues}} should not call {{getMergedFieldInfos}} 
> because it's a needless expense.  getNormValues simply wants to know whether each 
> LeafReader that has this field also has norms; that's all.
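
A rough sketch of the per-leaf check implied above; this is just the principle, not the 
committed change:

{code:java}
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;

public class NormsCheck {
  // Consult each leaf's own FieldInfos instead of building merged FieldInfos
  // for the whole composite reader.
  static boolean anyLeafHasNorms(IndexReader reader, String field) {
    for (LeafReaderContext ctx : reader.leaves()) {
      FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(field);
      if (fi != null && fi.hasNorms()) {
        return true;
      }
    }
    return false;
  }
}
{code}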



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10454) UnifiedHighlighter can miss terms because of query rewrites

2022-03-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-10454:
--
Status: Open  (was: Open)

WeightMatches is the default as of 9.0: LUCENE-9431 -- the ramifications of 
this early term extraction are nullified in this case and your test will pass.

When WeightMatches isn't being used, there is at least PhraseHelper, which does 
extraction and attempts to know whether it should rewrite the query or not.  
Perhaps the UH should see that terms is empty and fall back on PhraseHelper.  
See my patch; tests pass, including yours.  It's imperfect though: if there 
are position-sensitive terms or MTQ/automata, this won't work, I suppose.


> UnifiedHighlighter can miss terms because of query rewrites
> ---
>
> Key: LUCENE-10454
> URL: https://issues.apache.org/jira/browse/LUCENE-10454
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Julie Tibshirani
>Priority: Minor
> Attachments: LUCENE-10454-fix.patch, LUCENE-10454.patch
>
>
> Before extracting terms from a query, UnifiedHighlighter rewrites the query 
> using an empty searcher. If the query rewrites to MatchNoDocsQuery when the 
> reader is empty, then the highlighter will fail to extract terms. This is 
> more of an issue now that we rewrite BooleanQuery to MatchNoDocsQuery when 
> any of its required clauses is MatchNoDocsQuery 
> (https://issues.apache.org/jira/browse/LUCENE-10412). I attached a patch 
> showing the problem.
> This feels like a pretty esoteric issue, but I figured it was worth raising 
> for awareness. I think it only applies when weightMatches=false, which isn't 
> the default. I couldn't find any existing queries in Lucene that would be 
> affected.
> We ran into it while upgrading Elasticsearch to the latest Lucene snapshot, 
> since a couple custom queries rewrite to MatchNoDocsQuery when the reader is 
> empty.
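
For illustration, a hypothetical query of the kind described (assuming the 
rewrite(IndexReader) signature of this Lucene version): it wraps another query but 
rewrites to MatchNoDocsQuery when the reader is empty, which is exactly the situation 
that defeats term extraction against an empty searcher.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

public class EmptyReaderAwareQuery extends Query {
  private final Query in;

  public EmptyReaderAwareQuery(Query in) {
    this.in = in;
  }

  @Override
  public Query rewrite(IndexReader reader) throws IOException {
    if (reader.numDocs() == 0) {
      // An empty reader (like the UnifiedHighlighter's empty searcher) matches nothing.
      return new MatchNoDocsQuery("reader is empty");
    }
    return in;
  }

  @Override
  public void visit(QueryVisitor visitor) {
    in.visit(visitor);
  }

  @Override
  public String toString(String field) {
    return "emptyReaderAware(" + in.toString(field) + ")";
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof EmptyReaderAwareQuery
        && in.equals(((EmptyReaderAwareQuery) other).in);
  }

  @Override
  public int hashCode() {
    return 31 * getClass().hashCode() + in.hashCode();
  }
}
{code}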



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10454) UnifiedHighlighter can miss terms because of query rewrites

2022-03-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-10454:
--
Attachment: LUCENE-10454-fix.patch

> UnifiedHighlighter can miss terms because of query rewrites
> ---
>
> Key: LUCENE-10454
> URL: https://issues.apache.org/jira/browse/LUCENE-10454
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Julie Tibshirani
>Priority: Minor
> Attachments: LUCENE-10454-fix.patch, LUCENE-10454.patch
>
>
> Before extracting terms from a query, UnifiedHighlighter rewrites the query 
> using an empty searcher. If the query rewrites to MatchNoDocsQuery when the 
> reader is empty, then the highlighter will fail to extract terms. This is 
> more of an issue now that we rewrite BooleanQuery to MatchNoDocsQuery when 
> any of its required clauses is MatchNoDocsQuery 
> (https://issues.apache.org/jira/browse/LUCENE-10412). I attached a patch 
> showing the problem.
> This feels like a pretty esoteric issue, but I figured it was worth raising 
> for awareness. I think it only applies when weightMatches=false, which isn't 
> the default. I couldn't find any existing queries in Lucene that would be 
> affected.
> We ran into it while upgrading Elasticsearch to the latest Lucene snapshot, 
> since a couple custom queries rewrite to MatchNoDocsQuery when the reader is 
> empty.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500803#comment-17500803
 ] 

David Smiley commented on LUCENE-10302:
---

I attached my WIP as a patch file.  Looking back, I started by defining the 
Builder code.  I didn't yet implement "heapify", and nothing calls any of it yet.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then moving over to the PriorityQueue if/when it gets full (it may 
> not).
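
For reference, a minimal standalone sketch of the Floyd-style O(n) bottom-up 
construction referred to above (a min-heap over a plain int array); it is illustrative 
only, not Lucene's PriorityQueue code:

{code:java}
public class Heapify {
  // Floyd's bottom-up heapify: sift down every internal node, last parent first.
  static void heapify(int[] a) {
    for (int i = a.length / 2 - 1; i >= 0; i--) {
      siftDown(a, i);
    }
  }

  static void siftDown(int[] a, int i) {
    int n = a.length;
    while (true) {
      int left = 2 * i + 1, right = left + 1, smallest = i;
      if (left < n && a[left] < a[smallest]) smallest = left;
      if (right < n && a[right] < a[smallest]) smallest = right;
      if (smallest == i) return;
      int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;
      i = smallest;
    }
  }

  public static void main(String[] args) {
    int[] a = {9, 4, 7, 1, 3, 8};
    heapify(a);
    System.out.println(java.util.Arrays.toString(a)); // a[0] is now the minimum
  }
}
{code}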



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-10302:
--
Attachment: LUCENE_PriorityQueue_Builder_with_heapify.patch

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then moving over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Closed] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time

2022-01-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-6121.


> Fix CachingTokenFilter to propagate reset() the first time
> --
>
> Key: LUCENE-6121
> URL: https://issues.apache.org/jira/browse/LUCENE-6121
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: random-chains
> Fix For: 5.0, 6.0
>
> Attachments: 
> LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, 
> LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
>
> CachingTokenFilter should have been propagating reset() _but only the first 
> time_ and thus you would then use CachingTokenFilter in a more normal way – 
> wrap it and call reset() then increment in a loop, etc., instead of knowing 
> you need to call reset() on what it wraps but not this token filter itself. That's 
> weird. It's abnormal for a TokenFilter to never propagate reset, so every 
> user of CachingTokenFilter to date has worked around this by calling reset() 
> on the underlying input instead of the final wrapping token filter 
> (CachingTokenFilter in this case).
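
A small usage sketch of the "more normal way" described above (the field name and text 
are arbitrary): with reset() propagating on first use, the wrapper is consumed like any 
other TokenStream.

{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CachingTokenFilterUsage {
  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = new StandardAnalyzer();
        CachingTokenFilter cached =
            new CachingTokenFilter(analyzer.tokenStream("field", "some text to cache"))) {
      CharTermAttribute term = cached.addAttribute(CharTermAttribute.class);
      cached.reset(); // now propagates to the wrapped stream the first time
      while (cached.incrementToken()) {
        System.out.println(term);
      }
      cached.end();
    }
  }
}
{code}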



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6121) Fix CachingTokenFilter to propagate reset() the first time

2022-01-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-6121.
--
Resolution: Fixed

> Fix CachingTokenFilter to propagate reset() the first time
> --
>
> Key: LUCENE-6121
> URL: https://issues.apache.org/jira/browse/LUCENE-6121
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: random-chains
> Fix For: 6.0, 5.0
>
> Attachments: 
> LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch, 
> LUCENE-6121_CachingTokenFilter_reset_propagates_reset_if_not_cached.patch
>
>
> CachingTokenFilter should have been propagating reset() _but only the first 
> time_ and thus you would then use CachingTokenFilter in a more normal way – 
> wrap it and call reset() then increment in a loop, etc., instead of knowing 
> you need to call reset() on what it wraps but not this token filter itself. That's 
> weird. It's abnormal for a TokenFilter to never propagate reset, so every 
> user of CachingTokenFilter to date has worked around this by calling reset() 
> on the underlying input instead of the final wrapping token filter 
> (CachingTokenFilter in this case).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10252) ValueSource.asDoubleValues shouldn't fetch score

2022-01-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-10252.
---
Fix Version/s: 9.1
   Resolution: Fixed

> ValueSource.asDoubleValues shouldn't fetch score
> 
>
> Key: LUCENE-10252
> URL: https://issues.apache.org/jira/browse/LUCENE-10252
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/query
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The ValueSource.asDoubleValuesSource() method bridges the old API to the new 
> one.  It's rather important because boosting a query no longer has an old 
> API; in its place is using this method and passing to 
> FunctionScoreQuery.boostByValue.  Unfortunately, asDoubleValuesSource will 
> fetch/compute the score for the document in order to expose it in a Scorable 
> on the "scorer" key of the context Map.  AFAICT nothing in Lucene or Solr 
> actually uses this.  If it should be kept, the Scorable's score() method 
> could fetch it at that time (e.g. on-demand).
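
A hedged sketch of the bridging path described above ("popularity" and "body" are 
hypothetical fields): the old-API ValueSource is adapted via asDoubleValuesSource() 
and then used to boost a query.

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.DoubleFieldSource;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BoostByValueSource {
  public static Query boostedQuery() {
    // Old API: a ValueSource reading a numeric docvalues field.
    ValueSource vs = new DoubleFieldSource("popularity");
    // Bridge to the new API, then boost a query by it.
    DoubleValuesSource boost = vs.asDoubleValuesSource();
    return FunctionScoreQuery.boostByValue(new TermQuery(new Term("body", "lucene")), boost);
  }
}
{code}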



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-12-19 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462295#comment-17462295
 ] 

David Smiley commented on LUCENE-10197:
---

Both paths are viable (do now or do later); it can be done in bulk (with other 
deprecation removals for other stuff) later.  Later will be at least a year 
away.

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
> Fix For: 9.1
>
> Attachments: LUCENE-10197.patch
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-12-19 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-10197.
---
Fix Version/s: 9.1
   Resolution: Fixed

Woohoo; thanks Animesh!

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
> Fix For: 9.1
>
> Attachments: LUCENE-10197.patch
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456448#comment-17456448
 ] 

David Smiley commented on LUCENE-10302:
---

This builder could provide a buildList method in addition to buildQueue since 
some consumers don't really need a PriorityQueue specifically -- the PQ is an 
implementation detail to get a top-N.  Callers typically want a sorted top-N, 
and this list can be sorted quickly & easily (be it the unsorted input array or 
the heap array).  I'm not sure if it's actually faster to pop() the heap array 
one-by-one; I suspect not.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then moving over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456443#comment-17456443
 ] 

David Smiley commented on LUCENE-10302:
---

Fun fact: there are 56 usages of Lucene's PriorityQueue in Lucene.  I counted 
by adding up the find-usages results for both constructors (55 + 1).

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then moving over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456441#comment-17456441
 ] 

David Smiley commented on LUCENE-10302:
---

As an aside: it's weird/annoying to have to subclass this PriorityQueue just to 
supply the comparator.  Can't we use a Comparator?

I have the start of some code in progress I can share with anyone interested.  
I'm not sure when I'll get back to this.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then moving over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)
David Smiley created LUCENE-10302:
-

 Summary: PriorityQueue: optimize where we collect then iterate by 
using O(N) heapify
 Key: LUCENE-10302
 URL: https://issues.apache.org/jira/browse/LUCENE-10302
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: David Smiley


Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to wondering 
if there was a faster-than-O(N*log(N)) way of loading a PriorityQueue when we 
provide a bulk array to initialize the heap/PriorityQueue.  It turns out there 
is: the JDK's PriorityQueue supports this in its constructors, referring to 
"This classic algorithm due to Floyd (1964) is known to be O(size)" -- 
heapify() method.  There's 
[another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may or 
may not be the same; I didn't look too closely yet.  I see a number of uses of 
Lucene's PriorityQueue that first collects values and only after collecting 
want to do something with the results (typical / unsurprising).  This lends 
itself to a builder pattern that can look similar to 
LargeNumHitsTopDocsCollector in terms of first having an array used like a list 
and then moving over to the PriorityQueue if/when it gets full (it may not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10252) ValueSource.asDoubleValues shouldn't fetch score

2021-12-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454023#comment-17454023
 ] 

David Smiley commented on LUCENE-10252:
---

I think this could reasonably be qualified as a perf regression bug (especially 
felt by Solr), applicable to an 8.11 bug-fix release.  WDYT?  Admittedly I didn't 
detect it in such a way, but nonetheless I'm sure calculating the score more 
often than needed absolutely leads to a big performance loss in some cases, 
which I have run into in the past.

> ValueSource.asDoubleValues shouldn't fetch score
> 
>
> Key: LUCENE-10252
> URL: https://issues.apache.org/jira/browse/LUCENE-10252
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/query
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ValueSource.asDoubleValuesSource() method bridges the old API to the new 
> one.  It's rather important because boosting a query no longer has an old 
> API; in its place, one uses this method and passes the result to 
> FunctionScoreQuery.boostByValue.  Unfortunately, asDoubleValuesSource will 
> fetch/compute the score for the document in order to expose it in a Scorable 
> on the "scorer" key of the context Map.  AFAICT nothing in Lucene or Solr 
> actually uses this.  If it should be kept, the Scorable's score() method 
> could fetch it at that time (e.g. on-demand).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-12-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454010#comment-17454010
 ] 

David Smiley commented on LUCENE-10197:
---

I think a single JIRA is fine.  I suppose if we merely deprecate things in 9.1 
that are removed in 10, then we needn't have a CHANGES.txt entry for 10 -- thus 
one CHANGES.txt entry for 9.1 mentioning both the builder and the deprecation of 
mutability.

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
> Attachments: LUCENE-10197.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-10252) ValueSource.asDoubleValues shouldn't fetch score

2021-12-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-10252:
-

Assignee: David Smiley

> ValueSource.asDoubleValues shouldn't fetch score
> 
>
> Key: LUCENE-10252
> URL: https://issues.apache.org/jira/browse/LUCENE-10252
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/query
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> The ValueSource.asDoubleValuesSource() method bridges the old API to the new 
> one.  It's rather important because boosting a query no longer has an old 
> API; in its place, one uses this method and passes the result to 
> FunctionScoreQuery.boostByValue.  Unfortunately, asDoubleValuesSource will 
> fetch/compute the score for the document in order to expose it in a Scorable 
> on the "scorer" key of the context Map.  AFAICT nothing in Lucene or Solr 
> actually uses this.  If it should be kept, the Scorable's score() method 
> could fetch it at that time (e.g. on-demand).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-12-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453674#comment-17453674
 ] 

David Smiley commented on LUCENE-10197:
---

I think this could be back-ported to 9.x so long as we retain the current API.  
Removed methods (e.g. setters) stay but become deprecated; so would the 
constructors.  It's not *quite* that simple though... the flags computation 
needs to stay as it already is on 9.x instead of being moved off to the builder.

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
> Attachments: LUCENE-10197.patch
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10252) ValueSource.asDoubleValues shouldn't fetch score

2021-11-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447782#comment-17447782
 ] 

David Smiley commented on LUCENE-10252:
---

I commented out putting the "scorer" key in this map and Lucene & Solr tests 
pass.  If we don't even have tests that exercise this, I wonder why this was 
added in the first place?  CC [~romseygeek] 

> ValueSource.asDoubleValues shouldn't fetch score
> 
>
> Key: LUCENE-10252
> URL: https://issues.apache.org/jira/browse/LUCENE-10252
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/query
>Reporter: David Smiley
>Priority: Major
>
> The ValueSource.asDoubleValuesSource() method bridges the old API to the new 
> one.  It's rather important because boosting a query no longer has an old 
> API; in its place, one uses this method and passes the result to 
> FunctionScoreQuery.boostByValue.  Unfortunately, asDoubleValuesSource will 
> fetch/compute the score for the document in order to expose it in a Scorable 
> on the "scorer" key of the context Map.  AFAICT nothing in Lucene or Solr 
> actually uses this.  If it should be kept, the Scorable's score() method 
> could fetch it at that time (e.g. on-demand).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10252) ValueSource.asDoubleValues shouldn't fetch score

2021-11-22 Thread David Smiley (Jira)
David Smiley created LUCENE-10252:
-

 Summary: ValueSource.asDoubleValues shouldn't fetch score
 Key: LUCENE-10252
 URL: https://issues.apache.org/jira/browse/LUCENE-10252
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/query
Reporter: David Smiley


The ValueSource.asDoubleValuesSource() method bridges the old API to the new 
one.  It's rather important because boosting a query no longer has an old API; 
in its place, one uses this method and passes the result to 
FunctionScoreQuery.boostByValue.  Unfortunately, asDoubleValuesSource will 
fetch/compute the score for the document in order to expose it in a Scorable on 
the "scorer" key of the context Map.  AFAICT nothing in Lucene or Solr actually 
uses this.  If it should be kept, the Scorable's score() method could fetch it 
at that time (e.g. on-demand).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10201) Upgrade Spatial4j to 0.8

2021-10-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-10201.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> Upgrade Spatial4j to 0.8
> 
>
> Key: LUCENE-10201
> URL: https://issues.apache.org/jira/browse/LUCENE-10201
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Spatial4j has been at 0.8 for some time.  We should upgrade.
> [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9660) gradle task cache should not cache --tests

2021-10-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434036#comment-17434036
 ] 

David Smiley commented on LUCENE-9660:
--

Thanks for reconsidering this.  Rob's argument was very convincing :)

> gradle task cache should not cache --tests
> --
>
> Key: LUCENE-9660
> URL: https://issues.apache.org/jira/browse/LUCENE-9660
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I recently ran a specific test at the CLI via gradle to see if a particular 
> build failure repeats.  It includes the {{--tests}} command line option to 
> specify the test.  The test passed.  Later I wanted to run it again; I 
> suspected it might be flaky.  Gradle completed in 10 seconds, and I'm 
> certain it didn't actually run the test. There was no printout, and the 
> build/test-results/test/outputs/...  from the test run still had not changed 
> from the previous run.
> Mike Drob informed me of "gradlew cleanTest" but I'd prefer to not have to 
> know about that, at least not for the specific case of wanting to execute a 
> specific test.
> CC [~dweiss]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10202) spatial: expose dependencies using Gradle Feature Variants

2021-10-24 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-10202:
--
Summary: spatial: expose dependencies using Gradle Feature Variants  (was: 
Expose dependencies using Gradle Feature Variants)

> spatial: expose dependencies using Gradle Feature Variants
> --
>
> Key: LUCENE-10202
> URL: https://issues.apache.org/jira/browse/LUCENE-10202
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The spatial-extras module has several dependencies.  However some of them 
> like spatial3d (aka Geo3d) are only needed for certain features.  Likewise, 
> JTS could be exposed here as well, and should be opt-in.  In Maven, these 
> should be "optional".  Gradle has a cool alternative for Gradle consumers to 
> select named "feature variants" this module could expose so that it doesn't 
> have to pick the right dependency versions.
> https://docs.gradle.org/current/userguide/feature_variants.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10202) Expose dependencies using Gradle Feature Variants

2021-10-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433521#comment-17433521
 ] 

David Smiley commented on LUCENE-10202:
---

[~daddywri] when I was working on the PR to name the "feature variant" 
implemented by the spatial3d module, I was contemplating what to name it.  I 
should probably pick the obvious name – "spatial3d" after the dependency it 
brings in.  But its name had me thinking (and we've been over this in the past) 
about how difficult it is to name that module.  "3d" gives people the wrong 
impression.  When I tell people about that module, I say it has "surface of a 
sphere geometry" (and yes it does ellepsoid, I know).  Thus perhaps 
"spatial-spherical" might have been a better name?

> Expose dependencies using Gradle Feature Variants
> -
>
> Key: LUCENE-10202
> URL: https://issues.apache.org/jira/browse/LUCENE-10202
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The spatial-extras module has several dependencies.  However some of them 
> like spatial3d (aka Geo3d) are only needed for certain features.  Likewise, 
> JTS could be exposed here as well, and should be opt-in.  In Maven, these 
> should be "optional".  Gradle has a cool alternative for Gradle consumers to 
> select named "feature variants" this module could expose so that it doesn't 
> have to pick the right dependency versions.
> https://docs.gradle.org/current/userguide/feature_variants.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10202) Expose dependencies using Gradle Feature Variants

2021-10-24 Thread David Smiley (Jira)
David Smiley created LUCENE-10202:
-

 Summary: Expose dependencies using Gradle Feature Variants
 Key: LUCENE-10202
 URL: https://issues.apache.org/jira/browse/LUCENE-10202
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial-extras
Reporter: David Smiley
Assignee: David Smiley


The spatial-extras module has several dependencies.  However some of them like 
spatial3d (aka Geo3d) are only needed for certain features.  Likewise, JTS 
could be exposed here as well, and should be opt-in.  In Maven, these should be 
"optional".  Gradle has a cool alternative for Gradle consumers to select named 
"feature variants" this module could expose so that it doesn't have to pick the 
right dependency versions.

https://docs.gradle.org/current/userguide/feature_variants.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10201) Upgrade Spatial4j to 0.8

2021-10-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433371#comment-17433371
 ] 

David Smiley commented on LUCENE-10201:
---

I'll copy-paste the release notes:

_from CHANGES.md_

[#197|https://github.com/locationtech/spatial4j/pull/197]: Require Java 8, AKA 
v1.8. (David Smiley)

[#194|https://github.com/locationtech/spatial4j/pull/194]: Circles that cross a 
dateline can now be converted to a JTS Geometry. Previous attempts would throw 
an exception. (Stijn Caerts)

[#194|https://github.com/locationtech/spatial4j/pull/194]: JtsGeometry now 
supports as input a Geometry that crosses the dateline multiple times (wraps the 
globe multiple times). Previous attempts would yield erroneous behavior. (Stijn 
Caerts)

[#188|https://github.com/locationtech/spatial4j/pull/188]: Upgraded to JTS 
1.17.0. This JTS release has a small API change. (Jim Hughes)

[#177|https://github.com/locationtech/spatial4j/issues/177]: Improve conversion 
of a Circle to Shape. JtsShapeFactory allows converting from a Shape object to 
a JTS Geometry object. Geodetic circles now translate to a polygon that has 
points equidistant from the center. Before the change, there was potentially a 
large inaccuracy. (Hrishi Bakshi)

[#163|https://github.com/locationtech/spatial4j/issues/163]: "Empty" points in 
JTS are now convertible to a Spatial4j Shape instead of throwing an exception. 
(David Smiley)

[#162|https://github.com/locationtech/spatial4j/issues/162]: Fixed WKT & 
GeoJSON [de]serialization of "empty" points and geometrycollections. (Jeen 
Broekstra, David Smiley)

[#165|https://github.com/locationtech/spatial4j/pull/165]: Added 
ShapeFactory.pointLatLon convenience method. (MoeweX)

[#167|https://github.com/locationtech/spatial4j/pull/167]: WKTWriter now has a 
means to customize the NumberFormat. (MoeweX)

[#175|https://github.com/locationtech/spatial4j/issues/175]: ShapesAsWKTModule, 
a Jackson databind module, didn't deserialize WKT inside JSON to a Spatial4j 
Shape at all. Now it does. It continues to serialize correctly. (David Smiley)

> Upgrade Spatial4j to 0.8
> 
>
> Key: LUCENE-10201
> URL: https://issues.apache.org/jira/browse/LUCENE-10201
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> Spatial4j has been at 0.8 for some time.  We should upgrade.
> [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10201) Upgrade Spatial4j to 0.8

2021-10-23 Thread David Smiley (Jira)
David Smiley created LUCENE-10201:
-

 Summary: Upgrade Spatial4j to 0.8
 Key: LUCENE-10201
 URL: https://issues.apache.org/jira/browse/LUCENE-10201
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial-extras
Reporter: David Smiley
Assignee: David Smiley


Spatial4j has been at 0.8 for some time.  We should upgrade.

[https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-10-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433366#comment-17433366
 ] 

David Smiley commented on LUCENE-10197:
---

Development wouldn't be halted but it wouldn't be "released" until Lucene 10, 
likely in 2023.

Are you opposed to the GitHub PR process?  I find it superior to old patch files 
for collaborative development.

I took a brief look at your patch.  The builder itself looks nice but I think 
overall it misses the point because build() calls a bunch of setters on 
UnifiedHighlighter.  A key outcome here is for UH to be immutable.  The 
constructor for the UH could take this builder and populate itself accordingly. 
 Remember we need to support UH subclassing too.
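
A rough sketch of that constructor-takes-builder shape (the class and option names here 
are invented, not the actual UnifiedHighlighter API): the built object is immutable, and 
a subclass can still call super(builder).

{code:java}
public class MyHighlighter {
  private final boolean handleMultiTermQuery;
  private final int maxLength;

  protected MyHighlighter(Builder b) {
    // Copy builder state into final fields; no setters remain on the highlighter.
    this.handleMultiTermQuery = b.handleMultiTermQuery;
    this.maxLength = b.maxLength;
  }

  public static class Builder {
    private boolean handleMultiTermQuery = true;
    private int maxLength = 10_000;

    public Builder withHandleMultiTermQuery(boolean v) {
      this.handleMultiTermQuery = v;
      return this;
    }

    public Builder withMaxLength(int v) {
      this.maxLength = v;
      return this;
    }

    public MyHighlighter build() {
      return new MyHighlighter(this);
    }
  }
}
{code}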

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
> Attachments: LUCENE-10197.patch
>
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-10-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433218#comment-17433218
 ] 

David Smiley commented on LUCENE-10197:
---

If this misses Nov 1st, then yes, this won't ship till probably 2023.  Anyway, 
I'm here to help review promptly.

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10197) UnifiedHighlighter should use builders for thread-safety

2021-10-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433196#comment-17433196
 ] 

David Smiley commented on LUCENE-10197:
---

FYI Lucene 9 feature-freeze is November 1st.  I'm not sure how comfortable I 
would be doing something like this in a minor release.

> UnifiedHighlighter should use builders for thread-safety
> 
>
> Key: LUCENE-10197
> URL: https://issues.apache.org/jira/browse/LUCENE-10197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Animesh Pandey
>Priority: Minor
>  Labels: newdev
>
> UnifiedHighlighter is not thread-safe due to the presence of setters. We can 
> move the fields to a builder so that the class becomes thread-safe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9431) UnifiedHighlighter: Make WEIGHT_MATCHES the default

2021-10-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17433195#comment-17433195
 ] 

David Smiley commented on LUCENE-9431:
--

Other than closing an issue that becomes invalid, contributors can't change the 
status of a Jira issue.  The assignee field is only for committers and is more of a 
hint/clue that someone in particular intends to do the work or shepherd a 
contribution in.  It's often overlooked/ignored.

> UnifiedHighlighter: Make WEIGHT_MATCHES the default
> ---
>
> Key: LUCENE-9431
> URL: https://issues.apache.org/jira/browse/LUCENE-9431
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Labels: newdev
> Fix For: main (9.0)
>
> Attachments: LUCENE-9431.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This mode uses Lucene's modern mechanism of exposing information that 
> previously required complicated highlighting machinery.  It's also likely to 
> generally work better out-of-the-box and with custom queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9431) UnifiedHighlighter: Make WEIGHT_MATCHES the default

2021-10-22 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-9431.
--
  Assignee: David Smiley
Resolution: Fixed

> UnifiedHighlighter: Make WEIGHT_MATCHES the default
> ---
>
> Key: LUCENE-9431
> URL: https://issues.apache.org/jira/browse/LUCENE-9431
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Labels: newdev
> Fix For: main (9.0)
>
> Attachments: LUCENE-9431.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This mode uses Lucene's modern mechanism of exposing information that 
> previously required complicated highlighting machinery.  It's also likely to 
> generally work better out-of-the-box and with custom queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9431) UnifiedHighlighter: Make WEIGHT_MATCHES the default

2021-10-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425542#comment-17425542
 ] 

David Smiley commented on LUCENE-9431:
--

Animesh; do you have a GitHub account?  There is some peer review going on 
there.

> UnifiedHighlighter: Make WEIGHT_MATCHES the default
> ---
>
> Key: LUCENE-9431
> URL: https://issues.apache.org/jira/browse/LUCENE-9431
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Priority: Minor
>  Labels: newdev
> Fix For: main (9.0)
>
> Attachments: LUCENE-9431.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This mode uses Lucene's modern mechanism of exposing information that 
> previously required complicated highlighting machinery.  It's also likely to 
> generally work better out-of-the-box and with custom queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9431) UnifiedHighlighter: Make WEIGHT_MATCHES the default

2021-10-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425235#comment-17425235
 ] 

David Smiley commented on LUCENE-9431:
--

Thanks Animesh!  There was a documentation aspect that needed tending to.  I 
created a PR that is now linked.

> UnifiedHighlighter: Make WEIGHT_MATCHES the default
> ---
>
> Key: LUCENE-9431
> URL: https://issues.apache.org/jira/browse/LUCENE-9431
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Priority: Minor
>  Labels: newdev
> Fix For: main (9.0)
>
> Attachments: LUCENE-9431.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This mode uses Lucene's modern mechanism of exposing information that 
> previously required complicated highlighting machinery.  It's also likely to 
> generally work better out-of-the-box and with custom queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9431) UnifiedHighlighter: Make WEIGHT_MATCHES the default

2021-10-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424284#comment-17424284
 ] 

David Smiley commented on LUCENE-9431:
--

If I recall correctly, the ability to turn off support for multi-term queries (e.g. 
wildcards) and position-sensitive queries (e.g. phrases) in the name of 
performance is incompatible with WEIGHT_MATCHES mode.

If you look at getFlags, the idea is to add WEIGHT_MATCHES to it so long as 
both PHRASES & MULTITERM_QUERY are already in it, and 
PASSAGE_RELEVANCY_OVER_SPEED is not.
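
For illustration only, here is a minimal sketch of that flag logic, assuming the 
HighlightFlag constants PHRASES, MULTI_TERM_QUERY, PASSAGE_RELEVANCY_OVER_SPEED 
and WEIGHT_MATCHES; the actual change belongs in getFlags and may differ:
{noformat}
// Hedged sketch only, not the actual patch.  Assumes the public
// UnifiedHighlighter.HighlightFlag enum constants named below.
import java.util.EnumSet;
import java.util.Set;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter.HighlightFlag;

class WeightMatchesDefaultSketch {
  static Set<HighlightFlag> withWeightMatches(Set<HighlightFlag> flags) {
    EnumSet<HighlightFlag> result = EnumSet.noneOf(HighlightFlag.class);
    result.addAll(flags);
    // WEIGHT_MATCHES is only compatible when phrase and multi-term highlighting
    // are enabled and the relevancy-over-speed trade-off is not requested.
    if (result.contains(HighlightFlag.PHRASES)
        && result.contains(HighlightFlag.MULTI_TERM_QUERY)
        && !result.contains(HighlightFlag.PASSAGE_RELEVANCY_OVER_SPEED)) {
      result.add(HighlightFlag.WEIGHT_MATCHES);
    }
    return result;
  }
}
{noformat}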

> UnifiedHighlighter: Make WEIGHT_MATCHES the default
> ---
>
> Key: LUCENE-9431
> URL: https://issues.apache.org/jira/browse/LUCENE-9431
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Priority: Minor
>  Labels: newdev
> Fix For: main (9.0)
>
>
> This mode uses Lucene's modern mechanism of exposing information that 
> previously required complicated highlighting machinery.  It's also likely to 
> generally work better out-of-the-box and with custom queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10018) Remove Fields from TermVector reader related usage

2021-07-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390274#comment-17390274
 ] 

David Smiley commented on LUCENE-10018:
---

[~rmuir] what do you think of the goal stated in this Jira issue?  The outcome 
would leave Fields, but it would be low-level (within PostingsFormats) and 
dedicated to one purpose instead of also involving TVs.  As a user, the code is 
easier to navigate when this class is used for one purpose and not two.  I know 
you vetoed the PR, but that PR is merely one approach.

If that sounds reasonable, the next question is what its replacement is for the 
TV use-case.  I think relegating Fields to low-level internal use in some 
sense reduces the mental overhead a Lucene user has in the things they need to know 
about, freeing up space for maybe introducing a new public class; thus I think 
this exercise is net neutral on the classes that matter.  Do you have an idea here? 
My (second?) idea I shared is 
[https://github.com/apache/lucene/pull/180#issuecomment-876482149], which would 
add some methods to TermVectors, making it somewhat like Fields.  I had 
started down that path briefly (maybe an hour) and found the implementation to 
be a bit awkward (albeit doable, I guess).  I had second thoughts and 
shelved it to go for a simple class, DocTermVectors, to replace Fields for TVs -- 
[https://github.com/apache/lucene/pull/216]

> Remove Fields from TermVector reader related usage
> --
>
> Key: LUCENE-10018
> URL: https://issues.apache.org/jira/browse/LUCENE-10018
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/index
>Reporter: Zach Chen
>Assignee: David Smiley
>Priority: Minor
>
> This is a spin-off issue from [https://github.com/apache/lucene/pull/180] for 
> Fields class deprecation / removal in TermVector reader usage. As Fields 
> class is generally meant as an internal class reserved for the posting index, we 
> would like to have some dedicated TermVector abstractions and APIs instead. 
> The relevant discussions are available here:
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-686320076]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863254651]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863262562]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863775298]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-864720190]
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-688023901]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871155896]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871922823]
>  
> One potential API design for this can be found here 
> [https://github.com/apache/lucene/pull/180#issuecomment-871155896] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-19 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383314#comment-17383314
 ] 

David Smiley commented on LUCENE-9959:
--

Rob, please don't needlessly personalize your critical feedback towards me; 
the swearing just directs whatever your message is against the person 
rather than the code.  And I'm sure you are smart enough to look at just about 
anyone's work and understand where the motivation comes from, even if you don't 
agree with the approach.  Let's not talk to each other, or to each other's code, in 
this way.

I'm not pleased with the extra public class either; in 
[https://github.com/apache/lucene/pull/180#pullrequestreview-686320076] I said 
as much.  At least "Fields" can become purely internal, and thus the net change 
is just one more class for TVs (Zach added "TermVectors").  In 
[https://github.com/apache/lucene/pull/180#issuecomment-876482149] I thought of 
an approach that may work, but last night I had second thoughts and went for 
simplicity of the change (adding DocTermVectors).  Based on your feedback, Rob, 
let's ignore what I came up with last night and I'll post a different PR that 
doesn't introduce it.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10003) Disallow C-style array declarations

2021-07-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376554#comment-17376554
 ] 

David Smiley commented on LUCENE-10003:
---

This isn't about bugs; it's about style.
As long as the check only runs on modified files, surely the overhead is 
negligible?  Or do it only in GitHub PRs, which is even more negligible :-)

Spotless is cool but adding a new custom formatter step is much more work than 
what I did above.  A Spotless formatter step has to actually do the repair, 
whereas the regexp detector above doesn't.  The Spotless people shared docs 
with me on how to write a formatter step: 
https://github.com/diffplug/spotless/blob/main/CONTRIBUTING.md#how-to-add-a-new-formatterstep

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10003) Disallow C-style array declarations

2021-07-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375864#comment-17375864
 ] 

David Smiley commented on LUCENE-10003:
---

I did some experimentation with using a regexp in 
{{validate-source-patterns.gradle}} {{invalidJavaOnlyPatterns}} and came up 
with this:
{noformat}
  (~$/(?m)^(?!\s*(/[/*]|\*).*).*\b\w+\s+\w+(\[])+\s*[=,;]/$) : 'C style 
array declarations disallowed; move the brackets'
{noformat}
The first part is a negative lookahead to ensure the line doesn't start with a 
comment, because there are a number of comments that would otherwise match this 
expression.  There is only one place where there's a false positive 
(MemoryIndex line 949), and it can be addressed trivially by adding a newline.

I'll push a PR that just does the changes, and separately push another PR 
oriented on the automated detection using the above regexp.
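
For illustration, here is a hypothetical snippet (not from either PR) showing the 
kind of declaration the pattern flags versus what it allows, plus a commented-out 
line that the negative lookahead skips:
{noformat}
// Hypothetical illustration of what the check targets.
class ArrayDeclStyleExample {
  int counts[];           // C-style: brackets on the variable name -- would be flagged
  int[] countsPreferred;  // Java style: brackets on the type -- allowed

  // A commented-out declaration like the next line is skipped by the negative lookahead:
  // int legacy[] = new int[8];
}
{noformat}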

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-10018) Remove Fields from TermVector reader related usage

2021-07-02 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-10018:
-

Assignee: David Smiley

> Remove Fields from TermVector reader related usage
> --
>
> Key: LUCENE-10018
> URL: https://issues.apache.org/jira/browse/LUCENE-10018
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/index
>Reporter: Zach Chen
>Assignee: David Smiley
>Priority: Minor
>
> This is a spin-off issue from [https://github.com/apache/lucene/pull/180] for 
> Fields class deprecation / removal in TermVector reader usage. As Fields 
> class is generally meant as an internal class reserved for the posting index, we 
> would like to have some dedicated TermVector abstractions and APIs instead. 
> The relevant discussions are available here:
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-686320076]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863254651]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863262562]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863775298]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-864720190]
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-688023901]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871155896]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871922823]
>  
> One potential API design for this can be found here 
> [https://github.com/apache/lucene/pull/180#issuecomment-871155896] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10018) Remove Fields from TermVector reader related usage

2021-07-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373723#comment-17373723
 ] 

David Smiley commented on LUCENE-10018:
---

I'm happy to do this after LUCENE-9959 (TermVectorReader/Base stuff)

> Remove Fields from TermVector reader related usage
> --
>
> Key: LUCENE-10018
> URL: https://issues.apache.org/jira/browse/LUCENE-10018
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/codecs, core/index
>Reporter: Zach Chen
>Priority: Minor
>
> This is a spin-off issue from [https://github.com/apache/lucene/pull/180] for 
> Fields class deprecation / removal in TermVector reader usage. As Fields 
> class is generally meant as an internal class reserved for the posting index, we 
> would like to have some dedicated TermVector abstractions and APIs instead. 
> The relevant discussions are available here:
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-686320076]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863254651]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863262562]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-863775298]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-864720190]
>  * [https://github.com/apache/lucene/pull/180#pullrequestreview-688023901]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871155896]
>  * [https://github.com/apache/lucene/pull/180#issuecomment-871922823]
>  
> One potential API design for this can be found here 
> [https://github.com/apache/lucene/pull/180#issuecomment-871155896] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8638) Remove deprecated code in master

2021-06-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8638:
-
Priority: Blocker  (was: Major)

> Remove deprecated code in master
> 
>
> Key: LUCENE-8638
> URL: https://issues.apache.org/jira/browse/LUCENE-8638
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Blocker
> Fix For: main (9.0)
>
>
> There are a number of deprecations in master that should be removed. This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9204) Move span queries to the queries module

2021-06-20 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366383#comment-17366383
 ] 

David Smiley commented on LUCENE-9204:
--

Nice work Michael G!

bq.  baseline and candidate code are the same

Thus the Task and QPS columns (the first two) are the only interesting part of 
the output.  I initially overlooked your comment about that and was 
wondering by the end what was being compared ;-)  That said, looking at the 
last comparison, can we see that Intervals is substantially faster than Spans?

> Move span queries to the queries module
> ---
>
> Key: LUCENE-9204
> URL: https://issues.apache.org/jira/browse/LUCENE-9204
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have a slightly odd situation currently, with two parallel query 
> structures for building complex positional queries: the long-standing span 
> queries, in core; and interval queries, in the queries module.  Given that 
> interval queries solve at least some of the problems we've had with Spans, I 
> think we should be pushing users more towards these implementations.  It's 
> counter-intuitive to do that when Spans are in core though.  I've opened this 
> issue to discuss moving the spans package as a whole to the queries module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9143) Add more static analysis and clean up resulting warnings/errors

2021-06-20 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-9143.
--
Resolution: Fixed

Resolving.  We can do this incrementally, and already are.  I don't think an 
umbrella issue like this is useful for this type of work.

> Add more static analysis and clean up resulting warnings/errors
> ---
>
> Key: LUCENE-9143
> URL: https://issues.apache.org/jira/browse/LUCENE-9143
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mike Drob
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Part of the discussion with Mark Miller was the need for better bug finding - 
> especially in tricky areas like concurrency. One of the ways we can do this 
> is with added static analysis and increased tooling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10003) Disallow C-style array declarations

2021-06-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363603#comment-17363603
 ] 

David Smiley commented on LUCENE-10003:
---

I think it's enough for GitHub PRs to do the checks, and thus not fail CI 
builds.  Could/should "precommit" be made equivalent and run error-prone? 
I share Rob's concern about checks that only occur in nightly jobs, 
which are rather annoying to deal with, especially for non-errors.  I *think* the 
intention with our error-prone config was that over time we'd improve more and 
more of the code and then we'd be able to let error-prone run more of its 
available rules.  ~114 rules are disabled in error-prone.gradle today.

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10003) Disallow C-style array declarations

2021-06-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363170#comment-17363170
 ] 

David Smiley commented on LUCENE-10003:
---

I had IntelliJ check for this, and it happens in hundreds of spots.  It can fix 
them automatically as well.  I'm happy to submit a PR to fix these.

I reported this to the Google Java Format plugin and they rejected the idea 
because it's just a formatter of source code, and this matter isn't formatting 
(strictly speaking).  I figure the same fate might apply to suggesting it to 
Spotless.

We can do this in {{error-prone.gradle}} via a config arg like so: 
{{'-Xep:MixedArrayDimensions:ERROR'}}
Sadly our "error-prone" checks only run nightly but let's just add it here any 
way?

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Minor
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-10003) Disallow C-style array declarations

2021-06-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-10003:
-

Assignee: David Smiley

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10003) Disallow C-style array declarations

2021-06-14 Thread David Smiley (Jira)
David Smiley created LUCENE-10003:
-

 Summary: Disallow C-style array declarations
 Key: LUCENE-10003
 URL: https://issues.apache.org/jira/browse/LUCENE-10003
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: David Smiley


The Google Java Format, that which we adhere to, disallows c-style array 
declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
It's also known to "Error Prone":
https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9145) Address warnings found by static analysis

2021-06-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363149#comment-17363149
 ] 

David Smiley commented on LUCENE-9145:
--

Should this remain open, or should we close it and file new, more specific 
issues?  Or maybe file no issue at all; just do a PR if it's simple.

> Address warnings found by static analysis
> -
>
> Key: LUCENE-9145
> URL: https://issues.apache.org/jira/browse/LUCENE-9145
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3973) Incorporate PMD / FindBugs

2021-06-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-3973.
--
Resolution: Won't Fix

We use error-prone now.  Closing this issue.

> Incorporate PMD / FindBugs
> --
>
> Key: LUCENE-3973
> URL: https://issues.apache.org/jira/browse/LUCENE-3973
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Chris Male
>Priority: Major
>  Labels: newdev
> Attachments: LUCENE-3973.patch, LUCENE-3973.patch, LUCENE-3973.patch, 
> LUCENE-3973.patch, LUCENE-3973.patch, LUCENE-3973.patch, LUCENE-3973.patch, 
> LUCENE-3973.patch, core.html, findbugs-lucene.patch, solr-core.html
>
>
> This has been touched on a few times over the years.  Having static analysis 
> as part of our build seems like a big win.  For example, we could use PMD to 
> look at {{System.out.println}} statements like discussed in LUCENE-3877 and 
> we could possibly incorporate the nocommit / @author checks as well.
> There are a few things to work out as part of this:
> - Should we use both PMD and FindBugs or just one of them? They look at code 
> from different perspectives (bytecode vs source code) and target different 
> issues.  At the moment I'm in favour of trying both but that might be too 
> heavy handed for our needs.
> - What checks should we use? There's no point having the analysis if it's 
> going to raise too many false-positives or problems we don't deem 
> problematic.  
> - How should the analysis be integrated in our build? Need to work out when 
> the analysis should run, how it should be incorporated in Ant and/or Maven, 
> what impact errors should have.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7764) Add FindBugs analysis to precommit

2021-06-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-7764.
--
Resolution: Won't Fix

We use "error-prone" instead now.

> Add FindBugs analysis to precommit
> --
>
> Key: LUCENE-7764
> URL: https://issues.apache.org/jira/browse/LUCENE-7764
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Daniel Jelinski
>Priority: Minor
> Attachments: LUCENE-7764-ivy.patch, LUCENE-7764.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8638) Remove deprecated code in master

2021-06-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360556#comment-17360556
 ] 

David Smiley commented on LUCENE-8638:
--

Folks, let's return to this and actually finish at least what we started.

[~rmuir], on the dev list in February [you expressed 
concerns|https://lists.apache.org/thread.html/r89bb1ad9e959e8552aefc917e28776e997fd8506499655f736a986c5%40%3Cdev.lucene.apache.org%3E]
 with handling a bunch of deprecations in one go.  You said it was "scary". 
There's a branch here "master-deprecations" with a bunch of changes. The 
following GitHub link shows a diff with master: 
[https://github.com/apache/lucene-solr/compare/master...master-deprecations] 
Many of the changes are trivial removals of methods/classes that are not used 
within our codebase.  Some changes are not so trivial but they can be called 
out in the PR for better peer review.  Alan has in fact called out a bunch of 
matters above.  WDYT?

> Remove deprecated code in master
> 
>
> Key: LUCENE-8638
> URL: https://issues.apache.org/jira/browse/LUCENE-8638
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: main (9.0)
>
>
> There are a number of deprecations in master that should be removed. This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353770#comment-17353770
 ] 

David Smiley commented on LUCENE-9379:
--

Rob, please tone down your language.  Don't speak of how much others are 
"uneducated"; merely point to what you want to show to help others understand 
your point of view.

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +Important+: This Lucene Directory wrapper approach is to be considered only 
> if an OS level encryption is not possible. OS level encryption better fits 
> Lucene usage of OS cache, and thus is more performant.
> But there are some use-case where OS level encryption is not possible. This 
> Jira issue was created to address those.
> 
>  
> The goal is to provide optional encryption of the index, with a scope limited 
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without 
> modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as 
> possible.
> Determine how callers provide encryption keys. They must not be stored on 
> disk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8143) Remove SpanBoostQuery

2021-05-26 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352054#comment-17352054
 ] 

David Smiley commented on LUCENE-8143:
--

I agree.  If inner boosting is broken, SpanBoostQuery is trappy and not 
providing value.

> Remove SpanBoostQuery
> -
>
> Key: LUCENE-8143
> URL: https://issues.apache.org/jira/browse/LUCENE-8143
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I initially added it so that span queries could still be boosted, but this 
> was actually a mistake: boosts are ignored on inner span queries; only the 
> boost of the top-level span query, the one that performs scoring, is not 
> ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9836) Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS again; remove pure Maven build (did not work anymore)

2021-05-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342820#comment-17342820
 ] 

David Smiley commented on LUCENE-9836:
--

Thanks Uwe!

> Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS 
> again; remove pure Maven build (did not work anymore)
> ---
>
> Key: LUCENE-9836
> URL: https://issues.apache.org/jira/browse/LUCENE-9836
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 8.x, 8.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: 8.x, 8.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the Maven-related stuff in 8.x completely fails, because 
> Maven-Ant-Tasks is so outdated that it has hardcoded Maven Central without 
> HTTPS. This makes downloading fail.
> You can mostly fix this with an additional remote repository, so it can 
> fall back to that one.
> I'd like to do the following on 8.x:
> - Remove the Ant support for Maven: {{ant run-maven-build}} (this no longer 
> bootstraps, because Maven Ant Tasks can't download Maven, as there is no way to 
> override the hardcoded repo; I have a workaround in forbiddenapis, but that's too 
> complicated, so I will simply remove that task)
> - Fix the dependency checker: This works, but unfortunately there are some 
> artifacts which themselves have "http:" in their POM files, and those fail to 
> download. Newer Maven versions have a hardcoded "fixer" in them, but Maven Ant 
> Tasks again is missing this. I have no idea how to handle that.
> I already tried some heavy committing, but the only way to solve this is to 
> replace maven-ant-tasks with the follow-up Ant task. I am not sure if this is 
> worth the trouble!
> What do others think? Should we maybe simply disable the Maven Dependency 
> checker?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9836) Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS again; remove pure Maven build (did not work anymore)

2021-05-10 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341985#comment-17341985
 ] 

David Smiley commented on LUCENE-9836:
--

Is there any chance this is related to the smokerelease failure:
Missing artifact 
/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Solr-SmokeRelease-8.x/lucene/build/smokeTestRelease/tmp/maven/org/apache/lucene/lucene-analysis-modules-aggregator/8.9.0/lucene-analysis-modules-aggregator-8.9.0.pom
 ?
Hossman suggested this possibility in an email on March 18th, in response to a 
smoke-release failure.  I saw the same failure today:
https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-SmokeRelease-8.x/ 
(#199).  The last 5 smoke tests failed; they last passed on March 15th, 
immediately prior to the last commit here. 

> Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS 
> again; remove pure Maven build (did not work anymore)
> ---
>
> Key: LUCENE-9836
> URL: https://issues.apache.org/jira/browse/LUCENE-9836
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 8.x, 8.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: 8.x, 8.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the Maven-related stuff in 8.x completely fails, because 
> Maven-Ant-Tasks is so outdated that it has hardcoded Maven Central without 
> HTTPS. This makes downloading fail.
> You can mostly fix this with an additional remote repository, so it can 
> fall back to that one.
> I'd like to do the following on 8.x:
> - Remove the Ant support for Maven: {{ant run-maven-build}} (this no longer 
> bootstraps, because Maven Ant Tasks can't download Maven, as there is no way to 
> override the hardcoded repo; I have a workaround in forbiddenapis, but that's too 
> complicated, so I will simply remove that task)
> - Fix the dependency checker: This works, but unfortunately there are some 
> artifacts which themselves have "http:" in their POM files, and those fail to 
> download. Newer Maven versions have a hardcoded "fixer" in them, but Maven Ant 
> Tasks again is missing this. I have no idea how to handle that.
> I already tried some heavy committing, but the only way to solve this is to 
> replace maven-ant-tasks with the follow-up Ant task. I am not sure if this is 
> worth the trouble!
> What do others think? Should we maybe simply disable the Maven Dependency 
> checker?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9938) ConjunctionDISI should recognize DocIdSetIterator.all

2021-04-26 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-9938.
--
Resolution: Invalid

Never mind.  The TwoPhaseIterator (TPI) design requires that the "approximation" 
DISI coupled with it is already positioned, because TPI.matches() takes no 
arguments and thus relies on the approximation's position.  Maybe the issue is 
still valid for other scenarios not coupled with a TPI, but I can't think of any. 
I wish the TPI design were different; alas.
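
For context, a minimal sketch (illustrative only) of the conventional 
TwoPhaseIterator consumption pattern, which shows why the no-arg matches() 
depends on the approximation already being positioned:
{noformat}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.TwoPhaseIterator;

class TwoPhaseConsumerSketch {
  // Advance the cheap approximation, then confirm each candidate with matches().
  static void consume(TwoPhaseIterator twoPhase) throws IOException {
    DocIdSetIterator approximation = twoPhase.approximation();
    for (int doc = approximation.nextDoc();
        doc != DocIdSetIterator.NO_MORE_DOCS;
        doc = approximation.nextDoc()) {
      if (twoPhase.matches()) { // no-arg: checks the doc the approximation is positioned on
        // collect or score doc here
      }
    }
  }
}
{noformat}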

> ConjunctionDISI should recognize DocIdSetIterator.all
> -
>
> Key: LUCENE-9938
> URL: https://issues.apache.org/jira/browse/LUCENE-9938
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: David Smiley
>Priority: Major
>
> ConjunctionDISI creates an aggregation of some leaf DocIdSetIterators & 
> TwoPhaseIterators.  It's not uncommon for DocIdSetIterator.all(...) to wind 
> up inside it, producing an aggregation that is a little more bulky than it 
> needs to be.  This can happen frequently with the ValueSource APIs, which have 
> queries like FunctionMatchQuery that use this for the "approximation" 
> alongside a TwoPhaseIterator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9938) ConjunctionDISI should recognize DocIdSetIterator.all

2021-04-26 Thread David Smiley (Jira)
David Smiley created LUCENE-9938:


 Summary: ConjunctionDISI should recognize DocIdSetIterator.all
 Key: LUCENE-9938
 URL: https://issues.apache.org/jira/browse/LUCENE-9938
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: David Smiley


ConjunctionDISI creates an aggregation of some leaf DocIdSetIterators & 
TwoPhaseIterators.  It's not uncommon for DocIdSetIterator.all(...) to wind up 
inside it, producing an aggregation that is a little more bulky than it needs 
to be.  This can happen frequently with the ValueSource APIs, which have queries 
like FunctionMatchQuery that use this for the "approximation" alongside a 
TwoPhaseIterator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9934) MuseDev on Lucene?

2021-04-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330804#comment-17330804
 ] 

David Smiley commented on LUCENE-9934:
--

I'm asking you to please suggest how to word the request to Infra.  I have no 
idea what an Infra engineer needs to be told to do this.

> MuseDev on Lucene?
> --
>
> Key: LUCENE-9934
> URL: https://issues.apache.org/jira/browse/LUCENE-9934
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Thomas DuBuisson
>Priority: Minor
>
> Muse was running on the combined LuceneSolr github repository but is not 
> running on the lucene (or solr) repositories.  The analysis seemed to be 
> providing appreciated feedback; should we ask ASF Infra to install the app on 
> the solr and lucene repositories?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9934) MuseDev on Lucene?

2021-04-21 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327079#comment-17327079
 ] 

David Smiley commented on LUCENE-9934:
--

+1 Yes definitely.  The "INFRA" Jira project is how we communicate with ASF 
Infra.  We could ask for Lucene & Solr in one go.  How exactly do we ask?  I 
searched existing tickets for "muse" and didn't find anything relevant.

> MuseDev on Lucene?
> --
>
> Key: LUCENE-9934
> URL: https://issues.apache.org/jira/browse/LUCENE-9934
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Thomas DuBuisson
>Priority: Minor
>
> Muse was running on the combined LuceneSolr github repository but is not 
> running on the lucene (or solr) repositories.  The analysis seemed to be 
> providing appreciated feedback; should we ask ASF Infra to install the app on 
> the solr and lucene repositories?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis

2021-04-20 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326128#comment-17326128
 ] 

David Smiley commented on LUCENE-9334:
--

Mike, the issue you just filed is effectively a duplicate of SOLR-15356, which I 
spent time debugging.  Already solved :-).  I sent a message to the 
dev list about this the other day too.

> Require consistency between data-structures on a per-field basis
> 
>
> Key: LUCENE-9334
> URL: https://issues.apache.org/jira/browse/LUCENE-9334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Blocker
> Fix For: main (9.0)
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Follow-up of 
> https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E.
> We would like to start requiring consistency across data-structures on a 
> per-field basis in order to make it easier to do the right thing by default: 
> range queries can run faster if doc values are enabled, sorted queries can 
> run faster if points by indexed, etc.
> This would be a big change, so it should be rolled out in a major.
> Strict validation is tricky to implement, but we should still implement 
> best-effort validation:
>  - Documents all use the same data-structures, e.g. it is illegal for a 
> document to only enable points and another document to only enable doc values,
>  - When possible, check whether values are consistent too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9836) Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS again; remove pure Maven build (did not work anymore)

2021-04-17 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324302#comment-17324302
 ] 

David Smiley commented on LUCENE-9836:
--

I'm a bit confused about what's going away.  Would an "mvn install -DskipTests" work?

> Fix 8.x Maven Validation and publication to work with Maven Central and HTTPS 
> again; remove pure Maven build (did not work anymore)
> ---
>
> Key: LUCENE-9836
> URL: https://issues.apache.org/jira/browse/LUCENE-9836
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 8.x, 8.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: 8.x, 8.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the Maven-related stuff in 8.x completely fails, because 
> Maven-Ant-Tasks is so outdated that it has hardcoded Maven Central without 
> HTTPS. This makes downloading fail.
> You can mostly fix this with an additional remote repository, so it can 
> fall back to that one.
> I'd like to do the following on 8.x:
> - Remove the Ant support for Maven: {{ant run-maven-build}} (this no longer 
> bootstraps, because Maven Ant Tasks can't download Maven, as there is no way to 
> override the hardcoded repo; I have a workaround in forbiddenapis, but that's too 
> complicated, so I will simply remove that task)
> - Fix the dependency checker: This works, but unfortunately there are some 
> artifacts which themselves have "http:" in their POM files, and those fail to 
> download. Newer Maven versions have a hardcoded "fixer" in them, but Maven Ant 
> Tasks again is missing this. I have no idea how to handle that.
> I already tried some heavy committing, but the only way to solve this is to 
> replace maven-ant-tasks with the follow-up Ant task. I am not sure if this is 
> worth the trouble!
> What do others think? Should we maybe simply disable the Maven Dependency 
> checker?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9838) simd version of VectorUtil.dotProduct

2021-03-20 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305572#comment-17305572
 ] 

David Smiley commented on LUCENE-9838:
--

Rob, is the concern that it'll be forever before Lucene requires JDK 16 or 
later?  Yeah.  I think there are ways to mitigate that.  Lucene could have a 
module containing functionality that has different implementations for 
different JVMs.  It could be published as either a multi-release JAR file or 
separate JAR files that are compatible.  There's [a blog post on 
gradle.com|https://blog.gradle.org/mrjars] discussing these techniques.  Today, 
Lucene has FutureArrays and FutureObjects classes, which are a kind of baby step 
toward these ideas.

CC [~uschindler]
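
For a sense of the kind of JDK-16-only code such a JVM-specific module would 
package, here is a rough, hypothetical dot-product sketch using the 
jdk.incubator.vector incubator API (requires --add-modules jdk.incubator.vector; 
not the actual patch and not tuned):
{noformat}
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

class SimdDotSketch {
  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  // Hypothetical sketch: vectorized main loop plus a scalar tail for leftovers.
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    int i = 0;
    int bound = SPECIES.loopBound(a.length);
    for (; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
    }
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
{noformat}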

> simd version of VectorUtil.dotProduct
> -
>
> Key: LUCENE-9838
> URL: https://issues.apache.org/jira/browse/LUCENE-9838
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9838.patch, LUCENE-9838_standalone.patch
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Followup to LUCENE-9837
> Let's explore using JDK 16 vector API to speed this up more. It might be a 
> hassle to try to MR-JAR/package up for users (adding commandline flags and 
> stuff), but it gives good performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-9078) Term vectors options should not be configurable per-doc

2021-03-15 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9078:
-
Comment: was deleted

(was: Rebecca Sheffer )

> Term vectors options should not be configurable per-doc
> ---
>
> Key: LUCENE-9078
> URL: https://issues.apache.org/jira/browse/LUCENE-9078
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>
> Make term vectors constant across the index. Remove the user ability to 
> modify the term vector options per doc, IndexWriter allows this.
> Once done, consider removing Fields, as the list of fields could be obtained 
> from FieldInfos. See the discussion in LUCENE-8041.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-9078) Term vectors options should not be configurable per-doc

2021-03-15 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9078:
-
Comment: was deleted

(was: Becca Sheffer 1061 university place Schenectady NY 12308 need more info  
rebeccasheffe...@gmail.com)

> Term vectors options should not be configurable per-doc
> ---
>
> Key: LUCENE-9078
> URL: https://issues.apache.org/jira/browse/LUCENE-9078
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>
> Make term vectors constant across the index. Remove the user ability to 
> modify the term vector options per doc, IndexWriter allows this.
> Once done, consider removing Fields, as the list of fields could be obtained 
> from FieldInfos. See the discussion in LUCENE-8041.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-12899) geodist returns polygon with incorrect distance

2021-03-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299566#comment-17299566
 ] 

David Smiley commented on SOLR-12899:
-

No.  JTS's classes are resolved dynamically in a rather unusual way, and the 
consequence is that JTS must go in the same place (the same classloader) as 
lucene-spatial-extras: WEB-INF/lib.  One day I imagine Solr 
separating these spatial components out of core into a separate first-party 
package, and I think that would help the matter.

> geodist returns polygon with incorrect distance
> ---
>
> Key: SOLR-12899
> URL: https://issues.apache.org/jira/browse/SOLR-12899
> Project: Solr
>  Issue Type: Bug
>  Components: spatial
>Reporter: Neil Ireson
>Priority: Minor
>
> I have an RPT field which contains a mix of points and polygons.
> When I perform a geodist query, where the query point is within an indexed 
> polygon, the polygon is correctly returned, however it is sorted as being the 
> most distant, and the returned distance to the polygon exceeds the maximum 
> distance specified in the query.
> Obviously some calculation is being performed which puts the polygon within 
> 100m of the query point; however, the distance being calculated for sorting 
> and/or response is not the same. It appears from playing with the query that 
> if the Polygon is within the query distance then it is always returned as the 
> last document, whether or not the documents are sorted.
>  
>  
> This is the query and the final two documents returned...
>  
> http://localhost:8983/solr/Naptan/select?q=*:*=*,dist:geodist()=100=\{!geofilt}=location=53.3805565,-1.4645408=0.1=geodist()%20asc
> ...
> |35| |
> |ATCOCode|"370022835"|
> |StopType|"BCT"|
> |StopCategory|"Bus"|
> |location| |
> |0|"53.3813701862,-1.4650627934"|
> |_version_|161497943831688|
> |dist|0.096849374|
> |36| |
> |ATCOCode|"502432214"|
> |StopType|"BCS"|
> |StopCategory|"Bus"|
> |location| |
> |0|"POLYGON ((-1.4646256 53.3796518, -1.4635259 53.3796806, -1.4636171 
> 53.3805894, -1.4627105 53.3810406, -1.4647973 53.3811781, -1.4646256 
> 53.3796518))"|
> |_version_|1615121382652248000|
> |dist|20015.115|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14827) Refactor schema loading to not use XPath

2021-03-10 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299184#comment-17299184
 ] 

David Smiley commented on SOLR-14827:
-

[~noble.paul] I observed that this commit included changes to 
CloudConfigSetService that appear to be dormant/unused, relating to caching 
config nodes.  Am I missing something?  I suspect your intention was to re-use 
the DOM (parsed XML), but that isn't happening here.  BTW I recall that 
CuratorFramework has facilities for this sort of thing to make it transparent.

> Refactor schema loading to not use XPath
> 
>
> Key: SOLR-14827
> URL: https://issues.apache.org/jira/browse/SOLR-14827
> Project: Solr
>  Issue Type: Task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: perfomance
> Fix For: 8.8
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> XPath is slower compared to DOM. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15198) Slf4j logs threadName incorrectly in some cases

2021-03-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298544#comment-17298544
 ] 

David Smiley commented on SOLR-15198:
-

I think these thread names with MDC are deliberate.  See 
org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor#execute

> Slf4j logs threadName incorrectly in some cases
> ---
>
> Key: SOLR-15198
> URL: https://issues.apache.org/jira/browse/SOLR-15198
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: logging
>Affects Versions: 8.6
>Reporter: Megan Carey
>Priority: Minor
>
> I'm running Solr 8.6, and I'm seeing some logs report threadName with the 
> entire log message and MDC. I haven't dug in too much, but CoreContainer logs 
> seem to be the biggest culprit. Not sure if it's an issue of thread naming 
> (including delimiter in name?), SolrLogFormat parsing, or something else 
> altogether.
> ```
> { [
>    CoreAdminHandler.action: CREATE
>    CoreAdminHandler.asyncId: 
> rebalance_replicas_trigger/41b16883a3bcTdzc5xqm9leudgekft79dpj5zl/372246269149506
>    collection: collectionName
>    core: collectionName_shard1_0_0_0_1_0_0_0_1_1_0_1_0_0_replica_s7606
>    level: INFO
>    logger: org.apache.solr.core.CoreContainer
>    message: Creating SolrCore 
> 'collectionName_shard1_0_0_0_1_0_0_0_1_1_0_1_0_0_replica_s7606' using 
> configuration from configset crm, trusted=true
>    node_name: REDACTED_HOSTNAME:8983_solr
>    replica: core_node7607
>    shard: shard1_0_0_0_1_0_0_0_1_1_0_1_0_0
>    threadId: 4861
>    *threadName: 
> parallelCoreAdminExecutor-19-thread-6-processing-n:REDACTED_HOSTNAME:8983_solr
>  x:collectionName_shard1_0_0_0_1_0_0_0_1_1_0_1_0_0_replica_s7606 
> t:Shared-932ac44c-06a4-44d3-ba8f-5a64c8f8d708 
> rebalance_replicas_trigger//41b16883a3bcTdzc5xqm9leudgekft79dpj5zl//372246269149506
>  CREATE*
>    timestamp: 2021-02-03T16:26:56.482Z
>    trace_id: Shared-932ac44c-06a4-44d3-ba8f-5a64c8f8d708
> }
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality

2021-03-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297840#comment-17297840
 ] 

David Smiley commented on SOLR-15038:
-

FYI I filed SOLR-15222 to have Solr *stop* auto-creating "userfiles".

> Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to 
> elevation functionality
> 
>
> Key: SOLR-15038
> URL: https://issues.apache.org/jira/browse/SOLR-15038
> Project: Solr
>  Issue Type: Improvement
>  Components: query
>Reporter: Tobias Kässmann
>Priority: Minor
> Fix For: 8.9
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We've worked a lot with the Query Elevation component recently, and we 
> were missing two features:
>  * Elevate only documents that are part of the search result
>  * In combination with collapsing: Only show the representative if the 
> elevated documents does have the same collapse field value.
> Because of this, we've added these two feature toggles 
> _elevateDocsWithoutMatchingQ_ and _onlyElevatedRepresentative._
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Deleted] (SOLR-15228) Single host in a bad state can block collection creation for the cluster with autoscaling enabled

2021-03-08 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley deleted SOLR-15228:



> Single host in a bad state can block collection creation for the cluster with 
> autoscaling enabled
> -
>
> Key: SOLR-15228
> URL: https://issues.apache.org/jira/browse/SOLR-15228
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andy Throgmorton
>Priority: Minor
>
> We configured a SolrCloud cluster (running 8.2) with this cluster autoscaling 
> policy:
> {noformat}
> {
>   "set-cluster-preferences":[
> {
>   "minimize":"cores",
>   "precision":5
> },
> {
>   "maximize":"freedisk",
>   "precision":25
> },
> {
>   "minimize":"sysLoadAvg",
>   "precision":10
> }],
>   "set-cluster-policy":[
> {
>   "replica": "<2",
>   "node": "#ANY"
> }],
>   "set-trigger": {
> "name":".auto_add_replicas",
> "event":"nodeLost",
> "waitFor":"10m",
> "enabled":true,
> "actions":[
>   {
> "name":"auto_add_replicas_plan",
> "class":"solr.AutoAddReplicasPlanAction"},
>   {
> "name":"execute_plan",
> "class":"solr.ExecutePlanAction"}]
>   }
> }{noformat}
> A node was rebooted at one point, and when that node came back, it had 
> trouble establishing a connection with ZK when it was initializing the 
> CoreContainer. As a result, it returns 404s for (I think?) all admin requests.
> Now, any call to create a collection in that cluster throws an error, with 
> this stacktrace:
> {noformat}
> 2021-03-04 12:47:03.615 ERROR 
> (OverseerThreadFactory-141-thread-4-processing-n:HOST_REDACTED:8983_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: COLLECTON_REDACTED 
> operation: create failed:org.apache.solr.common.SolrException: Error getting 
> replica locations : unable to get autoscaling policy session
> at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:195)
> at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
> at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.solr.common.SolrException: unable to get autoscaling 
> policy session
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:129)
> at 
> org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382)
> at 
> org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630)
> at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:410)
> at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:190)
> ... 6 more
> Caused by: org.apache.solr.common.SolrException: 
> org.apache.solr.common.SolrException: Error getting remote info
> at 
> org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78)
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
> at org.apache.solr.client.solrj.cloud.autoscaling.Row.(Row.java:71)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.(Policy.java:575)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
> at 
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:127)
> ... 10 more
> Caused by: org.apache.solr.common.SolrException: Error getting remote info
> at 
> 

[jira] [Resolved] (SOLR-2852) SolrJ doesn't need woodstox jar

2021-03-08 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-2852.

Fix Version/s: master (9.0)
   Resolution: Fixed

> SolrJ doesn't need woodstox jar
> ---
>
> Key: SOLR-2852
> URL: https://issues.apache.org/jira/browse/SOLR-2852
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The /dist/solrj-lib/ directory contains wstx-asl-3.2.7.jar (Woodstox StAX 
> API).  SolrJ doesn't actually have any type of dependency on this library. 
> The maven build doesn't have it as a dependency and the tests pass.  Perhaps 
> Woodstox is faster than the JDK's StAX, I don't know, but I find that point 
> quite moot since SolrJ can use the efficient binary format.  Woodstox is not 
> a small library either, weighing in at 524KB, and of course if someone 
> actually wants to use it, they can.
> I propose woodstox be removed as a SolrJ dependency.  I am *not* proposing it 
> be removed as a Solr WAR dependency since it is actually required there due 
> to an obscure XSLT issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15225) standardise test class naming

2021-03-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297627#comment-17297627
 ] 

David Smiley commented on SOLR-15225:
-

It appears [~markrmil...@gmail.com] is finally/actually working on the 
review-ability of the reference branch.  At least, I've been chatting with him 
on Slack lately on this subject.  Upon success of that (even partial success if 
we want bits/pieces), it will be a nightmare if tests are renamed.  Let's give 
this some more time, please?  We can at least bikeshed what the standard names 
should be :)  

> standardise test class naming
> -
>
> Key: SOLR-15225
> URL: https://issues.apache.org/jira/browse/SOLR-15225
> Project: Solr
>  Issue Type: Test
>Reporter: Christine Poerschke
>Priority: Major
>
> LUCENE-8626 started out as a standardisation effort for both Lucene and Solr 
> tests.
> The standardisation for Lucene tests ({{org.apache.lucene}} package space) is 
> now complete and enforced.
> This SOLR ticket here is for the standardisation of Solr test class names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13434) OpenTracing support for Solr

2021-03-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297362#comment-17297362
 ] 

David Smiley commented on SOLR-13434:
-

Why is GlobalTracer.get().close() called from SolrDispatchFilter.close instead 
of CoreContainer.shutdown()?  After all, CC _creates_ the tracer so it is most 
appropriate that it manage the life-cycle.
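A minimal sketch of the lifecycle I'm suggesting, with purely hypothetical names 
(the real tracer type and container are Solr's, not these): whatever creates the 
tracer also closes it.

{code:java}
// Hypothetical lifecycle sketch; not actual Solr code.
import java.io.Closeable;
import java.io.IOException;

interface Tracer extends Closeable {}   // stand-in for the real tracer type

class ContainerSketch {
  private final Tracer tracer;

  ContainerSketch(Tracer tracer) {      // the container creates/owns the tracer...
    this.tracer = tracer;
  }

  void shutdown() {                     // ...so the container closes it, not the dispatch filter
    try {
      tracer.close();
    } catch (IOException e) {
      // log and keep shutting down
    }
  }
}
{code}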

> OpenTracing support for Solr
> 
>
> Key: SOLR-13434
> URL: https://issues.apache.org/jira/browse/SOLR-13434
> Project: Solr
>  Issue Type: New Feature
>Reporter: Shalin Shekhar Mangar
>Assignee: Cao Manh Dat
>Priority: Major
> Fix For: 8.2, master (9.0)
>
> Attachments: SOLR-13434.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> [OpenTracing|https://opentracing.io/] is a vendor neutral API and 
> infrastructure for distributed tracing. Many OSS tracers such as Jaeger, 
> OpenZipkin, Apache SkyWalking as well as commercial tools support OpenTracing 
> APIs. Ideally, we can implement it once and have integrations for popular 
> tracers like we have with metrics and prometheus.
> I'm aware of SOLR-9641 but HTrace has since retired from incubator for lack 
> of activity so this is a fresh attempt at solving this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15215) SolrJ: Remove needless Netty dependency

2021-03-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296970#comment-17296970
 ] 

David Smiley commented on SOLR-15215:
-

bq. would like to have the ZK netty option still available

To be clear, it will still be *available*.  It's an opt-in vs opt-out matter.  
ZK made the choice of requiring opt-in, yet it ships the dependency as a normal 
dependency (you get it even if you don't opt in).  In SolrJ, the proposal I 
make here is that clients wanting to choose Netty need to add it to their 
classpath themselves, in addition to setting the system property that ZK uses 
to opt in and/or configure SSL.

I would prefer that the base SolrJ dependency not have ZooKeeper either; we'd 
have another "solrj-zk" which would include ZK and maybe/maybe-not Netty.
And a "solrj-expressions" to hold the streaming expressions code + commons-math 
dependency, which are non-trivial.  Until any of this happens, we still only 
have one "solrj" which has too many dependencies.
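For illustration, the Netty/SSL opt-in described above might look roughly like this 
(property names per ZooKeeper's documentation -- verify them against the ZK version 
in use; Netty must also be added to the application classpath by the client):

{code:java}
// Sketch only: system properties a client could set before building its ZK-aware SolrClient.
public class ZkSslOptInSketch {
  public static void main(String[] args) {
    // select ZooKeeper's Netty client transport instead of the default NIO one
    System.setProperty("zookeeper.clientCnxnSocket",
        "org.apache.zookeeper.ClientCnxnSocketNetty");
    // talk to ZK over TLS
    System.setProperty("zookeeper.client.secure", "true");
    System.setProperty("zookeeper.ssl.trustStore.location", "/path/to/truststore.jks");
    System.setProperty("zookeeper.ssl.trustStore.password", "changeit");
    // ... then build a CloudSolrClient pointed at the ZK ensemble as usual.
  }
}
{code}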

> SolrJ: Remove needless Netty dependency
> ---
>
> Key: SOLR-15215
> URL: https://issues.apache.org/jira/browse/SOLR-15215
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SolrJ depends on Netty transitively via ZooKeeper.  But ZooKeeper's Netty 
> dependency should be considered optional -- you have to opt-in.
> BTW it's only needed in Solr-core because of Hadoop/HDFS which ought to move 
> to a contrib and take this dependency with it over there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14166) Use TwoPhaseIterator for non-cached filter queries

2021-03-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296964#comment-17296964
 ] 

David Smiley commented on SOLR-14166:
-

CC [~yonik] [~jbernste] [~hossman] as possible reviewers for the attached PR, 
which touches rather technical code that few people have worked on besides you 
three in some shape or form.  Please review the issue description, and take a 
look at the PR.  In the PR, each commit is well isolated to what the commit 
message says, so you may prefer to go commit-by-commit, or you could just look 
at the thing as a whole.  In a comment above I pondered "Maybe we could make a 
wrapping query that wraps the underlying TPI.matchCost"; as you'll see in the 
PR, I did that.  The test works in validating that match() isn't called more 
than it needs to be.  It used to be called more, which is verifiable by copying 
the test to the 8x line (if I recall, it was called two additional times).  I 
suspect the test doesn't verify that MatchCostQuery is having an effect... I may 
need to think a bit more on how to do that.

I suspect someone will ask me if I did some performance tests.  No, I did not.  
My goal is removal of tech debt -- Filter -- and in the process I expect some 
performance improvements that Filter was blocking.  So in this issue, anyone 
with non-cached filter queries may see a benefit, especially when those queries 
have TwoPhaseIterators (phrase queries, frange, spatial, more).  The benefit 
may be further pronounced if the main query also has TPIs because Lucene 
cleverly sees through the boolean queries to group the TPIs of required clauses 
in the tree.
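For anyone unfamiliar with the API, a minimal sketch of the TwoPhaseIterator shape 
involved (illustrative only, not the patch itself): a cheap approximation plus a 
matches()/matchCost() pair, so the expensive confirmation runs last and only on 
surviving docs.

{code:java}
// Illustrative TwoPhaseIterator; ExpensiveCheck is a made-up per-document predicate.
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.TwoPhaseIterator;

class ExpensiveFilterTwoPhase extends TwoPhaseIterator {
  private final ExpensiveCheck check;

  ExpensiveFilterTwoPhase(DocIdSetIterator approximation, ExpensiveCheck check) {
    super(approximation);               // the cheap first phase
    this.check = check;
  }

  @Override
  public boolean matches() throws IOException {
    // only called for docs that survive the cheaper clauses
    return check.matches(approximation.docID());
  }

  @Override
  public float matchCost() {
    // higher cost => confirmed later among the TPI clauses
    return check.estimatedCostPerDoc();
  }

  interface ExpensiveCheck {
    boolean matches(int docId) throws IOException;
    float estimatedCostPerDoc();
  }
}
{code}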

> Use TwoPhaseIterator for non-cached filter queries
> --
>
> Key: SOLR-14166
> URL: https://issues.apache.org/jira/browse/SOLR-14166
> Project: Solr
>  Issue Type: Sub-task
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> "fq" filter queries that have cache=false and which aren't processed as a 
> PostFilter (thus either aren't a PostFilter or have a cost < 100) are 
> processed in SolrIndexSearcher using a custom Filter thingy which uses a 
> cost-ordered series of DocIdSetIterators.  This is not TwoPhaseIterator 
> aware, and thus the match() method may be called on docs that ideally would 
> have been filtered by lower-cost filter queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15045) 2x latency of synchronous commits due to serial execution on local and distributed leaders

2021-03-06 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-15045:

Summary: 2x latency of synchronous commits due to serial execution on local 
and distributed leaders  (was: Commit through curl command is causing delay in 
issuing commit)

> 2x latency of synchronous commits due to serial execution on local and 
> distributed leaders
> --
>
> Key: SOLR-15045
> URL: https://issues.apache.org/jira/browse/SOLR-15045
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 8.5.2
> Environment: Operating system: Linux (centos 7.7.1908)
>Reporter: Raj Yadav
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi All,
> When we issue commit through curl command, not all the shards are getting 
> `start commit` requests at the same time.
> *Solr Setup Detail : (Running in solrCloud mode)*
>  It has 6 shards, and each shard has only one replica (which is also the
>  leader); the replica type is NRT.
>  Each shard is hosted on a separate physical host.
> Zookeeper => We are using an external zookeeper ensemble (a separate 3-node
>  cluster)
> *Shard and Host name*
>  shard1_0=>solr_199
>  shard1_1=>solr_200
>  shard2_0=> solr_254
>  shard2_1=> solr_132
>  shard3_0=>solr_133
>  shard3_1=>solr_198
> *Request rate on the system is currently zero and only hourly indexing is*
>  *running on it.*
> We are using curl command to issue commit.
> {code:java}
> curl
> "http://solr_254:8389/solr/my_collection/update?openSearcher=true=true=json"{code}
> (Using solr_254 host to issue commit)
> On using the above command, all the shards start processing the commit (i.e.
>  getting the `start commit` request) except the one used in the curl command (i.e.
>  shard2_0, which is hosted on solr_254). Individually, each shard takes around
>  10 to 12 min to process a hard commit (most of this time is spent on reloading
>  external files).
>  As per the logs, shard2_0 gets the `start commit` request after 10 minutes
>  (approx). This leads to the following timeout error.
> {code:java}
> 2020-12-06 18:47:47.013 ERROR
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_132:9744/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER=http%3A%2F%2Fsolr_254%3A9744%2Fsolr%2Fmy_collection_shard2_0_replica_n11%2F
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
>       at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
>       at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
>       at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
>       at
> org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
>       at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>     Caused by: java.util.concurrent.TimeoutException
>       at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
>       ... 13 more{code}
> The above timeout error is between solr_254 and solr_132. Similar errors occur
>  between solr_254 and the other 4 shards.
> Since query load is zero, CPU utilization is mostly around 3%.
>  After issuing the curl commit command, CPU goes up to 14% on all shards except
>  shard2_0 (host: solr_254, the one used in the curl command).
>  And after 10 minutes (i.e. after getting the `start commit` request), CPU on
>  shard2_0 also goes up to 14%.
> As I mentioned earlier, each shard takes around 10-12 mins to process a commit,
>  and due to the delay in starting the commit process on one shard (shard2_0), our
>  overall commit time is now doubled (22-24 minutes approx).
> *We are observing this delay in both hard and soft commit.*
> In our solr-5.4.0(having similar setup), we use 

[jira] [Updated] (SOLR-15223) Deprecate HttpSolrClient, mark httpcomponents dep as "optional" in SolrJ

2021-03-06 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-15223:

Priority: Blocker  (was: Major)

> Deprecate HttpSolrClient, mark httpcomponents dep as "optional" in SolrJ
> 
>
> Key: SOLR-15223
> URL: https://issues.apache.org/jira/browse/SOLR-15223
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Priority: Blocker
> Fix For: master (9.0)
>
>
> Solr has had an HTTP/2 based SolrClient since 8.0.  Maintaining both HTTP/1 
> and HTTP/2 clients is a pain for maintenance of the project as it sometimes 
> means duplicative (or partially implemented) work, especially for 
> authentication but also sometimes metrics or tracing.  Both add extra 
> dependencies for SolrJ and thus our users.  It's difficult to grok a codebase 
> using two different HTTP client frameworks.
> In this issue, mark HttpSolrClient (and related ones) as deprecated; point to 
> HTTP/2 equivalents.  Furthermore, mark the Apache "httpcomponents" libs as 
> "optional" in the produced Maven pom.xml so that users have to explicitly 
> opt-in to use it.  Announce this in the Solr users list as well.
> Out of scope to this issue is completely cutting over within Solr itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-9027) Add GraphTermsQuery to limit traversal on high frequency nodes

2021-03-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296594#comment-17296594
 ] 

David Smiley commented on SOLR-9027:


I was looking back at {{GraphTermsQuery}} today during the course of analyzing 
usages of non-cached filters.  It occurred to me that we didn't need yet 
another query parser name.  To the user, this is semantically equivalent to 
the existing {{terms}} query parser with the added feature of an optional 
{{maxDocFreq}}.  We still need the code that was written here, however.  Also, 
the name "graph" in this QP seems a bit strange.  Yes, it's useful for higher 
level graph operations but that doesn't mean the query itself (low level) is 
doing any graph traversal -- it isn't.
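For reference, a SolrJ sketch of how the existing parser is invoked today (the field 
name and values are made up; maxDocFreq is the cutoff discussed above -- check the 
ref guide for the exact syntax):

{code:java}
// Sketch: attaching a graphTerms filter query from SolrJ.
import org.apache.solr.client.solrj.SolrQuery;

public class GraphTermsFilterSketch {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("*:*");
    // terms above the docFreq cutoff are dropped from the filter
    q.addFilterQuery("{!graphTerms f=node_id maxDocFreq=1000}idA,idB,idC");
    System.out.println(q);  // prints the encoded request params, including the fq
  }
}
{code}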

> Add GraphTermsQuery to limit traversal on high frequency nodes
> --
>
> Key: SOLR-9027
> URL: https://issues.apache.org/jira/browse/SOLR-9027
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.1
>
> Attachments: SOLR-9027.patch, SOLR-9027.patch, SOLR-9027.patch, 
> SOLR-9027.patch
>
>
> The gatherNodes() Streaming Expression is currently using a basic disjunction 
> query to perform the traversals. This ticket is to create a specific 
> GraphTermsQuery for performing the traversals. 
> The GraphTermsQuery will be based off of the TermsQuery, but will also 
> include an option for a docFreq cutoff. Terms that are above the docFreq 
> cutoff will not be included in the query. This will help users do a more 
> precise and efficient traversal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15223) Deprecate HttpSolrClient, mark httpcomponents dep as "optional" in SolrJ

2021-03-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-15223:

Fix Version/s: master (9.0)
  Summary: Deprecate HttpSolrClient, mark httpcomponents dep as 
"optional" in SolrJ  (was: Deprecate HttpSolrClient, mark httpclient as 
"optional" in SolrJ)

There might be some sneaky references to the httpcomponents dependencies in 
SolrJ (those that are not tightly linked to HttpSolrClient); remember to look.

> Deprecate HttpSolrClient, mark httpcomponents dep as "optional" in SolrJ
> 
>
> Key: SOLR-15223
> URL: https://issues.apache.org/jira/browse/SOLR-15223
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Priority: Major
> Fix For: master (9.0)
>
>
> Solr has had an HTTP/2 based SolrClient since 8.0.  Maintaining both HTTP/1 
> and HTTP/2 clients is a pain for maintenance of the project as it sometimes 
> means duplicative (or partially implemented) work, especially for 
> authentication but also sometimes metrics or tracing.  Both add extra 
> dependencies for SolrJ and thus our users.  It's difficult to grok a codebase 
> using two different HTTP client frameworks.
> In this issue, mark HttpSolrClient (and related ones) as deprecated; point to 
> HTTP/2 equivalents.  Furthermore, mark the Apache "httpcomponents" libs as 
> "optional" in the produced Maven pom.xml so that users have to explicitly 
> opt-in to use it.  Announce this in the Solr users list as well.
> Out of scope to this issue is completely cutting over within Solr itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15223) Deprecate HttpSolrClient, mark httpclient as "optional" in SolrJ

2021-03-05 Thread David Smiley (Jira)
David Smiley created SOLR-15223:
---

 Summary: Deprecate HttpSolrClient, mark httpclient as "optional" 
in SolrJ
 Key: SOLR-15223
 URL: https://issues.apache.org/jira/browse/SOLR-15223
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Reporter: David Smiley


Solr has had an HTTP/2 based SolrClient since 8.0.  Maintaining both HTTP/1 and 
HTTP/2 clients is a pain for maintenance of the project as it sometimes means 
duplicative (or partially implemented) work, especially for authentication but 
also sometimes metrics or tracing.  Both add extra dependencies for SolrJ and 
thus our users.  It's difficult to grok a codebase using two different HTTP 
client frameworks.

In this issue, mark HttpSolrClient (and related ones) as deprecated; point to 
HTTP/2 equivalents.  Furthermore, mark the Apache "httpcomponents" libs as 
"optional" in the produced Maven pom.xml so that users have to explicitly 
opt-in to use it.  Announce this in the Solr users list as well.

Out of scope to this issue is completely cutting over within Solr itself.
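As a rough sketch of the cut-over from a user's perspective (builder names as in 
SolrJ 8.x -- verify against the release in use):

{code:java}
// Sketch: the HTTP/2 client exposes the same SolrClient API as the HTTP/1 one.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;

public class Http2ClientSketch {
  public static void main(String[] args) throws Exception {
    try (Http2SolrClient client =
             new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
      client.query("techproducts", new SolrQuery("*:*"));
    }
  }
}
{code}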



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15222) Solr should not auto-create the "userfiles" dir

2021-03-05 Thread David Smiley (Jira)
David Smiley created SOLR-15222:
---

 Summary: Solr should not auto-create the "userfiles" dir
 Key: SOLR-15222
 URL: https://issues.apache.org/jira/browse/SOLR-15222
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley


The "userfiles" feature is relatively obscure and might be subsumed by the 
"file store".  I don't think an obscure feature should be auto-creating its 
"userfiles" directory.  Even a popular one; not sure it makes sense.  If a user 
wants to use this feature, they are welcome to create the directory.  Solr has 
other optional directories, like solr-home/lib that are not auto-created; it's 
not clear to me why this one is.  I've found the auto-creation of this dir to 
be annoying in two ways.  One is in Solr's tests – there are existing Jira 
issues that show stack traces about this even though it's ignored.  Secondly is 
as a down-stream consumer for running/building Solr plugins that have a Solr 
home dir pointing somewhere that suddenly has this userfiles dir popping up 
despite me having no plans to use it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-15215) SolrJ: Remove needless Netty dependency

2021-03-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-15215:
---

Assignee: (was: David Smiley)

While I think that'd be a fine thing, we don't have that today and I don't view 
it as a prerequisite to slimming down SolrJ's dependencies to represent what I 
think the majority of people use.  I prefer the essence of the PR here, with 
additional javadocs around when Netty is needed, maybe the ref-guide too somewhere, 
and a comment on the Netty dependency in solr-core to mention it's for 
Zookeeper SSL.

> SolrJ: Remove needless Netty dependency
> ---
>
> Key: SOLR-15215
> URL: https://issues.apache.org/jira/browse/SOLR-15215
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SolrJ depends on Netty transitively via ZooKeeper.  But ZooKeeper's Netty 
> dependency should be considered optional -- you have to opt-in.
> BTW it's only needed in Solr-core because of Hadoop/HDFS which ought to move 
> to a contrib and take this dependency with it over there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15219) TestPointFields.testIntPointFieldMultiValuedRangeFacet fails for seed

2021-03-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15219.
-
Fix Version/s: 8.9
   Resolution: Fixed

> TestPointFields.testIntPointFieldMultiValuedRangeFacet fails for seed
> -
>
> Key: SOLR-15219
> URL: https://issues.apache.org/jira/browse/SOLR-15219
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.9
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This reproduces:
>  {{gradlew :solr:core:test --tests 
> "org.apache.solr.schema.TestPointFields.testIntPointFieldMultiValuedRangeFacet"
>  -Ptests.jvms=6 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 
> -Ptests.seed=8E7691162850731 -Ptests.file.encoding=ISO-8859-1}}
>  Line 391.
> From my build emails, this test has failed twice last year.  I checked if 
> it's related to Mike Drob's last commit RE a refactoring discovered by static 
> analysis, and it isn't.
> The facet range is:
>  {{facet.range.start=-1899777513&facet.range.end=2145600248}}
>  That is really quite a wide range. Both the start and end value are 
> integers, but subtracting the two overflows an integer.  This suggests an 
> integer overflow error somewhere.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14660) Migrating HDFS into a package

2021-03-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296274#comment-17296274
 ] 

David Smiley commented on SOLR-14660:
-

Correction: Netty will stay in Solr-core for the foreseeable future as it's 
needed for SSL to ZooKeeper.

> Migrating HDFS into a package
> -
>
> Key: SOLR-14660
> URL: https://issues.apache.org/jira/browse/SOLR-14660
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package, packagemanager
>
> Following up on the deprecation of HDFS (SOLR-14021), we need to work on 
> isolating it away from Solr core and making a package for this. This issue is 
> to track the efforts for that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-2852) SolrJ doesn't need woodstox jar

2021-03-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296269#comment-17296269
 ] 

David Smiley commented on SOLR-2852:


It does *not* mean that.  The JDK has had an XML parser since JDK 1.4 back in 
2002 – [https://javaalmanac.io/jdk/1.4/]
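Illustrative sketch: the JDK's bundled StAX implementation parses XML with no extra 
jars on the classpath.

{code:java}
// Sketch: StAX parsing using only the JDK, no Woodstox required.
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class JdkStaxSketch {
  public static void main(String[] args) throws Exception {
    String xml = "<response><str name=\"status\">OK</str></response>";
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT) {
        System.out.println(reader.getLocalName());  // prints: response, str
      }
    }
    reader.close();
  }
}
{code}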

> SolrJ doesn't need woodstox jar
> ---
>
> Key: SOLR-2852
> URL: https://issues.apache.org/jira/browse/SOLR-2852
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The /dist/solrj-lib/ directory contains wstx-asl-3.2.7.jar (Woodstox StAX 
> API).  SolrJ doesn't actually have any type of dependency on this library. 
> The maven build doesn't have it as a dependency and the tests pass.  Perhaps 
> Woodstox is faster than the JDK's StAX, I don't know, but I find that point 
> quite moot since SolrJ can use the efficient binary format.  Woodstox is not 
> a small library either, weighing in at 524KB, and of course if someone 
> actually wants to use it, they can.
> I propose woodstox be removed as a SolrJ dependency.  I am *not* proposing it 
> be removed as a Solr WAR dependency since it is actually required there due 
> to an obscure XSLT issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15215) SolrJ: Remove needless Netty dependency

2021-03-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296241#comment-17296241
 ] 

David Smiley commented on SOLR-15215:
-

See [~janhoy]'s comment on the PR -- 
https://github.com/apache/lucene-solr/pull/2458#issuecomment-791505707
It deserved greater visibility here in JIRA rather than being buried in the 
code details there.  Basically: Netty is needed if ZK is accessed via SSL.  It was added in 
SOLR-13502

I think that we should not include it in SolrJ nonetheless.  It may not be 
stated anywhere, but I believe our strategic direction is for clients to not 
talk directly to ZooKeeper.  We have Http2ClusterStateProvider, which lets 
CloudSolrClient talk to Solr, which in turn talks to ZK.  It's not perfect (there 
are some perf bugs I've seen) but that's the direction.  Not including Netty 
doesn't stop someone from adding it themselves.  We could add some Javadoc 
comments in SolrJ CloudSolrClientBuilder on the ZK coordinates to mention the 
need for Netty when doing SSL.  Not everyone uses SSL or perhaps some users do 
it differently like via [Istio service 
mesh|https://istio.io/latest/docs/concepts/security/].

> SolrJ: Remove needless Netty dependency
> ---
>
> Key: SOLR-15215
> URL: https://issues.apache.org/jira/browse/SOLR-15215
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SolrJ depends on Netty transitively via ZooKeeper.  But ZooKeeper's Netty 
> dependency should be considered optional -- you have to opt-in.
> BTW it's only needed in Solr-core because of Hadoop/HDFS which ought to move 
> to a contrib and take this dependency with it over there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15219) TestPointFields.testIntPointFieldMultiValuedRangeFacet fails for seed

2021-03-05 Thread David Smiley (Jira)
David Smiley created SOLR-15219:
---

 Summary: TestPointFields.testIntPointFieldMultiValuedRangeFacet 
fails for seed
 Key: SOLR-15219
 URL: https://issues.apache.org/jira/browse/SOLR-15219
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley
Assignee: David Smiley


This reproduces:
 {{gradlew :solr:core:test --tests 
"org.apache.solr.schema.TestPointFields.testIntPointFieldMultiValuedRangeFacet" 
-Ptests.jvms=6 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 
-Ptests.seed=8E7691162850731 -Ptests.file.encoding=ISO-8859-1}}
 Line 391.

From my build emails, this test has failed twice last year.  I checked if it's 
related to Mike Drob's last commit RE a refactoring discovered by static 
analysis, and it isn't.

The facet range is:
 {{facet.range.start=-1899777513&facet.range.end=2145600248}}
 That is really quite a wide range. Both the start and end value are integers, 
but subtracting the two overflows an integer.  This suggests an integer 
overflow error somewhere.
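A tiny demonstration of the suspected arithmetic problem: the width of that range 
does not fit in an int, so int subtraction wraps negative, while widening to long 
does not.

{code:java}
// The exact start/end values from the failing seed.
public class RangeWidthOverflow {
  public static void main(String[] args) {
    int start = -1899777513;
    int end = 2145600248;
    System.out.println(end - start);         // -249589535  (wrapped; the true width overflows int)
    System.out.println((long) end - start);  // 4045377761  (correct width)
  }
}
{code}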



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15185) Improve "hash" QParser

2021-03-04 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15185.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Improve "hash" QParser
> --
>
> Key: SOLR-15185
> URL: https://issues.apache.org/jira/browse/SOLR-15185
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Don't use Filter (to be removed)
> * Do use TwoPhaseIterator, not PostFilter
> * Don't pre-compute matching docs (wasteful)
> * Support more fields, and more field types
> * Faster hash on Strings (avoid Char conversion)
> * Stronger hash when using multiple fields



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14660) Migrating HDFS into a package

2021-03-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295758#comment-17295758
 ] 

David Smiley commented on SOLR-14660:
-

When the build.gradle is created for this contrib, please try to undo the 
dependency flattening that's in most modules only because it was ported from 
Ant.  Basically SOLR-14929 but just scoped to this new contrib.  This means 
removing all/most of the {{transitive=false}} and runtime dependencies, then 
figuring out which deps are added unnecessarily so they can be excluded.  So for 
example, I'm seeing that Netty will be transitively included, and thus won't 
need a mention in build.gradle.

BTW in SOLR-15215 I'm removing Netty from SolrJ and in so doing I had to 
explicitly reference Netty in solr-core's build because it will no longer come 
in automatically via SolrJ.  But ideally, hadoop deps would not say 
transitive=false so I wouldn't have had to do this.

Woodstox is a dependency of hadoop that solr-core will continue to provide for 
a while.  Jackson -- same.

There's a help/dependencies.txt file that is helpful.

> Migrating HDFS into a package
> -
>
> Key: SOLR-14660
> URL: https://issues.apache.org/jira/browse/SOLR-14660
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package, packagemanager
>
> Following up on the deprecation of HDFS (SOLR-14021), we need to work on 
> isolating it away from Solr core and making a package for this. This issue is 
> to track the efforts for that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15217) rename shardsWhitelist and use it more broadly

2021-03-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295720#comment-17295720
 ] 

David Smiley commented on SOLR-15217:
-

Maybe use it as an alternative for "allowSolrUrls" in 
CrossCollectionJoinQParser, maybe in ReplicationHandler.
I suppose its current location is not bad, but at the top level (outside of 
{{}}) would be better.  I'm doubtful it's worth moving it 
though.  I don't love that "shards" is in its name... I'd even prefer using the 
name chosen by CrossCollectionJoinQParser: allowSolrUrls.

> rename shardsWhitelist and use it more broadly
> --
>
> Key: SOLR-15217
> URL: https://issues.apache.org/jira/browse/SOLR-15217
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Major
>
> The {{shardsWhitelist}} is defined on shardHandlerFactory element in 
> solr.xml.  We should rename it so something like "shardsAllowList".  And we 
> could use it in more places.
> https://solr.apache.org/guide/8_7/distributed-requests.html#configuring-the-shardhandlerfactory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15217) rename shardsWhitelist and use it more broadly

2021-03-04 Thread David Smiley (Jira)
David Smiley created SOLR-15217:
---

 Summary: rename shardsWhitelist and use it more broadly
 Key: SOLR-15217
 URL: https://issues.apache.org/jira/browse/SOLR-15217
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley


The {{shardsWhitelist}} is defined on the shardHandlerFactory element in solr.xml.  
We should rename it to something like "shardsAllowList".  And we could use it 
in more places.

https://solr.apache.org/guide/8_7/distributed-requests.html#configuring-the-shardhandlerfactory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-2852) SolrJ doesn't need woodstox jar

2021-03-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295620#comment-17295620
 ] 

David Smiley commented on SOLR-2852:


Time to do this for 9.0 in at least SolrJ?

> SolrJ doesn't need woodstox jar
> ---
>
> Key: SOLR-2852
> URL: https://issues.apache.org/jira/browse/SOLR-2852
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The /dist/solrj-lib/ directory contains wstx-asl-3.2.7.jar (Woodstox StAX 
> API).  SolrJ doesn't actually have any type of dependency on this library. 
> The maven build doesn't have it as a dependency and the tests pass.  Perhaps 
> Woodstox is faster than the JDK's StAX, I don't know, but I find that point 
> quite moot since SolrJ can use the efficient binary format.  Woodstox is not 
> a small library either, weighing in at 524KB, and of course if someone 
> actually wants to use it, they can.
> I propose woodstox be removed as a SolrJ dependency.  I am *not* proposing it 
> be removed as a Solr WAR dependency since it is actually required there due 
> to an obscure XSLT issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15215) SolrJ: Remove needless Netty dependency

2021-03-04 Thread David Smiley (Jira)
David Smiley created SOLR-15215:
---

 Summary: SolrJ: Remove needless Netty dependency
 Key: SOLR-15215
 URL: https://issues.apache.org/jira/browse/SOLR-15215
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Reporter: David Smiley
Assignee: David Smiley


SolrJ depends on Netty transitively via ZooKeeper.  But ZooKeeper's Netty 
dependency should be considered optional -- you have to opt-in.

BTW it's only needed in Solr-core because of Hadoop/HDFS which ought to move to 
a contrib and take this dependency with it over there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15191) Faceting on EnumFieldType does not work if allBuckets, numBuckets or missing is set

2021-03-04 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15191.
-
Resolution: Fixed

> Faceting on EnumFieldType does not work if allBuckets, numBuckets or missing 
> is set
> ---
>
> Key: SOLR-15191
> URL: https://issues.apache.org/jira/browse/SOLR-15191
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module, FacetComponent, faceting, search, 
> streaming expressions
>Affects Versions: 8.7, 8.8, 8.8.1
>Reporter: Thomas Wöckinger
>Assignee: David Smiley
>Priority: Major
>  Labels: easy-fix, pull-request-available
> Fix For: 8.9
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Due to SOLR-14514, FacetFieldProcessorByEnumTermsStream is not used if the 
> allBuckets, numBuckets or missing param is true.
> As a fallback, FacetFieldProcessorByHashDV is used, which calls 
> FacetRangeProcessor.getNumericCalc(sf) on the field. EnumFieldType is not 
> handled currently, so a SolrException is thrown with BAD_REQUEST and 
> 'Expected numeric field type'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15185) Improve "hash" QParser

2021-03-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295490#comment-17295490
 ] 

David Smiley commented on SOLR-15185:
-

Okay.  Given that most users of this are indirect users via streaming 
expressions (I presume), can you recommend how I might say that... i.e. _what_ 
part/expression is affected here?  Such users would not even know this 
optimization affects them otherwise.

> Improve "hash" QParser
> --
>
> Key: SOLR-15185
> URL: https://issues.apache.org/jira/browse/SOLR-15185
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Don't use Filter (to be removed)
> * Do use TwoPhaseIterator, not PostFilter
> * Don't pre-compute matching docs (wasteful)
> * Support more fields, and more field types
> * Faster hash on Strings (avoid Char conversion)
> * Stronger hash when using multiple fields



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15185) Improve "hash" QParser

2021-03-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295407#comment-17295407
 ] 

David Smiley commented on SOLR-15185:
-

[~jbernste] can you please recommend CHANGES.txt and/or ref guide upgrade notes 
pertaining to the hash changing?  Or maybe don't bother if nobody would care?  
RE the perf change; I'll just be vague and say it's more efficient.

> Improve "hash" QParser
> --
>
> Key: SOLR-15185
> URL: https://issues.apache.org/jira/browse/SOLR-15185
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Don't use Filter (to be removed)
> * Do use TwoPhaseIterator, not PostFilter
> * Don't pre-compute matching docs (wasteful)
> * Support more fields, and more field types
> * Faster hash on Strings (avoid Char conversion)
> * Stronger hash when using multiple fields



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14663) ConfigSets CREATE does not set trusted flag

2021-03-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294996#comment-17294996
 ] 

David Smiley commented on SOLR-14663:
-

Sorry for the misidentification; thanks for re-routing.

> ConfigSets CREATE does not set trusted flag
> ---
>
> Key: SOLR-14663
> URL: https://issues.apache.org/jira/browse/SOLR-14663
> Project: Solr
>  Issue Type: Task
>Reporter: Andras Salamon
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Fix For: 8.6.3, 8.7, master (9.0)
>
> Attachments: SOLR-14663.patch, SOLR-14663.patch
>
>
> If I upload a configset using [ConfigSets 
> API|https://lucene.apache.org/solr/guide/8_6/configsets-api.html] UPLOAD, Solr 
> sets the trusted flag. The config set will be trusted if authentication is 
> enabled and the upload operation is performed as an authenticated request.
> On the other hand if I use the ConfigSets API CREATE which creates a new 
> configset based on an already uploaded one, this flag will not be set, so the 
> configset will be effectively untrusted.
> I don't really understand the difference here, I think CREATE API call should 
> set this flag just like UPLOAD sets it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource

2021-03-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294985#comment-17294985
 ] 

David Smiley commented on SOLR-15031:
-

Yeah; it'd be an upgrade headache to change the contract or create some other 
parseNotNull.  Ugh.
Alternatively -- I've been thinking Lucene/Solr would benefit from having a 
"Nullable" annotation in our codebase with IDE support and/or static analysis 
checkers.  It'd need a bit of exploration and a dev list post to draw attention 
to it.
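A minimal sketch of what such an annotation could look like (illustrative; in 
practice we'd likely reuse an existing one that IDEs and static analyzers already 
understand rather than write our own):

{code:java}
// Sketch of a class-retention @Nullable: visible to analyzers, no runtime cost.
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
public @interface Nullable {}

// Usage (hypothetical): a parser method whose contract allows returning null.
//   @Nullable
//   public ValueSource parseValueSource() { ... }
{code}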

> NPE caused by FunctionQParser returning a null ValueSource
> --
>
> Key: SOLR-15031
> URL: https://issues.apache.org/jira/browse/SOLR-15031
> Project: Solr
>  Issue Type: Bug
>Reporter: Pieter
>Assignee: Mike Drob
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When parsing a sub query in a function query, 
> {{FunctionQParser#parseValueSource}} does not check if the produced query 
> object is null. When it is, it just wraps a null in a {{QueryValueSource}} 
> object. This causes NPEs in code consuming that object. Parsed 
> queries can be null, for example when the query string only contains 
> stopwords, so we need to handle that condition.
> h3. Steps to reproduce the issue
>  # Start solr with the techproducts example collection: {{solr start -e 
> techproducts}}
>  # Add a stopword to 
> SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for 
> example "at"
>  # Reload the core
>  # Execute a function query:
> {code:java}
> http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}={!func}%20if($fieldquery,1,0){code}
> The following stacktrace is produced:
> {code:java}
> 2020-12-03 13:35:38.868 INFO  (qtp2095677157-21) [   x:techproducts] 
> o.a.s.c.S.Request [techproducts]  webapp=/solr path=/select 
> params={q={!func}+if($fieldquery,1,0)={!field+f%3Dfeatures+v%3D'"at"'}}
>  status=500 QTime=34
> 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [   x:techproducts] 
> o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
> at 
> org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129)
> at 
> org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176)
> at 
> org.apache.solr.search.QueryResultKey.(QueryResultKey.java:53)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


