[GitHub] [lucene] dweiss commented on pull request #634: LUCENE-10399: Handle large boolean expressions better in flexible query parser

2022-02-03 Thread GitBox
dweiss commented on pull request #634: URL: https://github.com/apache/lucene/pull/634#issuecomment-1029716363 Ping, @rmuir - WDYT? Should I merge this in or leave as a convenience patch in jira? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [lucene] jtibshirani opened a new pull request #645: Rename KnnGraphValues -> HnswGraph

2022-02-03 Thread GitBox
jtibshirani opened a new pull request #645: URL: https://github.com/apache/lucene/pull/645 This PR proposes some renames to clarify the code structure. The top-level `KnnGraphValues` is renamed to `HnswGraph`, since it now represents a hierarchical graph. It's also moved from

[GitHub] [lucene] mocobeta commented on pull request #643: LUCENE-10400: revise constructors to load dictionary resources in kuromoji

2022-02-03 Thread GitBox
mocobeta commented on pull request #643: URL: https://github.com/apache/lucene/pull/643#issuecomment-1029592938 @uschindler @msokolov Would you take a look at the revised constructors' API design when you have some time? If this gains basic consensus, I will add tests for the newly

[jira] [Updated] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-02-03 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10404: -- Description: While searching each layer, HNSW tracks the nodes it has already
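The issue description above is cut off, so the following is only a minimal sketch of the general trade-off named in the issue title, not Lucene's HNSW code. It uses a plain breadth-first traversal over an adjacency array as a stand-in for a graph search. The class `VisitedSketch` and both method names are hypothetical. The dense variant allocates one bit per node in the graph, while the hash-set variant only stores the nodes it actually visits.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
final class VisitedSketch {

  // Dense bookkeeping: memory is proportional to the total number of nodes,
  // regardless of how few nodes the traversal touches.
  static int countReachableDense(int[][] neighbors, int entry) {
    BitSet visited = new BitSet(neighbors.length);
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    visited.set(entry);
    queue.add(entry);
    int count = 0;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      count++;
      for (int n : neighbors[node]) {
        if (!visited.get(n)) {
          visited.set(n);
          queue.add(n);
        }
      }
    }
    return count;
  }

  // Sparse bookkeeping: memory is proportional to the number of visited nodes,
  // at the cost of hashing (and, with java.util.HashSet, boxing) per lookup.
  static int countReachableSparse(int[][] neighbors, int entry) {
    Set<Integer> visited = new HashSet<>();
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    visited.add(entry);
    queue.add(entry);
    int count = 0;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      count++;
      for (int n : neighbors[node]) {
        if (visited.add(n)) { // add() returns false if n was already present
          queue.add(n);
        }
      }
    }
    return count;
  }
}
```

Which variant is faster in practice depends on graph size and how many nodes a search touches; that would need to be measured rather than assumed.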

[jira] [Created] (LUCENE-10404) Use hash set for visited nodes in HNSW search?

2022-02-03 Thread Julie Tibshirani (Jira)
Julie Tibshirani created LUCENE-10404: - Summary: Use hash set for visited nodes in HNSW search? Key: LUCENE-10404 URL: https://issues.apache.org/jira/browse/LUCENE-10404 Project: Lucene - Core

[GitHub] [lucene] jtibshirani commented on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jtibshirani commented on pull request #641: URL: https://github.com/apache/lucene/pull/641#issuecomment-1029566793 I filed https://issues.apache.org/jira/browse/LUCENE-10404 so we don't forget about the hash set idea.

[GitHub] [lucene] gautamworah96 commented on a change in pull request #632: LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,Collector

2022-02-03 Thread GitBox
gautamworah96 commented on a change in pull request #632: URL: https://github.com/apache/lucene/pull/632#discussion_r799102705 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java ## @@ -285,53 +289,42 @@ public DrillSidewaysResult search( }

[GitHub] [lucene] gautamworah96 commented on a change in pull request #632: LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,Collector

2022-02-03 Thread GitBox
gautamworah96 commented on a change in pull request #632: URL: https://github.com/apache/lucene/pull/632#discussion_r799102042 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java ## @@ -192,7 +192,11 @@ protected Facets buildFacetsResult( *

[GitHub] [lucene] gautamworah96 commented on a change in pull request #632: LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,Collector

2022-02-03 Thread GitBox
gautamworah96 commented on a change in pull request #632: URL: https://github.com/apache/lucene/pull/632#discussion_r799091771 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java ## @@ -351,45 +344,36 @@ public DrillSidewaysResult search(ScoreDoc

[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486759#comment-17486759 ] ASF subversion and git services commented on LUCENE-10391: -- Commit

[jira] [Commented] (LUCENE-10403) Add ArrayUtil#grow(T[])

2022-02-03 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486756#comment-17486756 ] Greg Miller commented on LUCENE-10403: -- For what it's worth, I recently narrowly avoided an

[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486752#comment-17486752 ] ASF subversion and git services commented on LUCENE-10391: -- Commit

[GitHub] [lucene] jtibshirani merged pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jtibshirani merged pull request #641: URL: https://github.com/apache/lucene/pull/641

[GitHub] [lucene] gsmiller opened a new pull request #644: LUCENE-10403: Add ArrayUtil#grow(T[])

2022-02-03 Thread GitBox
gsmiller opened a new pull request #644: URL: https://github.com/apache/lucene/pull/644 # Description Add utility method to `ArrayUtil` allowing the user to omit a min size for generically typed arrays, adding parity with capabilities for primitive type arrays. # Solution
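As a rough illustration of what the description above proposes, here is a minimal sketch of a generic "grow" helper with and without an explicit minimum size. This is not the code from the pull request. The class `GrowSketch` is hypothetical, and the 1.5x growth factor is an assumption chosen for the example.

```java
import java.util.Arrays;

// Hypothetical illustration class; not Lucene's ArrayUtil.
final class GrowSketch {

  // Returns an array with length >= minSize, copying existing elements.
  // Over-allocates (assumed factor: roughly 1.5x) so repeated appends are amortized.
  // Integer overflow for very large arrays is not handled in this sketch.
  static <T> T[] grow(T[] array, int minSize) {
    if (minSize < 0) {
      throw new IllegalArgumentException("minSize must be >= 0, got " + minSize);
    }
    if (array.length >= minSize) {
      return array; // already large enough, no copy needed
    }
    int newSize = Math.max(minSize, array.length + (array.length >> 1) + 1);
    return Arrays.copyOf(array, newSize);
  }

  // Convenience overload: the caller omits the minimum size and just asks for
  // room for at least one more element.
  static <T> T[] grow(T[] array) {
    return grow(array, array.length + 1);
  }
}
```

A caller appending to a full buffer could then write `buffer = GrowSketch.grow(buffer);` before storing the next element, instead of computing a minimum size by hand.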

[GitHub] [lucene] jtibshirani commented on a change in pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jtibshirani commented on a change in pull request #641: URL: https://github.com/apache/lucene/pull/641#discussion_r799057820 ## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java ## @@ -93,6 +96,11 @@ public HnswGraphBuilder( this.random =

[jira] [Created] (LUCENE-10403) Add ArrayUtil#grow(T[])

2022-02-03 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10403: Summary: Add ArrayUtil#grow(T[]) Key: LUCENE-10403 URL: https://issues.apache.org/jira/browse/LUCENE-10403 Project: Lucene - Core Issue Type: Improvement

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
mayya-sharipova commented on a change in pull request #641: URL: https://github.com/apache/lucene/pull/641#discussion_r799027275 ## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java ## @@ -93,6 +96,11 @@ public HnswGraphBuilder(

[GitHub] [lucene] jtibshirani edited a comment on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jtibshirani edited a comment on pull request #641: URL: https://github.com/apache/lucene/pull/641#issuecomment-1029284791 Extracting `HnswGraphSearcher` is a lot nicer, I will push a refactor. On the topic of hash sets, I tried switching to `IntIntHashMap` on top of this PR and it

[GitHub] [lucene] jtibshirani commented on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jtibshirani commented on pull request #641: URL: https://github.com/apache/lucene/pull/641#issuecomment-1029284791 Extracting `HnswGraphSearcher` is a lot nicer, I pushed that refactor. On the topic of hash sets, I tried switching to `IntIntHashMap` on top of this PR and it gives a

[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-03 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486619#comment-17486619 ] Adrien Grand commented on LUCENE-8739: -- Robert disagreed with introducing a requirement on libzstd

[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2022-02-03 Thread Praveen Nishchal (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486614#comment-17486614 ] Praveen Nishchal commented on LUCENE-8739: -- As we observed earlier, Zstd is at par with

[GitHub] [lucene] dnhatn commented on pull request #640: LUCENE-10190: Ensure changes are visible before advancing seqno

2022-02-03 Thread GitBox
dnhatn commented on pull request #640: URL: https://github.com/apache/lucene/pull/640#issuecomment-1029191014 @jpountz Thanks for reviewing. I've pushed https://github.com/apache/lucene/pull/640/commits/85d294ee611a22e3d0fb80347fcf2bc244e2fa5d to simplify the fix.

[jira] [Resolved] (LUCENE-10385) Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.

2022-02-03 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10385. --- Fix Version/s: 9.1 Resolution: Fixed > Implement Weight#count on

[jira] [Commented] (LUCENE-10002) Remove IndexSearcher#search(Query,Collector) in favor of IndexSearcher#search(Query,CollectorManager)

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486570#comment-17486570 ] ASF subversion and git services commented on LUCENE-10002: -- Commit

[jira] [Commented] (LUCENE-10385) Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486571#comment-17486571 ] ASF subversion and git services commented on LUCENE-10385: -- Commit

[jira] [Commented] (LUCENE-10385) Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery.

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486567#comment-17486567 ] ASF subversion and git services commented on LUCENE-10385: -- Commit

[GitHub] [lucene] jpountz merged pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

2022-02-03 Thread GitBox
jpountz merged pull request #635: URL: https://github.com/apache/lucene/pull/635

[GitHub] [lucene] jpountz merged pull request #639: LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests

2022-02-03 Thread GitBox
jpountz merged pull request #639: URL: https://github.com/apache/lucene/pull/639

[jira] [Commented] (LUCENE-10002) Remove IndexSearcher#search(Query,Collector) in favor of IndexSearcher#search(Query,CollectorManager)

2022-02-03 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486566#comment-17486566 ] ASF subversion and git services commented on LUCENE-10002: -- Commit

[GitHub] [lucene] mocobeta commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1029151546 I opened a draft PR: https://github.com/apache/lucene/pull/643. Would you take a look at it?

[GitHub] [lucene] mocobeta opened a new pull request #643: LUCENE-10400: revise to load dictionary resources in kuromoji

2022-02-03 Thread GitBox
mocobeta opened a new pull request #643: URL: https://github.com/apache/lucene/pull/643 I drafted an idea to revise the dictionary constructors. Main changes: 1. always use Class.getResourceAsStream() to load class resources. (don't delegate it to the parent class) 2.
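The first change listed above relies on `Class.getResourceAsStream()`, which resolves a resource name relative to the package of the given class and returns `null` when nothing is found. A minimal sketch of that pattern follows. The class `ResourceSketch`, the method `open`, and the resource name in the usage note are hypothetical and are not taken from the pull request.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of kuromoji.
final class ResourceSketch {

  // Opens a resource next to the given class (a name without a leading '/' is
  // resolved relative to the class's package). getResourceAsStream() signals a
  // missing resource by returning null, so it is turned into an exception here.
  static InputStream open(Class<?> clazz, String name) throws IOException {
    InputStream in = clazz.getResourceAsStream(name);
    if (in == null) {
      throw new FileNotFoundException(
          "Resource not found: " + name + " (relative to " + clazz.getName() + ")");
    }
    return in;
  }
}
```

A dictionary class could call something like `ResourceSketch.open(MyDictionary.class, "MyDictionary$buffer.dat")` from its own constructor, where `MyDictionary` and the file name are placeholders. Doing the lookup against the class that owns the resource, rather than delegating it to a parent class, keeps the lookup within the same package and module as the resource.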

[jira] [Created] (LUCENE-10402) Intervals.prefix() handles multicharacter unicode incorrectly

2022-02-03 Thread Alan Woodward (Jira)
Alan Woodward created LUCENE-10402: -- Summary: Intervals.prefix() handles multicharacter unicode incorrectly Key: LUCENE-10402 URL: https://issues.apache.org/jira/browse/LUCENE-10402 Project: Lucene

[jira] [Created] (LUCENE-10401) Seeking on empty doc-value terms dictionaries fails with AIOOBE

2022-02-03 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10401: - Summary: Seeking on empty doc-value terms dictionaries fails with AIOOBE Key: LUCENE-10401 URL: https://issues.apache.org/jira/browse/LUCENE-10401 Project: Lucene

[jira] [Updated] (LUCENE-10401) Seeking on empty doc-value terms dictionaries fails with AIOOBE

2022-02-03 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-10401: -- Issue Type: Bug (was: Task) > Seeking on empty doc-value terms dictionaries fails with

[GitHub] [lucene] jpountz commented on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
jpountz commented on pull request #641: URL: https://github.com/apache/lucene/pull/641#issuecomment-1029048199 I've become a bit cautious with codec-level caching. E.g. we have threadlocals for stored fields which end up storing `num_search_threads * num_indices * num_segments_per_index`

[GitHub] [lucene] msokolov commented on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-03 Thread GitBox
msokolov commented on pull request #641: URL: https://github.com/apache/lucene/pull/641#issuecomment-1029006052 Thanks for working on this, and it shows a nice result. I had been discussing with some folks at work and we were considering whether it would be possible to maintain the state

[GitHub] [lucene] msokolov commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
msokolov commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028994632 I think the only reason why we have the file loading option is to support unit testing the dictionary-loading components. Maybe there's a better way?

[jira] [Updated] (LUCENE-10400) Clean up the constructors' API signature of dictionary classes in kuromoji and nori

2022-02-03 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10400: --- Description: It was suggested in a few issues/pr comments. * do not delegate to load

[jira] [Created] (LUCENE-10400) Clean up the constructors' API signature of dictionary classes in kuromoji and nori

2022-02-03 Thread Tomoko Uchida (Jira)
Tomoko Uchida created LUCENE-10400: -- Summary: Clean up the constructors' API signature of dictionary classes in kuromoji and nori Key: LUCENE-10400 URL: https://issues.apache.org/jira/browse/LUCENE-10400

[GitHub] [lucene] javanna commented on pull request #639: LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests

2022-02-03 Thread GitBox
javanna commented on pull request #639: URL: https://github.com/apache/lucene/pull/639#issuecomment-1028873856 This should be good to go now ;)

[GitHub] [lucene] uschindler commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
uschindler commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028862245 Thanks. This really needs motivation, one reason why I did not do it!

[GitHub] [lucene] mocobeta closed pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta closed pull request #638: URL: https://github.com/apache/lucene/pull/638

[GitHub] [lucene] mocobeta commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028858655 I will open an issue to clean up the constructors' signatures for the dictionaries if I have a chance. I might need some time and good motivation for the refactoring.

[GitHub] [lucene] mocobeta edited a comment on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta edited a comment on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028811056 Personally, I don't care about the API or two-args constructor to load external resources (the main interest/concern here) too, but I know there are people who need and

[GitHub] [lucene] mocobeta commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028811056 Personally, I don't care about the APIs (two-args constructors to load external resources) too, but I know there are people who need and add the feature to kuromoji and nori (I

[GitHub] [lucene] uschindler commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
uschindler commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028808508 I would really not move this dictionary code to commons module and add opens. This is as bad as having the (now deprecated) methods in IOUtils to load stopwords files. Loading

[GitHub] [lucene] uschindler commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
uschindler commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028805953 > I'll explore if we can remove the delegation around resource loading without changing the API interface (for now). I won't care about API, it is all not really public,

[GitHub] [lucene] javanna commented on a change in pull request #635: LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery

2022-02-03 Thread GitBox
javanna commented on a change in pull request #635: URL: https://github.com/apache/lucene/pull/635#discussion_r798385420 ## File path: lucene/CHANGES.txt ## @@ -128,6 +128,9 @@ New Features based on TotalHitCountCollector that allows users to parallelize counting the

[GitHub] [lucene] javanna commented on a change in pull request #639: LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests

2022-02-03 Thread GitBox
javanna commented on a change in pull request #639: URL: https://github.com/apache/lucene/pull/639#discussion_r798383042 ## File path: lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java ## @@ -492,33 +493,43 @@ private void

[GitHub] [lucene] mocobeta edited a comment on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta edited a comment on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028790164 Thanks for your comment. Minor correction - we currently don't even have DictionaryLoader but the resource loading methods are embedded in two abstract BinaryDictionarys

[GitHub] [lucene] javanna commented on a change in pull request #639: LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests

2022-02-03 Thread GitBox
javanna commented on a change in pull request #639: URL: https://github.com/apache/lucene/pull/639#discussion_r798382358 ## File path: lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java ## @@ -492,33 +493,43 @@ private void

[GitHub] [lucene] mocobeta commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028790164 Thanks for your comment. Minor correction - we currently don't even have DictionaryLoader but the resource loading methods are embedded in two BinaryDictionarys themselves. My

[GitHub] [lucene] uschindler commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
uschindler commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028771777 > Please don't do the current code or add opens. Just simple remove the DictionaryLoader class completely and just use getResourceAsStream() directly. Most important: Remove

[GitHub] [lucene] mocobeta commented on pull request #638: LUCENE-10393: Unify resource loader in kuromoji and nori

2022-02-03 Thread GitBox
mocobeta commented on pull request #638: URL: https://github.com/apache/lucene/pull/638#issuecomment-1028736869 Hi @uschindler, I think I need your review or opinion. This reduces code duplication and technically works, but adds "open" directives in the module descriptors. If this