[PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
shubhamvishu opened a new pull request, #13187: URL: https://github.com/apache/lucene/pull/13187 ### Description This PR allows passing a custom vector similarity function to DVS implementations of `VectorSimilarityValuesSource` as opposed to current behaviour which by default only

[I] Support for building materialized views using Lucene formats [lucene]

2024-03-18 Thread via GitHub
bharath-techie opened a new issue, #13188: URL: https://github.com/apache/lucene/issues/13188 ### Description We are exploring the use case of building materialized views for certain fields and dimensions using [Star Tree

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-18 Thread via GitHub
gf2121 commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1527917667 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -185,6 +186,13 @@ public void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Fix TestLucene90FieldInfosFormat.testRandom [lucene]

2024-03-18 Thread via GitHub
msokolov commented on code in PR #13135: URL: https://github.com/apache/lucene/pull/13135#discussion_r1528450684 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -278,46 +278,50 @@ public void testRandom() throws Exception

Re: [PR] Revert "Add new parallel merge task executor for parallel actions within a single merge action" [lucene]

2024-03-18 Thread via GitHub
benwtrent merged PR #13189: URL: https://github.com/apache/lucene/pull/13189 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-2003789175 I am going to revert the change and open a new PR for iterating a fix. `RateLimitedIndexOutput` isn't threadsafe and our rate limiting assumes a single thread. With this

Re: [PR] Fix TestLucene90FieldInfosFormat.testRandom [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on code in PR #13135: URL: https://github.com/apache/lucene/pull/13135#discussion_r1528555456 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -278,46 +278,50 @@ public void testRandom() throws

[PR] Revert "Add new parallel merge task executor for parallel actions within a single merge action" [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new pull request, #13189: URL: https://github.com/apache/lucene/pull/13189 Reverts apache/lucene#13124. The reason for this revert is `RateLimitedIndexOutput`: it assumes a single thread and is not thread-safe. Will revert the

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
benwtrent merged PR #12915: URL: https://github.com/apache/lucene/pull/12915

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1528339145 ## lucene/CHANGES.txt: ## @@ -174,12 +174,14 @@ API Changes New Features - - * GITHUB#12679: Add support for similarity-based vector

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1528440247 ## lucene/CHANGES.txt: ## @@ -174,12 +174,14 @@ API Changes New Features - - * GITHUB#12679: Add support for similarity-based vector searches

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
msokolov commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2003799323 There is some discussion on how to make similarities more pluggable (https://github.com/apache/lucene/issues/13182) that seems relevant. Part of the idea there is to accept ordinal values

[PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new pull request, #13190: URL: https://github.com/apache/lucene/pull/13190 This commit adds a new interface to all MergeScheduler classes that allows the scheduler to provide an Executor for intra-merge parallelism. The first sub-class to satisfy this new interface is

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-03-18 Thread via GitHub
msokolov commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-2004334853 after disabling this for fields with positions, luceneutil perf looks pretty flat. I think it simply doesn't have any test cases that would exercise this. I wrote a small benchmark

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2003991803 Thanks for the review @msokolov! The idea to make it pluggable seems relevant and interesting. Currently it was not possible to use any custom vector similarity function other than

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2004052748 > Though I'm not sure if this change conflicts with or makes things difficult for the ongoing efforts to have pluggability (maybe @benwtrent would be interested in sharing his

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2004024374 @dweiss @mikemccand I am currently iterating on how to best make `RateLimitedIndexOutput`, `MergePolicy`, and `MergeRateLimiter` thread safe. Right now, it is all assumed that

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1528642636 ## lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java: ## @@ -281,11 +297,11 @@ public IndexOutput createOutput(String name, IOContext

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-18 Thread via GitHub
antonha commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1528679700 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -185,6 +186,13 @@ public void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

2024-03-18 Thread via GitHub
github-actions[bot] commented on PR #13143: URL: https://github.com/apache/lucene/pull/13143#issuecomment-2005421733 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2004568222 @benwtrent So should we instead wait for the pluggability support and discard this for now? Or is it possible to go forward with this? > What makes this PR doubly worrying is

[I] TestTaxonomyFacetValueSource.testRandom fails [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new issue, #13191: URL: https://github.com/apache/lucene/issues/13191 ### Description ``` org.apache.lucene.facet.taxonomy.TestTaxonomyFacetValueSource > testRandom FAILED java.lang.AssertionError: expected:<10> but was:<9> at

Re: [I] TestTaxonomyFacetValueSource.testRandom fails [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on issue #13191: URL: https://github.com/apache/lucene/issues/13191#issuecomment-2004940562 git-bisect says it's this commit: b5795db0cf517f8942eed868752249df9b105603

Re: [I] Exploring GPU based kNN vector search [lucene]

2024-03-18 Thread via GitHub
chatman commented on issue #13003: URL: https://github.com/apache/lucene/issues/13003#issuecomment-2005698009 As an initial proof of concept integration to evaluate performance, we put together a repository. https://github.com/SearchScale/lucene-cuvs The benchmarks are against single