[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534126#comment-17534126 ] Robert Muir commented on LUCENE-9356: - I think the "problem" in current test is the inherent

[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534124#comment-17534124 ] Robert Muir commented on LUCENE-9356: - I'd disable compound file as a first pass on any test too.

[jira] [Created] (LUCENE-10564) SparseFixedBitSet#or doesn't update memory accounting

2022-05-09 Thread Julie Tibshirani (Jira)
Julie Tibshirani created LUCENE-10564: - Summary: SparseFixedBitSet#or doesn't update memory accounting Key: LUCENE-10564 URL: https://issues.apache.org/jira/browse/LUCENE-10564 Project: Lucene -

[GitHub] [lucene] LuXugang commented on a diff in pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-09 Thread GitBox
LuXugang commented on code in PR #870: URL: https://github.com/apache/lucene/pull/870#discussion_r868794312 ## lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsFormat.java: ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [lucene] jtibshirani commented on a diff in pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-09 Thread GitBox
jtibshirani commented on code in PR #870: URL: https://github.com/apache/lucene/pull/870#discussion_r868790940 ## lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsFormat.java: ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534118#comment-17534118 ] Robert Muir commented on LUCENE-9356: - mulling on it more, this to me seems like the way to go.

[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534117#comment-17534117 ] Robert Muir commented on LUCENE-9356: - by the way, if we just want to improve the exception path,

[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534116#comment-17534116 ] Robert Muir commented on LUCENE-9356: - The test seems wrong to me, for example it does not consider

[jira] [Commented] (LUCENE-10532) Remove @Slow annotation

2022-05-09 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534112#comment-17534112 ] ASF subversion and git services commented on LUCENE-10532: -- Commit

[jira] [Resolved] (LUCENE-10532) Remove @Slow annotation

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-10532. -- Fix Version/s: 9.2 Resolution: Fixed > Remove @Slow annotation >

[jira] [Commented] (LUCENE-10532) Remove @Slow annotation

2022-05-09 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534102#comment-17534102 ] ASF subversion and git services commented on LUCENE-10532: -- Commit

[GitHub] [lucene] rmuir merged pull request #832: LUCENE-10532: remove @Slow annotation

2022-05-09 Thread GitBox
rmuir merged PR #832: URL: https://github.com/apache/lucene/pull/832 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [lucene] rmuir commented on pull request #832: LUCENE-10532: remove @Slow annotation

2022-05-09 Thread GitBox
rmuir commented on PR #832: URL: https://github.com/apache/lucene/pull/832#issuecomment-1121832123 > I'm fine with this. Reasons for Slow (and other test groups) are various. I use Slow in projects where certain tests are indeed slow by nature - have to unpack the distribution/ fork

[GitHub] [lucene] rmuir commented on pull request #832: LUCENE-10532: remove @Slow annotation

2022-05-09 Thread GitBox
rmuir commented on PR #832: URL: https://github.com/apache/lucene/pull/832#issuecomment-1121826968 Thanks @cpoerschke for correcting my dyslexia :)

[GitHub] [lucene] LuXugang commented on a diff in pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-09 Thread GitBox
LuXugang commented on code in PR #870: URL: https://github.com/apache/lucene/pull/870#discussion_r868760034 ## lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsFormat.java: ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [lucene] mocobeta commented on pull request #874: LUCENE-10471 Increse max dims for vectors to 2048

2022-05-09 Thread GitBox
mocobeta commented on PR #874: URL: https://github.com/apache/lucene/pull/874#issuecomment-1121814970 I'm curious about how such large models (to me) are practically common or will be common in the near future (in the IR area). I don't have enough expertise to agree or disagree - it's

[jira] [Commented] (LUCENE-10471) Increase the number of dims for KNN vectors to 2048

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534086#comment-17534086 ] Robert Muir commented on LUCENE-10471: -- I think the major problem is still no Vector API in the

[GitHub] [lucene] rmuir commented on pull request #777: LUCENE-10488: Optimize Facets#getTopDims in ConcurrentSortedSetDocValuesFacetCounts

2022-05-09 Thread GitBox
rmuir commented on PR #777: URL: https://github.com/apache/lucene/pull/777#issuecomment-1121805275 just an observation, this is a large amount of code changes for performance change that may be in the noise? I'm a bit confused.

[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress

2022-05-09 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534074#comment-17534074 ] Robert Muir commented on LUCENE-10551: -- Thanks [~irislpx] for solving the mystery. Maybe there

[GitHub] [lucene] rmuir commented on pull request #875: LUCENE-10560: Speed up OrdinalMap construction a bit.

2022-05-09 Thread GitBox
rmuir commented on PR #875: URL: https://github.com/apache/lucene/pull/875#issuecomment-1121772167 I kinda feel like in this case we are trying to outsmart the JIT compiler with optimizations it has for `Arrays.equals()`. I understand the idea that we could be smarter based on the

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-09 Thread GitBox
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r868594873 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java: ## @@ -53,12 +53,14 @@ public final class Lucene91HnswVectorsWriter

[GitHub] [lucene] gsmiller commented on pull request #777: LUCENE-10488: Optimize Facets#getTopDims in ConcurrentSortedSetDocValuesFacetCounts

2022-05-09 Thread GitBox
gsmiller commented on PR #777: URL: https://github.com/apache/lucene/pull/777#issuecomment-1121602253 This looks great! Thanks @Yuti-G! It would be nice if we could create a common abstract class to hold some of the common logic between this and the non-concurrent implementation. Seems

[GitHub] [lucene] Yuti-G closed pull request #843: LUCENE-10538: TopN is not being used in getTopChildren in RangeFacetCounts

2022-05-09 Thread GitBox
Yuti-G closed pull request #843: LUCENE-10538: TopN is not being used in getTopChildren in RangeFacetCounts URL: https://github.com/apache/lucene/pull/843

[jira] [Commented] (LUCENE-10538) TopN is not being used in getTopChildren()

2022-05-09 Thread Yuting Gan (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534019#comment-17534019 ] Yuting Gan commented on LUCENE-10538: - Yes, thanks [~gsmiller]! I am working on LUCENE-10550 and

[jira] [Commented] (LUCENE-10471) Increase the number of dims for KNN vectors to 2048

2022-05-09 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534018#comment-17534018 ] Julie Tibshirani commented on LUCENE-10471: --- I also don't have an objection to increasing it

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533982#comment-17533982 ] Tomoko Uchida edited comment on LUCENE-10562 at 5/9/22 8:08 PM: Yes,

[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress

2022-05-09 Thread Peixin Li (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533993#comment-17533993 ] Peixin Li commented on LUCENE-10551: We have identified that the issue is not related to code in Lucene.

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533982#comment-17533982 ] Tomoko Uchida edited comment on LUCENE-10562 at 5/9/22 7:43 PM: Yes,

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533982#comment-17533982 ] Tomoko Uchida commented on LUCENE-10562: Yes, it's not about "query execution (retrieving

[GitHub] [lucene] mayya-sharipova commented on pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-09 Thread GitBox
mayya-sharipova commented on PR #870: URL: https://github.com/apache/lucene/pull/870#issuecomment-1121489041 @msokolov Thanks for your feedback on this PR. I am wondering if you have any further feedback for this work. It would be nice to get it merged for 9.2 Lucene release.

[GitHub] [lucene] jtibshirani commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-09 Thread GitBox
jtibshirani commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r868255093 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java: ## @@ -53,12 +53,14 @@ public final class Lucene91HnswVectorsWriter extends

[GitHub] [lucene] jtibshirani commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-09 Thread GitBox
jtibshirani commented on PR #872: URL: https://github.com/apache/lucene/pull/872#issuecomment-1121379649 Thanks, this looks the same as what I was seeing now! It's good motivation to add Lucene to ann-benchmarks so we can stop using a custom local benchmark set-up!

[GitHub] [lucene] jtibshirani commented on a diff in pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-09 Thread GitBox
jtibshirani commented on code in PR #870: URL: https://github.com/apache/lucene/pull/870#discussion_r868240961 ## lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsFormat.java: ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (LUCENE-10538) TopN is not being used in getTopChildren()

2022-05-09 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533912#comment-17533912 ] Greg Miller commented on LUCENE-10538: -- So I think the order of operations here is: 1. Deliver

[GitHub] [lucene] gsmiller commented on pull request #843: LUCENE-10538: TopN is not being used in getTopChildren in RangeFacetCounts

2022-05-09 Thread GitBox
gsmiller commented on PR #843: URL: https://github.com/apache/lucene/pull/843#issuecomment-1121351525 @Yuti-G would it make sense to close out this PR since I don't think we plan to merge this as it is?

[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-05-09 Thread Balmukund Mandal (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533911#comment-17533911 ] Balmukund Mandal commented on LUCENE-9625: -- I was trying to run the benchmark and have a couple

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Henrik Hertel (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533874#comment-17533874 ] Henrik Hertel commented on LUCENE-10562: [~uschindler] you are correct, it is indeed a german

[GitHub] [lucene] jpountz opened a new pull request, #875: LUCENE-10560: Speed up OrdinalMap construction a bit.

2022-05-09 Thread GitBox
jpountz opened a new pull request, #875: URL: https://github.com/apache/lucene/pull/875 I benchmarked OrdinalMap construction over high-cardinality fields, and lots of time gets spent into `PriorityQueue#downHeap` due to entry comparisons. I added a small hack that speeds up these

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533857#comment-17533857 ] Uwe Schindler edited comment on LUCENE-10562 at 5/9/22 3:38 PM: As

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533857#comment-17533857 ] Uwe Schindler commented on LUCENE-10562: As explanation why this is slow: It has nothing to do

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533856#comment-17533856 ] Uwe Schindler edited comment on LUCENE-10562 at 5/9/22 3:31 PM: Hi, I

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533856#comment-17533856 ] Uwe Schindler edited comment on LUCENE-10562 at 5/9/22 3:30 PM: Hi, I

[jira] [Resolved] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-10562. Resolution: Won't Fix > Large system: Wildcard search leads to full index scan despite

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533856#comment-17533856 ] Uwe Schindler commented on LUCENE-10562: Hi, I think those questions do not relate to Lucene and

[GitHub] [lucene] mayya-sharipova opened a new pull request, #874: LUCENE-10471 Increse max dims for vectors to 2048

2022-05-09 Thread GitBox
mayya-sharipova opened a new pull request, #874: URL: https://github.com/apache/lucene/pull/874 Increase the maximum number of dims for KNN vectors to 2048. The current maximum allowed number of dimensions is equal to 1024. But we see in practice a number of models that produce

[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-09 Thread GitBox
mayya-sharipova commented on PR #872: URL: https://github.com/apache/lucene/pull/872#issuecomment-1121144832 @jtibshirani Thanks for the comment. I've rerun the benchmarks as you suggested, and here are the new results ```txt kApproach

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533765#comment-17533765 ] Tomoko Uchida commented on LUCENE-10562: bq. I guess that would increase my index size by some

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Henrik Hertel (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533762#comment-17533762 ] Henrik Hertel commented on LUCENE-10562: Sure, that could help, but I guess that would increase

[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533751#comment-17533751 ] Tomoko Uchida edited comment on LUCENE-10562 at 5/9/22 10:57 AM: - One

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533751#comment-17533751 ] Tomoko Uchida commented on LUCENE-10562: One thing I could recommend is, that instead of using

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533747#comment-17533747 ] Tomoko Uchida commented on LUCENE-10562: I'm not fully sure how filters are implemented in

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Henrik Hertel (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533743#comment-17533743 ] Henrik Hertel commented on LUCENE-10562: Thanks for your answer. Well, from my naive point of

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533739#comment-17533739 ] Tomoko Uchida commented on LUCENE-10562: As for "despite filter query", sorry, why you can

[jira] [Commented] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533735#comment-17533735 ] Tomoko Uchida commented on LUCENE-10562: Infix or suffix wildcard query is extremely slow in

[jira] [Updated] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

2022-05-09 Thread Henrik Hertel (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henrik Hertel updated LUCENE-10562: --- Description: I use Solr and have a large system with 1TB in one core and about 5 million

[GitHub] [lucene] zacharymorn commented on pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-09 Thread GitBox
zacharymorn commented on PR #833: URL: https://github.com/apache/lucene/pull/833#issuecomment-1120692187 Hi @jpountz @jtibshirani , just want to check back on this PR and see if it looks ready to you?