[GitHub] [lucene] gf2121 closed pull request #530: LUCENE-10297: Speed up medium cardinality fields with readLongs and SIMD
gf2121 closed pull request #530: URL: https://github.com/apache/lucene/pull/530

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459689#comment-17459689 ] Feng Guo commented on LUCENE-10315: --- The optimization can only be triggered when {{count == BKDConfig#DEFAULT_MAX_POINTS_IN_LEAF_NODE}}. This is fragile because users can customize {{maxPointsInLeaf}} in the Codec, rendering the optimization meaningless. Here are some ways I can think of to address this: 1. Directly drop support for customizing {{maxPointsInLeaf}}, like what we do in postings. 2. Generate a series of ForUtils, like {{ForUtil128}}, {{ForUtil256}}, {{ForUtil512}}, {{ForUtil1024}} ... and add notes hinting users to choose values from them. > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This issue tries to use a 512 ints {{ForUtil}} for BKD ids codec. I > benchmarked this optimization by mocking some random LongPoint and querying > them with PointInSetQuery.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The indices size is reduced for low cardinality fields and flat for high cardinality fields.
> {code:java}
> 113M  index_1_doc_32_cardinality_baseline
> 114M  index_1_doc_32_cardinality_candidate
> 140M  index_1_doc_128_cardinality_baseline
> 133M  index_1_doc_128_cardinality_candidate
> 193M  index_1_doc_1024_cardinality_baseline
> 174M  index_1_doc_1024_cardinality_candidate
> 241M  index_1_doc_8192_cardinality_baseline
> 233M  index_1_doc_8192_cardinality_candidate
> 314M  index_1_doc_1048576_cardinality_baseline
> 315M  index_1_doc_1048576_cardinality_candidate
> 392M  index_1_doc_8388608_cardinality_baseline
> 391M  index_1_doc_8388608_cardinality_candidate
> {code}

-- 
This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
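The fixed-block-size decoding that a 512-int {{ForUtil}} relies on can be illustrated with a minimal bit-packing sketch. This is illustrative only: the class and method names ({{ForUtilSketch}}, {{encode}}, {{decode}}) are hypothetical and not Lucene's actual API, which is generated code specialized per bits-per-value.

```java
import java.util.Arrays;

// Minimal bit-packing sketch of the ForUtil idea (hypothetical names, not
// Lucene's implementation): a block of ints is packed at a fixed bits-per-value
// (bpv) into a long[] and unpacked with plain shifts and masks.
public class ForUtilSketch {

  /** Pack values at bpv bits per value into a long[]. */
  static long[] encode(int[] values, int bpv) {
    long[] packed = new long[(values.length * bpv + 63) / 64];
    long mask = (1L << bpv) - 1;
    for (int i = 0; i < values.length; i++) {
      int bit = i * bpv;      // absolute bit position of value i
      int word = bit >>> 6;   // which long it starts in
      int off = bit & 63;     // bit offset within that long
      long v = values[i] & mask;
      packed[word] |= v << off;
      if (off + bpv > 64) {   // value spills into the next long
        packed[word + 1] |= v >>> (64 - off);
      }
    }
    return packed;
  }

  /** Unpack count values of bpv bits each from packed. */
  static int[] decode(long[] packed, int count, int bpv) {
    long mask = (1L << bpv) - 1;
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      int bit = i * bpv;
      int word = bit >>> 6;
      int off = bit & 63;
      long v = packed[word] >>> off;
      if (off + bpv > 64) {   // pick up the bits from the next long
        v |= packed[word + 1] << (64 - off);
      }
      values[i] = (int) (v & mask);
    }
    return values;
  }

  public static void main(String[] args) {
    // Round-trip a full 512-int block at 5 bits per value.
    int[] block = new int[512];
    for (int i = 0; i < block.length; i++) {
      block[i] = i & 31; // values must fit in bpv bits
    }
    long[] packed = encode(block, 5);
    if (!Arrays.equals(decode(packed, block.length, 5), block)) {
      throw new AssertionError("round-trip failed");
    }
  }
}
```

When the block size is a compile-time constant (such as 512), loops like these can be unrolled and auto-vectorized, which is why the speedup above depends on the leaf block actually containing that many points.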
[jira] [Updated] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10315: -- Description: This issue tries to use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization by mocking some random LongPoint and querying them with PointInSetQuery.

*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
|1|32|1|51.44|148.26|188.22%|
|1|32|2|26.8|101.88|280.15%|
|1|32|4|14.04|53.52|281.20%|
|1|32|8|7.04|28.54|305.40%|
|1|32|16|3.54|14.61|312.71%|
|1|128|1|110.56|350.26|216.81%|
|1|128|8|16.6|89.81|441.02%|
|1|128|16|8.45|48.07|468.88%|
|1|128|32|4.2|25.35|503.57%|
|1|128|64|2.13|13.02|511.27%|
|1|1024|1|536.19|843.88|57.38%|
|1|1024|8|109.71|251.89|129.60%|
|1|1024|32|33.24|104.11|213.21%|
|1|1024|128|8.87|30.47|243.52%|
|1|1024|512|2.24|8.3|270.54%|
|1|8192|1|.33|5000|50.00%|
|1|8192|32|139.47|214.59|53.86%|
|1|8192|128|54.59|109.23|100.09%|
|1|8192|512|15.61|36.15|131.58%|
|1|8192|2048|4.11|11.14|171.05%|
|1|1048576|1|2597.4|3030.3|16.67%|
|1|1048576|32|314.96|371.75|18.03%|
|1|1048576|128|99.7|116.28|16.63%|
|1|1048576|512|30.5|37.15|21.80%|
|1|1048576|2048|10.38|12.3|18.50%|
|1|8388608|1|2564.1|3174.6|23.81%|
|1|8388608|32|196.27|238.95|21.75%|
|1|8388608|128|55.36|68.03|22.89%|
|1|8388608|512|15.58|19.24|23.49%|
|1|8388608|2048|4.56|5.71|25.22%|
The indices size is reduced for low cardinality fields and flat for high cardinality fields.
{code:java}
113M  index_1_doc_32_cardinality_baseline
114M  index_1_doc_32_cardinality_candidate
140M  index_1_doc_128_cardinality_baseline
133M  index_1_doc_128_cardinality_candidate
193M  index_1_doc_1024_cardinality_baseline
174M  index_1_doc_1024_cardinality_candidate
241M  index_1_doc_8192_cardinality_baseline
233M  index_1_doc_8192_cardinality_candidate
314M  index_1_doc_1048576_cardinality_baseline
315M  index_1_doc_1048576_cardinality_candidate
392M  index_1_doc_8388608_cardinality_baseline
391M  index_1_doc_8388608_cardinality_candidate
{code}

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
>
[GitHub] [lucene] zacharymorn commented on pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on pull request #534: URL: https://github.com/apache/lucene/pull/534#issuecomment-994295420

Thanks @msokolov @jtibshirani @jpountz for the review and suggestions! I've pushed an update to address the feedback.
[GitHub] [lucene] zacharymorn commented on pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on pull request #534: URL: https://github.com/apache/lucene/pull/534#issuecomment-994295107

> Maybe also add a CHANGELOG entry? I guess it's a somewhat internal API, being in codecs, but it's not marked @experimental, so if someone had implemented their own KnnVectorsWriter codec, they'd need to change it ...

Just added an entry under `9.1.0`. Shall I also add `@lucene.experimental` to it in case it needs to change again? But I don't know this API well enough to foresee that need...
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r769243295

## File path: lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java
## @@ -40,7 +41,8 @@ protected KnnVectorsWriter() {}
 /** Write all values contained in the provided reader */
- public abstract void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException;
+ public abstract void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorReader)

Review comment: Good suggestion! I've updated them to use `knnVectorsReader`.

## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java
## @@ -109,11 +110,15 @@ private void updateBytesUsed() {
 public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException {
 VectorValues vectorValues = new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension());
-if (sortMap != null) {
- knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap));
-} else {
- knnVectorsWriter.writeField(fieldInfo, vectorValues);
-}
+KnnVectorsReader vectorsReader =
+new EmptyKnnVectorsReader() {
+ @Override
+ public VectorValues getVectorValues(String field) throws IOException {
+return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues;

Review comment: Oops, sorry, good catch! I've fixed it.
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r769243208

## File path: lucene/core/src/java/org/apache/lucene/index/EmptyKnnVectorsReader.java
## @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Bits;
+
+/** Abstract base class implementing a {@link KnnVectorsReader} that has no vector values. */
+public abstract class EmptyKnnVectorsReader extends KnnVectorsReader {

Review comment: Ah, makes sense. Thanks for the context! I've removed it and extended directly.
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r769229827

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
## @@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException { return new FieldEntry(input, similarityFunction); }
+ private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+for (FieldEntry entry : fields.values()) {
+ IndexInput input =

Review comment: > How would you like to proceed -- work on that PR first (since it seems useful on its own), or move forward with this one and follow-up with a fix soon after?

I was under the impression that we are not happy with the current state of this PR and would not want to merge it without some changes. No?

> To clarify, I was not thinking that GraphLevels would replace FieldEntry. It would be a second data structure.

Can you please elaborate on how you see it organized? How is `GraphLevels` connected to a `FieldEntry`? Do you suggest putting GraphLevels into a separate file and loading them on first use?
[jira] [Commented] (LUCENE-10175) Remove VectorValues#binaryValue?
[ https://issues.apache.org/jira/browse/LUCENE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459636#comment-17459636 ] Julie Tibshirani commented on LUCENE-10175: --- +1 it'd be good to avoid exposing the binary format. I'm not aware of a way the method is helpful to external users, or necessary for features we want to build. > Remove VectorValues#binaryValue? > > > Key: LUCENE-10175 > URL: https://issues.apache.org/jira/browse/LUCENE-10175 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Major > > It's unclear to me why we have VectorValues#binaryValue. This exposes > implementation details that I'd rather like to avoid exposing in our > higher-level APIs such as the Directory byte order. And I worry that it might > be in the way of upcoming features like supporting bfloat16.
[jira] [Commented] (LUCENE-10191) Optimize vector functions by precomputing magnitudes
[ https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459633#comment-17459633 ] Julie Tibshirani commented on LUCENE-10191: --- I've been pondering this more. It definitely seems possible to move VectorSimilarityFunction to Lucene90HnswVectorsFormat. Maybe we could start with something really simple, like a new method {{VectorValues#computeDistance(float[] query)}} that uses the configured distance function. I guess {{computeDistance}} could give a simple interface but do something fancy if it wanted to, since it knows how exactly its vectors are represented. My main hesitation is that VectorSimilarityFunction is a cross-cutting concept that makes sense across format implementations. In fact, I would expect all KnnVectorsFormat to support dot product, euclidean, and cosine. Could there be drift across different formats (maybe a vector function is missing, or is named something different) in a way that hurts users? > Optimize vector functions by precomputing magnitudes > > > Key: LUCENE-10191 > URL: https://issues.apache.org/jira/browse/LUCENE-10191 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > Both euclidean distance (L2 norm) and cosine similarity can be expressed in > terms of dot product and vector magnitudes: > * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2) > * cosine(a, b) = a . b / ||a|| ||b|| > We could compute and store each vector's magnitude upfront while indexing, > and compute the query vector's magnitude once per query. Then we'd calculate > the distance using our (very optimized) dot product method, plus the > precomputed values. > This is an exploratory issue: I haven't tested this out yet, so I'm not sure > how much it would help. 
I would at least expect it to help with cosine > similarity – several months ago we tried out similar ideas in Elasticsearch > and were able to get a nice boost in cosine performance.
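The identities quoted above can be checked with a small sketch. The class and helper names ({{PrecomputedMagnitudes}}, {{dot}}, {{magnitude}}) are hypothetical, not a proposed Lucene API; the point is that given a dot product plus precomputed magnitudes, both cosine and euclidean distance fall out without re-reading the vectors.

```java
// Sketch of the precomputed-magnitudes idea (illustrative names only):
// l2_norm(a, b) = sqrt(||a||^2 - 2(a . b) + ||b||^2)
// cosine(a, b)  = a . b / (||a|| ||b||)
public class PrecomputedMagnitudes {

  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static double magnitude(float[] a) {
    return Math.sqrt(dot(a, a));
  }

  /** cosine(a, b) from a precomputed dot product and magnitudes. */
  static double cosine(double dot, double magA, double magB) {
    return dot / (magA * magB);
  }

  /** euclidean distance from a precomputed dot product and magnitudes. */
  static double l2(double dot, double magA, double magB) {
    return Math.sqrt(magA * magA - 2 * dot + magB * magB);
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    double d = dot(a, b);       // the only per-pair loop needed at query time
    double magA = magnitude(a); // would be precomputed at index time
    double magB = magnitude(b); // computed once per query
    // Check against the direct definition of euclidean distance.
    double direct = 0;
    for (int i = 0; i < a.length; i++) {
      direct += (a[i] - b[i]) * (a[i] - b[i]);
    }
    direct = Math.sqrt(direct);
    if (Math.abs(l2(d, magA, magB) - direct) > 1e-6) {
      throw new AssertionError("identity does not hold");
    }
  }
}
```

The per-pair work then reduces to the dot product loop, which is the part that is already heavily optimized.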
[GitHub] [lucene] jtibshirani commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical
jtibshirani commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r769177664

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
## @@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException { return new FieldEntry(input, similarityFunction); }
+ private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+for (FieldEntry entry : fields.values()) {
+ IndexInput input =

Review comment: I like this idea. How would you like to proceed -- work on that PR first (since it seems useful on its own), or move forward with this one and follow-up with a fix soon after? To clarify, I was not thinking that `GraphLevels` would replace `FieldEntry`. It would be a second data structure.
[GitHub] [lucene-solr] madrob commented on pull request #2624: SOLR-15833
madrob commented on pull request #2624: URL: https://github.com/apache/lucene-solr/pull/2624#issuecomment-994157346

I force pushed to my fork
[GitHub] [lucene] zhaih opened a new pull request #542: LUCENE-10316: fix TestLRUQueryCache.testCachingAccountableQuery failure
zhaih opened a new pull request #542: URL: https://github.com/apache/lucene/pull/542

# Description & Solution
Please see https://issues.apache.org/jira/browse/LUCENE-10316

# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] ~~I have added tests for my changes~~ (Changing a test)
[jira] [Updated] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
[ https://issues.apache.org/jira/browse/LUCENE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haoyu Zhai updated LUCENE-10316: Description: I saw this build failure: [https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/] with the following stack trace
{code:java}
java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
	at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:577)
	at org.junit.Assert.assertEquals(Assert.java:701)
	at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
	...
NOTE: reproduce with: gradlew test --tests TestLRUQueryCache.testCachingAccountableQuery -Dtests.seed=F7826B1EB37D545A -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ckb-IR -Dtests.timezone=Africa/Dakar -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{code}
It does not reproduce on my laptop on the current main branch, but since the test is comparing an estimation with a 10% slack, it can certainly fail sometimes.
> fix TestLRUQueryCache.testCachingAccountableQuery failure > - > > Key: LUCENE-10316 > URL: https://issues.apache.org/jira/browse/LUCENE-10316 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: Haoyu Zhai >Priority: Minor > > I saw this build failure: > [https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/] > with following stack trace > {code:java} > java.lang.AssertionError: expected:<130.0> but was:<1544976.0> > at > __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:577) > at org.junit.Assert.assertEquals(Assert.java:701) > at > org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at
[jira] [Commented] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
[ https://issues.apache.org/jira/browse/LUCENE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459420#comment-17459420 ] Haoyu Zhai commented on LUCENE-10316: - So basically the test is about making sure the query cache has the right size estimation when the query has implemented the {{Accountable}} interface. When I originally wrote it I estimated the query cache size using {{(query_size + linked_hash_map_entry_size) * query_num}} with 10% slack to allow for estimation error. But apparently that is not enough sometimes (probably a larger number of cache entries wastes more?). Given that the aim of the test is to make sure the query cache reflects known big queries correctly when they are cached, I think we could change that to {{assert(query_cache_size > sum_of_all_queries_cached)}}. Then we won't depend on a slack factor to assert correctness.

> fix TestLRUQueryCache.testCachingAccountableQuery failure
> -
>
> Key: LUCENE-10316
> URL: https://issues.apache.org/jira/browse/LUCENE-10316
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
>Reporter: Haoyu Zhai
>Priority: Minor
>
> I saw this build failure:
> https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/
> with following stack trace
> {code:java}
> java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
> at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:577)
> at org.junit.Assert.assertEquals(Assert.java:701)
> at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> ...
> {code}
> It does not reproduce on my laptop on current main branch, but since the test
> is comparing an estimation with a 10% slack, it can fail for sure sometime.
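The proposed lower-bound assertion can be sketched as follows. This is a hypothetical model, not the actual test code: the class, record, and {{ENTRY_OVERHEAD_BYTES}} constant are illustrative stand-ins for Lucene's {{LRUQueryCache}} and {{Accountable}} machinery. The idea is that the cache's reported size must always be at least the sum of the cached queries' own reported sizes, since per-entry bookkeeping overhead can only add to that floor, so no slack factor is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed fix: assert a lower bound on the cache's
// reported RAM usage instead of a within-10% match against an estimate.
public class CacheSizeLowerBound {

  // Illustrative per-entry bookkeeping cost; the real value varies by JVM.
  static final long ENTRY_OVERHEAD_BYTES = 48;

  // Stand-in for a cached query that implements Accountable.
  record CachedQuery(long ramBytesUsed) {}

  // Stand-in for LRUQueryCache#ramBytesUsed: queries plus per-entry overhead.
  static long cacheRamBytesUsed(List<CachedQuery> cached) {
    long total = 0;
    for (CachedQuery q : cached) {
      total += q.ramBytesUsed() + ENTRY_OVERHEAD_BYTES;
    }
    return total;
  }

  public static void main(String[] args) {
    List<CachedQuery> cached = new ArrayList<>();
    cached.add(new CachedQuery(1 << 20)); // a "known big" query
    cached.add(new CachedQuery(130));
    long sumOfQueries = 0;
    for (CachedQuery q : cached) {
      sumOfQueries += q.ramBytesUsed();
    }
    // The lower-bound assertion holds no matter how large the overhead is,
    // unlike an equality check with a fixed slack percentage.
    if (cacheRamBytesUsed(cached) <= sumOfQueries) {
      throw new AssertionError("cache must account for at least the queries themselves");
    }
  }
}
```

The trade-off is that a lower bound is weaker than the original equality check, but it still catches the regression the test was written for: a cache that ignores a big query's {{Accountable}} size.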
[jira] [Created] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
Haoyu Zhai created LUCENE-10316: --- Summary: fix TestLRUQueryCache.testCachingAccountableQuery failure Key: LUCENE-10316 URL: https://issues.apache.org/jira/browse/LUCENE-10316 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Haoyu Zhai

I saw this build failure: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/ with the following stack trace
{code:java}
java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
	at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:577)
	at org.junit.Assert.assertEquals(Assert.java:701)
	at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
	...
{code}
It does not reproduce on my laptop on the current main branch, but since the test is comparing an estimation with a 10% slack, it can certainly fail sometimes.
[GitHub] [lucene] gf2121 opened a new pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil
gf2121 opened a new pull request #541:
URL: https://github.com/apache/lucene/pull/541

   This approach uses a 512-int ForUtil for the BKD ids codec. I benchmarked the optimization by mocking some random LongPoint fields and querying them with PointInSetQuery.

   **Benchmark Result**

   doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage
   -- | -- | -- | -- | -- | --
   1 | 32 | 1 | 51.44 | 148.26 | 188.22%
   1 | 32 | 2 | 26.8 | 101.88 | 280.15%
   1 | 32 | 4 | 14.04 | 53.52 | 281.20%
   1 | 32 | 8 | 7.04 | 28.54 | 305.40%
   1 | 32 | 16 | 3.54 | 14.61 | 312.71%
   1 | 128 | 1 | 110.56 | 350.26 | 216.81%
   1 | 128 | 8 | 16.6 | 89.81 | 441.02%
   1 | 128 | 16 | 8.45 | 48.07 | 468.88%
   1 | 128 | 32 | 4.2 | 25.35 | 503.57%
   1 | 128 | 64 | 2.13 | 13.02 | 511.27%
   1 | 1024 | 1 | 536.19 | 843.88 | 57.38%
   1 | 1024 | 8 | 109.71 | 251.89 | 129.60%
   1 | 1024 | 32 | 33.24 | 104.11 | 213.21%
   1 | 1024 | 128 | 8.87 | 30.47 | 243.52%
   1 | 1024 | 512 | 2.24 | 8.3 | 270.54%
   1 | 8192 | 1 | .33 | 5000 | 50.00%
   1 | 8192 | 32 | 139.47 | 214.59 | 53.86%
   1 | 8192 | 128 | 54.59 | 109.23 | 100.09%
   1 | 8192 | 512 | 15.61 | 36.15 | 131.58%
   1 | 8192 | 2048 | 4.11 | 11.14 | 171.05%
   1 | 1048576 | 1 | 2597.4 | 3030.3 | 16.67%
   1 | 1048576 | 32 | 314.96 | 371.75 | 18.03%
   1 | 1048576 | 128 | 99.7 | 116.28 | 16.63%
   1 | 1048576 | 512 | 30.5 | 37.15 | 21.80%
   1 | 1048576 | 2048 | 10.38 | 12.3 | 18.50%
   1 | 8388608 | 1 | 2564.1 | 3174.6 | 23.81%
   1 | 8388608 | 32 | 196.27 | 238.95 | 21.75%
   1 | 8388608 | 128 | 55.36 | 68.03 | 22.89%
   1 | 8388608 | 512 | 15.58 | 19.24 | 23.49%
   1 | 8388608 | 2048 | 4.56 | 5.71 | 25.22%

   The index size is reduced for low cardinality fields and roughly flat for high cardinality fields.

   ```
   113M	index_1_doc_32_cardinality_baseline
   114M	index_1_doc_32_cardinality_candidate
   140M	index_1_doc_128_cardinality_baseline
   133M	index_1_doc_128_cardinality_candidate
   193M	index_1_doc_1024_cardinality_baseline
   174M	index_1_doc_1024_cardinality_candidate
   241M	index_1_doc_8192_cardinality_baseline
   233M	index_1_doc_8192_cardinality_candidate
   314M	index_1_doc_1048576_cardinality_baseline
   315M	index_1_doc_1048576_cardinality_candidate
   392M	index_1_doc_8388608_cardinality_baseline
   391M	index_1_doc_8388608_cardinality_candidate
   ```
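The compression idea behind this PR and the related LUCENE-10297 discussion - delta-code sorted doc ids, then store each block at a fixed bit width - can be sketched as follows. This is an illustrative toy, not Lucene's actual ForUtil, which packs values into longs with SIMD-friendly shifts:

```java
import java.util.Arrays;

// Toy sketch of delta-coding sorted ids plus a fixed-bit-width size estimate.
// NOT Lucene's ForUtil; just the arithmetic the approach relies on.
public class ForUtilSketch {
  /** Bits needed to represent a non-negative value v. */
  public static int bitsRequired(long v) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
  }

  /** deltas[0] = ids[0], deltas[i] = ids[i] - ids[i-1], for sorted ids. */
  public static long[] deltaEncode(long[] sortedIds) {
    long[] deltas = new long[sortedIds.length];
    deltas[0] = sortedIds[0];
    for (int i = 1; i < sortedIds.length; i++) {
      deltas[i] = sortedIds[i] - sortedIds[i - 1];
    }
    return deltas;
  }

  /** Inverse of deltaEncode: prefix-sum the deltas back into absolute ids. */
  public static long[] deltaDecode(long[] deltas) {
    long[] ids = new long[deltas.length];
    long acc = 0;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      ids[i] = acc;
    }
    return ids;
  }

  public static void main(String[] args) {
    long[] ids = {3, 7, 8, 100, 130};
    long[] deltas = deltaEncode(ids);
    long maxDelta = Arrays.stream(deltas).max().getAsLong();
    // Sorting + deltas shrink the per-value bit width: the max delta (92)
    // needs 7 bits, versus 8 bits for the raw max id (130).
    System.out.println(bitsRequired(maxDelta));                 // 7
    System.out.println(Arrays.equals(deltaDecode(deltas), ids)); // true
  }
}
```

The per-value bit width explains why low cardinality fields shrink (many ids per term, so small deltas) while high cardinality fields stay roughly flat.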
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459377#comment-17459377 ]

Dawid Weiss commented on LUCENE-10313:
--------------------------------------

Yeah, it's very similar to log4j appenders and I don't think there should be any problems with it. Please take whatever you need from that branch I created - it stops literally one step short of showing the content of the log buffer. I think the text area should only update itself when it's actually displayed, but it's been years since I wrote any Swing code and I've forgotten which listener to use to determine whether a tabbed pane is visible.

> Remove log4j from dependencies and switch to java logging (in luke)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-10313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10313
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: luke
>            Reporter: Tomoko Uchida
>            Priority: Major
>
> This seems to be a simpler solution at this moment
> https://issues.apache.org/jira/browse/LUCENE-10303
> https://issues.apache.org/jira/browse/LUCENE-10308
[jira] [Commented] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD
[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459370#comment-17459370 ]

Feng Guo commented on LUCENE-10297:
-----------------------------------

This approach could increase the index size for low cardinality fields. I raised https://issues.apache.org/jira/browse/LUCENE-10315, which looks better, so I am closing this now.

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We introduced a bitset optimization for extremely low cardinality fields in [LUCENE-10233|https://issues.apache.org/jira/browse/LUCENE-10233], but medium cardinality fields (like 32/128) can rarely trigger it. This issue tries to find a way to speed them up.
> In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to use readLELongs to speed up BKD id blocks, but did not get an obvious gain from that approach. The reason could be that we were trying to optimize the unsorted situation (typical for high cardinality fields), and the bottleneck of queries on high cardinality fields is {{visitDocValues}}, not {{readDocIds}}.
> However, medium cardinality fields are good candidates for this optimization because they need to read lots of ids for each term. The basic idea is that we can compute the deltas of the sorted ids and encode/decode them like we do in {{StoredFieldsInts}}. I benchmarked the optimization by mocking some random LongPoint fields and querying them with {{PointInSetQuery}}. As expected, the medium cardinality fields got sped up and high cardinality fields show even results.
>
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline(ms)|candidate(ms)|diff percentage|baseline(QPS)|candidate(QPS)|diff percentage|
> |1|32|1|19|16|-15.79%|52.63|62.50|18.75%|
> |1|32|2|34|14|-58.82%|29.41|71.43|142.86%|
> |1|32|4|76|22|-71.05%|13.16|45.45|245.45%|
> |1|32|8|139|42|-69.78%|7.19|23.81|230.95%|
> |1|32|16|279|82|-70.61%|3.58|12.20|240.24%|
> |1|128|1|17|11|-35.29%|58.82|90.91|54.55%|
> |1|128|8|75|23|-69.33%|13.33|43.48|226.09%|
> |1|128|16|126|25|-80.16%|7.94|40.00|404.00%|
> |1|128|32|245|50|-79.59%|4.08|20.00|390.00%|
> |1|128|64|528|97|-81.63%|1.89|10.31|444.33%|
> |1|1024|1|3|2|-33.33%|333.33|500.00|50.00%|
> |1|1024|8|13|8|-38.46%|76.92|125.00|62.50%|
> |1|1024|32|31|19|-38.71%|32.26|52.63|63.16%|
> |1|1024|128|120|67|-44.17%|8.33|14.93|79.10%|
> |1|1024|512|480|133|-72.29%|2.08|7.52|260.90%|
> |1|8192|1|3|3|0.00%|333.33|333.33|0.00%|
> |1|8192|16|18|15|-16.67%|55.56|66.67|20.00%|
> |1|8192|64|19|14|-26.32%|52.63|71.43|35.71%|
> |1|8192|512|69|43|-37.68%|14.49|23.26|60.47%|
> |1|8192|2048|236|134|-43.22%|4.24|7.46|76.12%|
> |1|1048576|1|3|2|-33.33%|333.33|500.00|50.00%|
> |1|1048576|16|18|19|5.56%|55.56|52.63|-5.26%|
> |1|1048576|64|17|17|0.00%|58.82|58.82|0.00%|
> |1|1048576|512|34|32|-5.88%|29.41|31.25|6.25%|
> |1|1048576|2048|89|93|4.49%|11.24|10.75|-4.30%|
[jira] [Resolved] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD
[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10297.
-------------------------------
    Resolution: Won't Do

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Created] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
Feng Guo created LUCENE-10315:
---------------------------------

             Summary: Speed up BKD leaf block ids codec by a 512 ints ForUtil
                 Key: LUCENE-10315
                 URL: https://issues.apache.org/jira/browse/LUCENE-10315
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Feng Guo

This issue tries to use a 512 ints {{ForUtil}} for the BKD ids codec. I benchmarked the optimization by mocking some random LongPoint fields and querying them with PointInSetQuery.

*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
|1|32|1|51.44|148.26|188.22%|
|1|32|2|26.8|101.88|280.15%|
|1|32|4|14.04|53.52|281.20%|
|1|32|8|7.04|28.54|305.40%|
|1|32|16|3.54|14.61|312.71%|
|1|128|1|110.56|350.26|216.81%|
|1|128|8|16.6|89.81|441.02%|
|1|128|16|8.45|48.07|468.88%|
|1|128|32|4.2|25.35|503.57%|
|1|128|64|2.13|13.02|511.27%|
|1|1024|1|536.19|843.88|57.38%|
|1|1024|8|109.71|251.89|129.60%|
|1|1024|32|33.24|104.11|213.21%|
|1|1024|128|8.87|30.47|243.52%|
|1|1024|512|2.24|8.3|270.54%|
|1|8192|1|.33|5000|50.00%|
|1|8192|32|139.47|214.59|53.86%|
|1|8192|128|54.59|109.23|100.09%|
|1|8192|512|15.61|36.15|131.58%|
|1|8192|2048|4.11|11.14|171.05%|
|1|1048576|1|2597.4|3030.3|16.67%|
|1|1048576|32|314.96|371.75|18.03%|
|1|1048576|128|99.7|116.28|16.63%|
|1|1048576|512|30.5|37.15|21.80%|
|1|1048576|2048|10.38|12.3|18.50%|
|1|8388608|1|2564.1|3174.6|23.81%|
|1|8388608|32|196.27|238.95|21.75%|
|1|8388608|128|55.36|68.03|22.89%|
|1|8388608|512|15.58|19.24|23.49%|
|1|8388608|2048|4.56|5.71|25.22%|

The index size is reduced for low cardinality fields and roughly flat for high cardinality fields.

{code:java}
113M	index_1_doc_32_cardinality_baseline
114M	index_1_doc_32_cardinality_candidate
140M	index_1_doc_128_cardinality_baseline
133M	index_1_doc_128_cardinality_candidate
193M	index_1_doc_1024_cardinality_baseline
174M	index_1_doc_1024_cardinality_candidate
241M	index_1_doc_8192_cardinality_baseline
233M	index_1_doc_8192_cardinality_candidate
314M	index_1_doc_1048576_cardinality_baseline
315M	index_1_doc_1048576_cardinality_candidate
392M	index_1_doc_8388608_cardinality_baseline
391M	index_1_doc_8388608_cardinality_candidate
{code}
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459344#comment-17459344 ] Dawid Weiss commented on LUCENE-10313: -- https://github.com/dweiss/lucene/tree/LUCENE-10313 It just rips the log4j api, nothing else. But I'll play with the handler a bit - it's fun. > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459322#comment-17459322 ] Dawid Weiss commented on LUCENE-10313: -- Oh. I already started - let me push what I have and you can take it from there or modify it in a way you like it. I don't think the screen buffer is a problem - if one wants it, they can use tee. But I agree the text area can have a larger circular buffer so that it reaches _a lot_ into the history if somebody wants to save the logs. This should be fairly easy to implement. > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10314) inconsistent index options when opening pre 9.0.0 index with 9.0.0
[ https://issues.apache.org/jira/browse/LUCENE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459281#comment-17459281 ]

Michael Sokolov commented on LUCENE-10314:
------------------------------------------

A small side note here; we have this comment in {{IndexOptions}}:

{code:java}
public enum IndexOptions {
  // NOTE: order is important here; FieldInfo uses this
  // order to merge two conflicting IndexOptions (always
  // "downgrades" by picking the lowest).
{code}

which is probably no longer relevant (since conflicting {{IndexOptions}} are forbidden).

> inconsistent index options when opening pre 9.0.0 index with 9.0.0
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10314
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10314
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 9.0
>            Reporter: Ian Lea
>            Priority: Major
>
> We have a long-standing index with some mandatory fields and some optional fields that has been through multiple Lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, when opening an IndexWriter we hit the exception
>
> Exception in thread "main" java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
> at org.apache.lucene.index.FieldInfo.verifySameIndexOptions(FieldInfo.java:245)
> at org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:421)
> at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:357)
> at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1263)
> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1116)
>
> Where language is one of our optional fields.
> Presumably this is at least somewhat related to "Index options can no > longer be changed dynamically" as mentioned at > https://lucene.apache.org/core/9_0_0/MIGRATE.html although it fails before > our code attempts to update the index, and we are not trying to change any > index options. > Adding some displays to IndexWriter and FieldInfos and logging rather than > throwing the exception I see > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=DOCS > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > where there is one line per segment. It logs the exception whenever > other=DOCS. 
Subset with segment info: > segment _x8(8.2.0):c31753/-1:[diagnostics={timestamp=1565623850605, > lucene.version=8.2.0, java.vm.version=11.0.3+7, java.version=11.0.3, > mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, > java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, > java.runtime.version=11.0.3+7, > os=Linux}]:[attributes=\{Lucene50StoredFieldsFormat.mode=BEST_SPEED}] > language curr=NONE, other=NONE > segment _y9(8.7.0):c43531/-1:[diagnostics={timestamp=1604597581562, > lucene.version=8.7.0, java.vm.version=11.0.3+7, java.version=11.0.3, > mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, > java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, > java.runtime.version=11.0.3+7, > os=Linux}]:[attributes=\{Lucene87StoredFieldsFormat.mode=BEST_SPEED}] > language curr=NONE, other=DOCS > NOT throwing java.lang.IllegalArgumentException: cannot change field > "language" from index options=NONE to inconsistent index options=DOCS > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
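The legacy "downgrade by picking the lowest" behavior that the IndexOptions comment describes amounts to an ordinal comparison on the enum. A hypothetical sketch with a stand-in enum (not Lucene's code; as the exception in this report shows, 9.x rejects the conflict instead of downgrading):

```java
// Hypothetical sketch of the legacy "merge conflicting IndexOptions by
// picking the lowest" behavior, using a stand-in enum. Not Lucene's code.
public class IndexOptionsMerge {
  enum Options { NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  /** Picks the "lower" option, relying on the enum's declaration order. */
  static Options mergeByLowest(Options a, Options b) {
    return a.ordinal() <= b.ordinal() ? a : b;
  }

  public static void main(String[] args) {
    // A NONE segment merged with a DOCS segment downgrades to NONE -
    // the old behavior a pre-9.0 index may have baked into its segments.
    System.out.println(mergeByLowest(Options.NONE, Options.DOCS)); // NONE
  }
}
```

This makes the report above easier to read: older segments could legitimately carry a mix of NONE and DOCS for the same field, which 9.0's stricter schema check now refuses.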
[GitHub] [lucene-solr] iverase closed pull request #1193: LUCENE-9154: Remove encodeCeil() to encode bounding box queries
iverase closed pull request #1193: URL: https://github.com/apache/lucene-solr/pull/1193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459269#comment-17459269 ]

Tomoko Uchida commented on LUCENE-10313:
----------------------------------------

Thank you - I will take this. It shouldn't be difficult.

Regarding implementation, I would set a log handler only for the text area that has a scroll bar, and possibly add a "save" button to persist the entire log to a local file. One reason I didn't use the console log appender in the app was that, without a large scrollback buffer, it could lose important parts of the log, which can include long stack traces.

> Remove log4j from dependencies and switch to java logging (in luke)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-10313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10313
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: luke
>            Reporter: Tomoko Uchida
>            Priority: Major
>
> This seems to be a simpler solution at this moment
> https://issues.apache.org/jira/browse/LUCENE-10303
> https://issues.apache.org/jira/browse/LUCENE-10308
[GitHub] [lucene] msokolov commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
msokolov commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-993640438

   > Why do we need to exclude small segments from regular merges?

   If we selected some small segments as part of a full-flush merge, then they wouldn't be available to also be included in a regular merge, right?

   I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers `GET_READER` and `COMMIT`, but *not* for trigger `FULL_FLUSH`, which seems quite confusing. I wonder if we could find a better name for `findFullFlushMerges`. Also, given that `findMerges` and `findFullFlushMerges` are both called from the same switch statement, for different triggers, *and the trigger is passed in as an argument* -- we could get rid of `findFullFlushMerges`, *always* call `findMerges`, and let the merge policy decide what to do based on the value of `trigger`. @s1monw WDYT?
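The refactor floated here - dropping findFullFlushMerges and letting a single findMerges branch on the trigger - would look roughly like this. These are hypothetical shapes, not the real MergePolicy API:

```java
// Hypothetical sketch of the suggested refactor: one findMerges entry point
// that branches on the trigger, instead of a separate findFullFlushMerges.
// The enum constants mirror names mentioned in the thread; return values are
// just labels standing in for real merge specifications.
public class TriggerDispatchSketch {
  enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, COMMIT, GET_READER, EXPLICIT }

  static String findMerges(MergeTrigger trigger) {
    switch (trigger) {
      case COMMIT:
      case GET_READER:
        // The cases IndexWriter routes to findFullFlushMerges today.
        return "small-segment merge for full flush";
      default:
        return "regular tiered merge";
    }
  }

  public static void main(String[] args) {
    System.out.println(findMerges(MergeTrigger.COMMIT));
    System.out.println(findMerges(MergeTrigger.SEGMENT_FLUSH));
  }
}
```

The design appeal is that the policy, not IndexWriter's switch statement, owns the decision of what each trigger means.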
[jira] [Created] (LUCENE-10314) inconsistent index options when opening pre 9.0.0 index with 9.0.0
Ian Lea created LUCENE-10314:
--------------------------------

             Summary: inconsistent index options when opening pre 9.0.0 index with 9.0.0
                 Key: LUCENE-10314
                 URL: https://issues.apache.org/jira/browse/LUCENE-10314
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 9.0
            Reporter: Ian Lea

We have a long-standing index with some mandatory fields and some optional fields that has been through multiple Lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, when opening an IndexWriter we hit the exception

Exception in thread "main" java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
	at org.apache.lucene.index.FieldInfo.verifySameIndexOptions(FieldInfo.java:245)
	at org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:421)
	at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:357)
	at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1263)
	at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1116)

Where language is one of our optional fields.

Presumably this is at least somewhat related to "Index options can no longer be changed dynamically" as mentioned at https://lucene.apache.org/core/9_0_0/MIGRATE.html, although it fails before our code attempts to update the index, and we are not trying to change any index options.

Adding some displays to IndexWriter and FieldInfos, and logging rather than throwing the exception, I see

language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=DOCS
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS

where there is one line per segment. It logs the exception whenever other=DOCS.

Subset with segment info:

segment _x8(8.2.0):c31753/-1:[diagnostics={timestamp=1565623850605, lucene.version=8.2.0, java.vm.version=11.0.3+7, java.version=11.0.3, mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, java.runtime.version=11.0.3+7, os=Linux}]:[attributes=\{Lucene50StoredFieldsFormat.mode=BEST_SPEED}]
language curr=NONE, other=NONE

segment _y9(8.7.0):c43531/-1:[diagnostics={timestamp=1604597581562, lucene.version=8.7.0, java.vm.version=11.0.3+7, java.version=11.0.3, mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, java.runtime.version=11.0.3+7, os=Linux}]:[attributes=\{Lucene87StoredFieldsFormat.mode=BEST_SPEED}]
language curr=NONE, other=DOCS
NOT throwing java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
[GitHub] [lucene] msokolov commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
msokolov commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r768748046 ## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java ## @@ -109,11 +110,15 @@ private void updateBytesUsed() { public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException { VectorValues vectorValues = new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension()); -if (sortMap != null) { - knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap)); -} else { - knnVectorsWriter.writeField(fieldInfo, vectorValues); -} +KnnVectorsReader vectorsReader = +new EmptyKnnVectorsReader() { + @Override + public VectorValues getVectorValues(String field) throws IOException { +return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues; Review comment: Oh, good catch! I think with the current usage it probably never matters since we only call this once and dispose of the reader, but it would be a trap waiting for some future usage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
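The "trap" under discussion is the general hazard of caching a consumable object and handing it out on every call: a second caller sees it already exhausted. A sketch with a plain Iterator (not the Lucene classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the review comment's trap: serving the same iterator-like object
// to every caller means the second caller finds it already consumed, while a
// supplier of fresh instances does not have this problem.
public class OneShotTrap {
  /** Drains the iterator and returns how many elements were seen. */
  static int consume(Iterator<Integer> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    List<Integer> vectors = List.of(1, 2, 3);

    Iterator<Integer> shared = vectors.iterator();
    System.out.println(consume(shared)); // 3
    System.out.println(consume(shared)); // 0: second caller gets nothing

    Supplier<Iterator<Integer>> fresh = vectors::iterator;
    System.out.println(consume(fresh.get())); // 3
    System.out.println(consume(fresh.get())); // 3
  }
}
```

With a single call site, as the comment notes, the cached instance happens to work; the bug only surfaces for a hypothetical future caller that asks twice.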
[GitHub] [lucene] jpountz commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-993590295

   Why do we need to exclude small segments from regular merges? E.g. if a merge-on-flush merges small segments and the resulting segment is small too, we don't want to exclude it from regular merges, do we?

   Because it creates merges manually, this merge policy cannot take advantage of some specifics of the wrapped merge policy, like the merge factor. I wonder if a better implementation would consist of calling `findMerges` and filtering out merges that are too large?
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459143#comment-17459143 ] Dawid Weiss commented on LUCENE-10313: -- I can take a look at this, if you don't have time, [~tomoko]. I need something more stimulating than log4j upgrades that seem to happen daily now... > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459137#comment-17459137 ] Dawid Weiss commented on LUCENE-10313: -- There is just a handful of files actually logging something... Maybe all of it could be even replaced with a luke-specific logging facade that would keep a window of logs for the text area and emit the rest via java.util.logging (without even altering any default configuration/ appenders/ etc.). > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
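The facade described here - keep a bounded window of recent records for the text area while staying on plain java.util.logging - can be sketched as a custom Handler. Class and method names are hypothetical, not Luke's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch of a bounded in-memory log window as a java.util.logging Handler.
// A UI text area could render windowSnapshot(); hypothetical names throughout.
public class LogWindowHandler extends Handler {
  private final Deque<String> window = new ArrayDeque<>();
  private final int capacity;

  public LogWindowHandler(int capacity) {
    this.capacity = capacity;
  }

  @Override
  public synchronized void publish(LogRecord record) {
    if (!isLoggable(record)) {
      return;
    }
    if (window.size() == capacity) {
      window.removeFirst(); // drop the oldest line once the window is full
    }
    window.addLast(record.getMessage());
  }

  @Override public void flush() {}
  @Override public void close() {}

  /** Current window contents, oldest first, one line per record. */
  public synchronized String windowSnapshot() {
    return String.join("\n", window);
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("luke.sketch");
    logger.setUseParentHandlers(false); // keep the sketch off the console
    LogWindowHandler handler = new LogWindowHandler(2);
    logger.addHandler(handler);
    logger.info("one");
    logger.info("two");
    logger.info("three"); // evicts "one"
    System.out.println(handler.windowSnapshot()); // prints "two" then "three"
  }
}
```

A larger capacity gives the "reaches a lot into the history" behavior discussed above, and a save-to-file action only needs to write the same snapshot out.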
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459132#comment-17459132 ] Dawid Weiss commented on LUCENE-10303: -- I don't know how difficult it would be, Tomoko. I don't think it'd be very hard. The problem with a single log4j sink in the home folder is that it's one file - if you run two Luke instances (for example, to compare indexes) one overwrites another. I'd rather have the persistent log dumped to the console. To me, it'd be more convenient than trying to look up where that log actually is. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10313: --- Summary: Remove log4j from dependencies and switch to java logging (in luke) (was: Remove log4j from dependencies and switch to java logging) > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10313) Remove log4j from dependencies and switch to java logging
Tomoko Uchida created LUCENE-10313: -- Summary: Remove log4j from dependencies and switch to java logging Key: LUCENE-10313 URL: https://issues.apache.org/jira/browse/LUCENE-10313 Project: Lucene - Core Issue Type: Task Components: luke Reporter: Tomoko Uchida This seems to be a simpler solution at this moment https://issues.apache.org/jira/browse/LUCENE-10303 https://issues.apache.org/jira/browse/LUCENE-10308 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459120#comment-17459120 ] Tomoko Uchida commented on LUCENE-10303: Luke uses log4j mainly because I have been accustomed to it, and have never used java logging. The logger has two appenders - one for a file handler and another for a text area component named "Logs" tab. If this configuration can be seamlessly ported to java logging (I could write a custom log handler) there would not be any problems with switching the logging framework. Or we probably should remove the fancy TextArea appender - though if possible, I'd like to keep this for the convenience of daily use. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
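For the file-appender half, java.util.logging's built-in FileHandler can be set up programmatically with no configuration file at all. A hedged sketch (paths and patterns are illustrative, not Luke's real configuration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.stream.Stream;

public class FileLoggingDemo {
  public static void main(String[] args) throws IOException {
    Path logDir = Files.createTempDirectory("luke-logs");
    // %u resolves to a unique number when the target file is already in use,
    // so two concurrently running instances get separate files instead of
    // overwriting each other.
    FileHandler fileHandler =
        new FileHandler(logDir.resolve("luke-%u.log").toString(), true);
    fileHandler.setFormatter(new SimpleFormatter());
    fileHandler.setLevel(Level.INFO);

    Logger logger = Logger.getLogger("luke.demo");
    logger.addHandler(fileHandler);
    logger.info("index opened");
    fileHandler.close();

    try (Stream<Path> files = Files.list(logDir)) {
      files.forEach(p -> System.out.println("wrote " + p.getFileName()));
    }
  }
}
```

The %u token also sidesteps the two-instances problem raised on LUCENE-10303: each running process claims its own numbered file instead of fighting over a single shared one.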
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459118#comment-17459118 ] Dawid Weiss commented on LUCENE-10308: -- Thanks, I added it (again). Bugzilla is confusing like hell. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459115#comment-17459115 ] Uwe Schindler commented on LUCENE-10308: I don't see an attachment on the ECJ bug. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459100#comment-17459100 ] Dawid Weiss commented on LUCENE-10308: -- > The issue here is a bug in ecj. I know. But it doesn't help me much, does it? :) > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459096#comment-17459096 ] Dawid Weiss commented on LUCENE-10308: -- Filed a bug to ECJ. https://bugs.eclipse.org/bugs/show_bug.cgi?id=577790 > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459093#comment-17459093 ] Uwe Schindler commented on LUCENE-10308: That's explicitly intended by log4j, because it makes the module-info file visible only to Java 9+ code. It was added after complaints from users whose classpath scanners and IDEs broke when finding the file. We discussed this at the OpenJDK committer meeting at FOSDEM two years ago with the Maven people, and the outcome was a recommendation to add module-info.class only in the multi-release part of the JAR file for best compatibility. Maven does this by default, I think. The issue here is a bug in ecj. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10308: - Attachment: repro.zip > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459087#comment-17459087 ] Dawid Weiss commented on LUCENE-10308: -- The root cause of this is ECJ failing to parse the log4j JAR as a module, because log4j is a multi-release JAR and keeps the module descriptor inside the META-INF/versions folder for Java 9+. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
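The layout ECJ trips over is easy to reproduce with the JDK's own tools. A self-contained sketch (all file and module names are made up for illustration) that builds a tiny multi-release JAR and lists its entries — the module descriptor lands under META-INF/versions/9/, not at the JAR root:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarFile;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

public class MultiReleaseJarDemo {

  /** Builds a tiny multi-release JAR and returns its entry names. */
  static List<String> buildAndList() throws IOException {
    Path dir = Files.createTempDirectory("mrjar");
    Path classes = Files.createDirectory(dir.resolve("classes"));
    Path classes9 = Files.createDirectory(dir.resolve("classes9"));

    // A plain class for the JAR root, plus a module descriptor for Java 9+.
    Files.writeString(dir.resolve("Hello.java"), "public class Hello {}");
    Files.writeString(dir.resolve("module-info.java"), "module demo {}");

    ToolProvider javac = ToolProvider.findFirst("javac").orElseThrow();
    javac.run(System.out, System.err,
        "-d", classes.toString(), dir.resolve("Hello.java").toString());
    javac.run(System.out, System.err,
        "-d", classes9.toString(), dir.resolve("module-info.java").toString());

    // Everything after --release 9 is stored under META-INF/versions/9/,
    // invisible to pre-Java-9 consumers of the JAR.
    ToolProvider jar = ToolProvider.findFirst("jar").orElseThrow();
    Path jarPath = dir.resolve("demo.jar");
    jar.run(System.out, System.err,
        "--create", "--file", jarPath.toString(),
        "-C", classes.toString(), ".",
        "--release", "9", "-C", classes9.toString(), "module-info.class");

    try (JarFile jf = new JarFile(jarPath.toFile())) {
      return jf.stream().map(e -> e.getName()).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    buildAndList().forEach(System.out::println);
  }
}
```

A tool that opens the JAR and looks only for a root-level module-info.class (as ECJ apparently did) would conclude the JAR is non-modular.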
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459075#comment-17459075 ] Dawid Weiss commented on LUCENE-10303: -- I remember looking at how Luke configures logs (to user's home folder) and thinking whether this is really necessary/ correct. Perhaps java logging would be entirely sufficient for Luke's needs, eventually? > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2632: SOLR-15848 BadApple failing tests in branch_8_11
janhoy merged pull request #2632: URL: https://github.com/apache/lucene-solr/pull/2632 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459049#comment-17459049 ] Tomoko Uchida commented on LUCENE-10303: {quote}We should update to 2.16.0 (came out today) in all active branches. Please also change the changelog entry, no new issue please! {quote} I'll update it. Let me wait for a while (to make sure there is no further minor update on it). {quote}No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? {quote} Only Luke uses log4j; no other module depends on it. I grepped the entire source. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #2632: SOLR-15848 BadApple failing tests in branch_8_11
janhoy opened a new pull request #2632: URL: https://github.com/apache/lucene-solr/pull/2632 https://issues.apache.org/jira/browse/SOLR-15848 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459021#comment-17459021 ] ASF subversion and git services commented on LUCENE-10085: -- Commit 352a6b68f0dfc6847e81bcd41a4c66b86494e2b4 in lucene's branch refs/heads/branch_9x from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=352a6b6 ] LUCENE-10085: Fix flaky testQueryMatchesCount (#538) Five times every 10 000 tests, we did not index any documents with i between 0 and 10 (inclusive), which caused the deleted tests to fail. With this commit, we make sure that we always index at least one document between 0 and 10. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
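The fix follows a general pattern for randomized tests, sketched below with plain java.util.Random rather than Lucene's actual test harness (names are illustrative): pin one draw into the range the assertions rely on, and leave the remaining draws unconstrained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PinnedDrawDemo {
  // Purely random draws in [0, 1000) miss [0, 10] entirely in roughly
  // (989/1000)^n of runs — rare but nonzero, hence a flaky test.
  static List<Integer> values(Random random, int count) {
    List<Integer> values = new ArrayList<>();
    values.add(random.nextInt(11)); // pinned draw: guaranteed to land in [0, 10]
    for (int i = 1; i < count; i++) {
      values.add(random.nextInt(1000)); // unconstrained draws
    }
    return values;
  }

  public static void main(String[] args) {
    Random random = new Random();
    for (int run = 0; run < 10_000; run++) {
      if (values(random, 20).stream().noneMatch(v -> v <= 10)) {
        throw new AssertionError("range [0, 10] never covered");
      }
    }
    System.out.println("all runs covered [0, 10]");
  }
}
```

Pinning one draw keeps the rest of the randomized coverage intact while removing the one-in-thousands failure mode the commit message describes.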
[GitHub] [lucene] iverase commented on pull request #538: LUCENE-10085: Fix flaky testQueryMatchesCount
iverase commented on pull request #538: URL: https://github.com/apache/lucene/pull/538#issuecomment-993362650 thanks @pquentin! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459019#comment-17459019 ] ASF subversion and git services commented on LUCENE-10085: -- Commit 9974f6ac34ac2f17bfcdf30d6df79476579ff1e0 in lucene's branch refs/heads/main from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9974f6a ] LUCENE-10085: Fix flaky testQueryMatchesCount (#538) Five times every 10 000 tests, we did not index any documents with i between 0 and 10 (inclusive), which caused the deleted tests to fail. With this commit, we make sure that we always index at least one document between 0 and 10. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase merged pull request #538: LUCENE-10085: Fix flaky testQueryMatchesCount
iverase merged pull request #538: URL: https://github.com/apache/lucene/pull/538 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459016#comment-17459016 ] Uwe Schindler commented on LUCENE-10303: Please also change the changelog entry, no new issue please! > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459015#comment-17459015 ] Uwe Schindler edited comment on LUCENE-10303 at 12/14/21, 9:43 AM: --- No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? was (Author: thetaphi): No patch release needed for Lucene 9.0, as there's no idde for Luke. I am not sure about Lucene replicator, is it used there, too? > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10303: --- Summary: Upgrade log4j to 2.16.0 (was: Upgrade log4j to 2.15.0) > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459015#comment-17459015 ] Uwe Schindler commented on LUCENE-10303: No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-10303: We should update to 2.16.0 (came out today) in all active branches. > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on pull request #2631: SOLR-15843 Upgrade log4j from 2.15 to 2.16
uschindler commented on pull request #2631: URL: https://github.com/apache/lucene-solr/pull/2631#issuecomment-993354358 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] raminmjj opened a new pull request #540: LUCENE-10312: Add PersianStemmer
raminmjj opened a new pull request #540: URL: https://github.com/apache/lucene/pull/540 - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [ ] I have run `./gradlew check`. - [x] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10312) Add PersianStemmer
Ramin Alirezaee created LUCENE-10312: Summary: Add PersianStemmer Key: LUCENE-10312 URL: https://issues.apache.org/jira/browse/LUCENE-10312 Project: Lucene - Core Issue Type: Wish Components: modules/analysis Affects Versions: 9.0 Reporter: Ramin Alirezaee -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] codaitya commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
codaitya commented on pull request #446: URL: https://github.com/apache/lucene/pull/446#issuecomment-993311631 Thanks for the explanation, Mikes! I have updated the PR to leave out the variable max segment size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
jpountz commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768421436 ## File path: lucene/core/src/test/org/apache/lucene/codecs/TestMinimalCodec.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomBoolean; + +import java.io.IOException; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.StoredField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +/** + * Tests to ensure that {@link Codec}s won't need to implement all formats in case where only a + * small subset of Lucene's functionality is used.
+ */ +public class TestMinimalCodec extends LuceneTestCase { + + public void testMinimalCodec() throws IOException { +runMinimalCodecTest(false, false); +runMinimalCodecTest(false, true); +runMinimalCodecTest(true, true); +runMinimalCodecTest(true, false); + } + + private void runMinimalCodecTest(boolean useCompoundFile, boolean useDeletes) throws IOException { +try (BaseDirectoryWrapper dir = newDirectory()) { + dir.setCheckIndexOnClose(false); // MinimalCodec is not registered with SPI Review comment: should we register these 4 codecs via SPI? ## File path: lucene/core/src/test/org/apache/lucene/codecs/TestMinimalCodec.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ +package org.apache.lucene.codecs; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomBoolean; + +import java.io.IOException; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.StoredField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +/** + * Tests to ensure that {@link Codec}s won't need to implement all formats in case where only a + * small subset of Lucene's functionality is used. + */ +public class TestMinimalCodec extends LuceneTestCase { + + public void testMinimalCodec() throws IOException { +runMinimalCodecTest(false, false); +runMinimalCodecTest(false, true); +runMinimalCodecTest(true, true); +runMinimalCodecTest(true, false); + } + + private void runMinimalCodecTest(boolean useCompoundFile, boolean useDeletes) throws IOException { +try (BaseDirectoryWrapper dir = newDirectory()) { + dir.setCheckIndexOnClose(false); // MinimalCodec is not registered with SPI + + IndexWriterConfig writerConfig = + newIndexWriterConfig(new MockAnalyzer(random())) + .setCodec(new MinimalCodec(useCompoundFile, useDeletes)) + .setUseCompoundFile(useCompoundFile); + if (!useCompoundFile) { +
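For context on the SPI question above: Lucene discovers codecs by name through Java's ServiceLoader mechanism, so "registering" a codec means listing its class in a provider-configuration file on the classpath. A rough sketch of what that could look like for a test codec follows; the resource path and class name are illustrative assumptions, not taken from the PR:

```
# META-INF/services/org.apache.lucene.codecs.Codec
org.apache.lucene.codecs.TestMinimalCodec$MinimalCodec
```

SPI-registered codecs are looked up by name and instantiated reflectively, which is one reason a test-only codec whose constructor takes parameters (like MinimalCodec's useCompoundFile/useDeletes flags) may be left unregistered.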
[GitHub] [lucene] ywelsch commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768409970

## File path: lucene/core/src/java/org/apache/lucene/index/SegmentCommitInfo.java

@@ -244,7 +244,9 @@ public long sizeInBytes() throws IOException {
     // updates) and then maybe even be able to remove LiveDocsFormat.files().
     // Must separately add any live docs files:
-    info.getCodec().liveDocsFormat().files(this, files);
+    if (hasDeletions()) {
+      info.getCodec().liveDocsFormat().files(this, files);

Review comment: This now allows for a minimal codec that does not have a LiveDocsFormat when no deletes are being used. It was a tiny change to make, but perhaps out of scope for this PR?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
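The guard in the diff above follows a general pattern: only consult a per-feature format when the segment actually uses that feature, so a minimal codec can leave the format unimplemented. A self-contained sketch of the idea; the class names below are simplified stand-ins, not Lucene's real API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-ins for illustration only; NOT Lucene's real classes.
interface LiveDocsFormat {
  void files(SegmentCommitInfoSketch info, Collection<String> out);
}

class MinimalCodecSketch {
  // A codec meant for delete-free indexes can refuse to provide a LiveDocsFormat.
  LiveDocsFormat liveDocsFormat() {
    throw new UnsupportedOperationException("this codec does not support deletes");
  }
}

class SegmentCommitInfoSketch {
  private final MinimalCodecSketch codec = new MinimalCodecSketch();
  private final int delCount;

  SegmentCommitInfoSketch(int delCount) {
    this.delCount = delCount;
  }

  boolean hasDeletions() {
    return delCount > 0;
  }

  Collection<String> files() {
    List<String> files = new ArrayList<>();
    files.add("_0.cfs"); // regular segment files
    // Only consult the live-docs format when there are deletions, so a
    // minimal codec never needs to implement it for delete-free segments.
    if (hasDeletions()) {
      codec.liveDocsFormat().files(this, files);
    }
    return files;
  }
}
```

With zero deletions the live-docs format is never touched, so the unsupported-operation codec works; with deletions present it would fail fast, matching ywelsch's observation that the change enables delete-free minimal codecs.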
[GitHub] [lucene] jpountz commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
jpountz commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r768405655

## File path: lucene/core/src/java/org/apache/lucene/index/EmptyKnnVectorsReader.java

@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Bits;
+
+/** Abstract base class implementing a {@link KnnVectorsReader} that has no vector values. */
+public abstract class EmptyKnnVectorsReader extends KnnVectorsReader {

Review comment: We do it for doc values because doc values support 5 different types (numeric, sorted numeric, sorted, sorted set, binary) and the empty producer helps implement only the doc values type that we care about. Since there is a single type of vectors, I don't think we need this empty producer; let's remove it and extend KnnVectorsReader directly?
## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java

@@ -109,11 +110,15 @@ private void updateBytesUsed() {
   public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException {
     VectorValues vectorValues =
         new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension());
-    if (sortMap != null) {
-      knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap));
-    } else {
-      knnVectorsWriter.writeField(fieldInfo, vectorValues);
-    }
+    KnnVectorsReader vectorsReader =
+        new EmptyKnnVectorsReader() {
+          @Override
+          public VectorValues getVectorValues(String field) throws IOException {
+            return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues;

Review comment: This is incorrect, as it would return the same instance every time it is called. We should instantiate a new BufferedVectorValues instance every time this method is called.
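The bug jpountz points out is a general one: a getter that returns a captured, stateful iteration object hands every caller the same partially-consumed instance, whereas a correct reader builds a fresh one per call. A minimal, self-contained illustration (the class and method names here are hypothetical, not Lucene's API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class FreshVsSharedIterator {
  // Buggy shape: capture one iterator and hand it to every caller,
  // so the second caller sees an already-exhausted stream.
  static Supplier<Iterator<float[]>> shared(List<float[]> vectors) {
    Iterator<float[]> it = vectors.iterator();
    return () -> it;
  }

  // Correct shape: build a fresh iterator on every call, analogous to
  // instantiating a new BufferedVectorValues per getVectorValues call.
  static Supplier<Iterator<float[]>> fresh(List<float[]> vectors) {
    return vectors::iterator;
  }

  // Consume the iterator and count its elements.
  static int count(Iterator<float[]> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }
}
```

Calling `count` twice on the shared supplier yields the full size once and then zero, while the fresh supplier yields the full size every time, which is why the review asks for a new instance per call.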
[GitHub] [lucene] ywelsch commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768407647

## File path: lucene/core/src/java/org/apache/lucene/index/FieldInfos.java

@@ -200,6 +204,11 @@ public boolean hasFreq() {
     return hasFreq;
   }

+  /** Returns true if any fields are indexed */
+  public boolean hasIndexed() {

Review comment: I wasn't sure about the naming here. Perhaps `hasPostings` is a better name?
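The accessor under discussion aggregates per-field metadata into a single flag: a segment needs postings files iff at least one field is indexed. A hedged sketch of that aggregation, using simplified stand-in types rather than Lucene's real FieldInfos (which computes such flags in its constructor):

```java
import java.util.List;

// Illustrative sketch only; NOT Lucene's real FieldInfos.
class FieldInfosSketch {
  enum IndexOptions { NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  record FieldInfo(String name, IndexOptions options) {}

  private final boolean hasPostings;

  FieldInfosSketch(List<FieldInfo> fields) {
    // A segment has postings iff at least one field is indexed
    // (i.e. has index options other than NONE).
    this.hasPostings = fields.stream().anyMatch(f -> f.options() != IndexOptions.NONE);
  }

  /** Returns true if any field indexes postings. */
  boolean hasPostings() {
    return hasPostings;
  }
}
```

The sketch also shows why `hasPostings` reads better than `hasIndexed`: the flag's consumer is the postings read/write path, not a generic notion of "indexed".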
[GitHub] [lucene] ywelsch opened a new pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch opened a new pull request #539: URL: https://github.com/apache/lucene/pull/539

# Description

Unlike points, norms, term vectors, or doc values, which only get written to the directory when at least one field uses the data structure, postings always get written to the directory. While this isn't hurting much, it can be surprising at times: for example, if you index with SimpleText you will have a file for postings even though none of the fields indexes postings. This inconsistency is hidden with the default codec because it uses PerFieldPostingsFormat, which only delegates to the per-field formats when a field is actually indexed, so you don't get a file if none of the fields is indexed.

# Solution

Fix the situation by making reading/writing of postings conditional on whether any data is indexed. This can be determined from the metadata in FieldInfos.

# Tests

Added a new test, `TestMinimalCodec`. Existing tests pass via `./gradlew clean; ./gradlew check`.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
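The shape of the fix described in the PR can be sketched in a few lines: gate the creation of postings files on the FieldInfos-derived flag, the same way points, norms, term vectors, and doc values are already gated on their own flags. The method and file names below are illustrative assumptions, not the actual Lucene internals:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (names hypothetical): before the fix, the postings
// file was always created; after it, creation is conditional on FieldInfos
// reporting at least one indexed field, matching how points already behave.
class SegmentFlushSketch {
  static List<String> flush(boolean anyFieldIndexed, boolean anyFieldHasPoints) {
    List<String> written = new ArrayList<>();
    if (anyFieldIndexed) {       // the new conditional for postings
      written.add("_0_postings");
    }
    if (anyFieldHasPoints) {     // points were already conditional
      written.add("_0_points");
    }
    return written;
  }
}
```

With no indexed fields, no postings file is produced, which is exactly the SimpleText surprise the PR description calls out.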
[jira] [Comment Edited] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458965#comment-17458965 ]

Tomoko Uchida edited comment on LUCENE-10303 at 12/14/21, 8:06 AM:
---
It seems the concerns and worries about this CVE keep growing. I think we don't need a security update/patch release (9.0.1) for this, since there is no substantial concern in terms of the Luke app (and just replacing the log4j jars ("api" and "core") with the latest ones is a perfectly fine solution for anyone who needs it). Please let me know if it's required, though.

was (Author: tomoko uchida):
It seems the concerns and worries about this CVE keep growing. I think we don't need a security update/patch for this, since there is no substantial concern in terms of the Luke app (and just replacing the log4j jars ("api" and "core") with the latest ones is a perfectly fine solution for anyone who needs it). Please let me know if it's required, though.

> Upgrade log4j to 2.15.0
> ---
>
> Key: LUCENE-10303
> URL: https://issues.apache.org/jira/browse/LUCENE-10303
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Minor
> Fix For: 9.1, 10.0 (main)
> Attachments: LUCENE-10303.patch
>
> CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker-controlled LDAP and other JNDI-related endpoints.
> Versions Affected: all versions from 2.0-beta9 to 2.14.1
> [https://logging.apache.org/log4j/2.x/security.html]
>
> Only the luke module uses log4j 2.13.2 (I grepped the entire codebase); but since versions.props is shared by all subprojects, it may be better to upgrade to 2.15.0, I think.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org