[GitHub] [lucene] gf2121 closed pull request #530: LUCENE-10297: Speed up medium cardinality fields with readLongs and SIMD
gf2121 closed pull request #530: URL: https://github.com/apache/lucene/pull/530

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459689#comment-17459689 ] Feng Guo commented on LUCENE-10315: --- The optimization can only be triggered when {{count == BKDConfig#DEFAULT_MAX_POINTS_IN_LEAF_NODE}}. This is fragile because users can customize {{maxPointsInLeaf}} in the Codec, rendering the optimization meaningless. Here are some ways I can think of to address this: 1. Directly drop support for customizing {{maxPointsInLeaf}}, like what we do in postings. 2. Generate a series of ForUtils, like {{ForUtil128}}, {{ForUtil256}}, {{ForUtil512}}, {{ForUtil1024}} ... and add notes hinting users to choose values from them. > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This issue tries to use a 512 ints {{ForUtil}} for BKD ids codec. I > benchmarked this optimization by mocking some random LongPoint and querying > them with PointInSetQuery.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The indices size is reduced for low cardinality fields and flat for high cardinality fields.
> {code:java}
> 113M  index_1_doc_32_cardinality_baseline
> 114M  index_1_doc_32_cardinality_candidate
> 140M  index_1_doc_128_cardinality_baseline
> 133M  index_1_doc_128_cardinality_candidate
> 193M  index_1_doc_1024_cardinality_baseline
> 174M  index_1_doc_1024_cardinality_candidate
> 241M  index_1_doc_8192_cardinality_baseline
> 233M  index_1_doc_8192_cardinality_candidate
> 314M  index_1_doc_1048576_cardinality_baseline
> 315M  index_1_doc_1048576_cardinality_candidate
> 392M  index_1_doc_8388608_cardinality_baseline
> 391M  index_1_doc_8388608_cardinality_candidate
> {code}

-- 
This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
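The fixed-block-size decoding that a 512-int {{ForUtil}} relies on can be illustrated with a minimal bit-packing sketch. This is illustrative only: the class and method names ({{ForUtilSketch}}, {{encode}}, {{decode}}) are hypothetical and not Lucene's actual API, which is generated code specialized per bits-per-value.

```java
import java.util.Arrays;

// Minimal bit-packing sketch of the ForUtil idea (hypothetical names, not
// Lucene's implementation): a block of ints is packed at a fixed bits-per-value
// (bpv) into a long[] and unpacked with plain shifts and masks.
public class ForUtilSketch {

  /** Pack values at bpv bits per value into a long[]. */
  static long[] encode(int[] values, int bpv) {
    long[] packed = new long[(values.length * bpv + 63) / 64];
    long mask = (1L << bpv) - 1;
    for (int i = 0; i < values.length; i++) {
      int bit = i * bpv;      // absolute bit position of value i
      int word = bit >>> 6;   // which long it starts in
      int off = bit & 63;     // bit offset within that long
      long v = values[i] & mask;
      packed[word] |= v << off;
      if (off + bpv > 64) {   // value spills into the next long
        packed[word + 1] |= v >>> (64 - off);
      }
    }
    return packed;
  }

  /** Unpack count values of bpv bits each from packed. */
  static int[] decode(long[] packed, int count, int bpv) {
    long mask = (1L << bpv) - 1;
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      int bit = i * bpv;
      int word = bit >>> 6;
      int off = bit & 63;
      long v = packed[word] >>> off;
      if (off + bpv > 64) {   // pick up the bits from the next long
        v |= packed[word + 1] << (64 - off);
      }
      values[i] = (int) (v & mask);
    }
    return values;
  }

  public static void main(String[] args) {
    // Round-trip a full 512-int block at 5 bits per value.
    int[] block = new int[512];
    for (int i = 0; i < block.length; i++) {
      block[i] = i & 31; // values must fit in bpv bits
    }
    long[] packed = encode(block, 5);
    if (!Arrays.equals(decode(packed, block.length, 5), block)) {
      throw new AssertionError("round-trip failed");
    }
  }
}
```

When the block size is a compile-time constant (such as 512), loops like these can be unrolled and auto-vectorized, which is why the speedup above depends on the leaf block actually containing that many points.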
[jira] [Updated] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10315: -- Description: This issue tries to use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization by mocking some random LongPoint and querying them with PointInSetQuery.

*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
|1|32|1|51.44|148.26|188.22%|
|1|32|2|26.8|101.88|280.15%|
|1|32|4|14.04|53.52|281.20%|
|1|32|8|7.04|28.54|305.40%|
|1|32|16|3.54|14.61|312.71%|
|1|128|1|110.56|350.26|216.81%|
|1|128|8|16.6|89.81|441.02%|
|1|128|16|8.45|48.07|468.88%|
|1|128|32|4.2|25.35|503.57%|
|1|128|64|2.13|13.02|511.27%|
|1|1024|1|536.19|843.88|57.38%|
|1|1024|8|109.71|251.89|129.60%|
|1|1024|32|33.24|104.11|213.21%|
|1|1024|128|8.87|30.47|243.52%|
|1|1024|512|2.24|8.3|270.54%|
|1|8192|1|.33|5000|50.00%|
|1|8192|32|139.47|214.59|53.86%|
|1|8192|128|54.59|109.23|100.09%|
|1|8192|512|15.61|36.15|131.58%|
|1|8192|2048|4.11|11.14|171.05%|
|1|1048576|1|2597.4|3030.3|16.67%|
|1|1048576|32|314.96|371.75|18.03%|
|1|1048576|128|99.7|116.28|16.63%|
|1|1048576|512|30.5|37.15|21.80%|
|1|1048576|2048|10.38|12.3|18.50%|
|1|8388608|1|2564.1|3174.6|23.81%|
|1|8388608|32|196.27|238.95|21.75%|
|1|8388608|128|55.36|68.03|22.89%|
|1|8388608|512|15.58|19.24|23.49%|
|1|8388608|2048|4.56|5.71|25.22%|
The indices size is reduced for low cardinality fields and flat for high cardinality fields.
{code:java}
113M  index_1_doc_32_cardinality_baseline
114M  index_1_doc_32_cardinality_candidate
140M  index_1_doc_128_cardinality_baseline
133M  index_1_doc_128_cardinality_candidate
193M  index_1_doc_1024_cardinality_baseline
174M  index_1_doc_1024_cardinality_candidate
241M  index_1_doc_8192_cardinality_baseline
233M  index_1_doc_8192_cardinality_candidate
314M  index_1_doc_1048576_cardinality_baseline
315M  index_1_doc_1048576_cardinality_candidate
392M  index_1_doc_8388608_cardinality_baseline
391M  index_1_doc_8388608_cardinality_candidate
{code}

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
>
[GitHub] [lucene] zacharymorn commented on pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on pull request #534: URL: https://github.com/apache/lucene/pull/534#issuecomment-994295420

Thanks @msokolov @jtibshirani @jpountz for the review and suggestions! I've pushed an update to address the feedback.
[GitHub] [lucene] zacharymorn commented on pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on pull request #534: URL: https://github.com/apache/lucene/pull/534#issuecomment-994295107

> Maybe also add a CHANGELOG entry? I guess it's a somewhat internal API, being in codecs, but it's not marked @experimental, so if someone had implemented their own KnnVectorsWriter codec, they'd need to change it ...

Just added an entry under `9.1.0`. Shall I also add `@lucene.experimental` to it in case it needs to change again? But I don't know this API well enough to foresee that need...
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r769243295

## File path: lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java
## @@ -40,7 +41,8 @@ protected KnnVectorsWriter() {}
 /** Write all values contained in the provided reader */
- public abstract void writeField(FieldInfo fieldInfo, VectorValues values) throws IOException;
+ public abstract void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorReader)

Review comment: Good suggestion! I've updated them to use `knnVectorsReader`.

## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java
## @@ -109,11 +110,15 @@ private void updateBytesUsed() {
 public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException {
 VectorValues vectorValues = new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension());
-if (sortMap != null) {
- knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap));
-} else {
- knnVectorsWriter.writeField(fieldInfo, vectorValues);
-}
+KnnVectorsReader vectorsReader =
+new EmptyKnnVectorsReader() {
+ @Override
+ public VectorValues getVectorValues(String field) throws IOException {
+return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues;

Review comment: Oops, sorry, good catch! I've fixed it.
[GitHub] [lucene] zacharymorn commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
zacharymorn commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r769243208

## File path: lucene/core/src/java/org/apache/lucene/index/EmptyKnnVectorsReader.java
## @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Bits;
+
+/** Abstract base class implementing a {@link KnnVectorsReader} that has no vector values. */
+public abstract class EmptyKnnVectorsReader extends KnnVectorsReader {

Review comment: Ah, makes sense. Thanks for the context! I've removed it and extended directly.
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r769229827

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
## @@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException { return new FieldEntry(input, similarityFunction); }
+ private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+for (FieldEntry entry : fields.values()) {
+ IndexInput input =

Review comment: > How would you like to proceed -- work on that PR first (since it seems useful on its own), or move forward with this one and follow-up with a fix soon after?

I was under the impression that we are not happy with the current state of this PR and would not want to merge it without some changes. No?

> To clarify, I was not thinking that GraphLevels would replace FieldEntry. It would be a second data structure.

Can you please elaborate on how you see it organized? How is `GraphLevels` connected to a `FieldEntry`? Do you suggest putting GraphLevels into a separate file and loading them on first use?
[jira] [Commented] (LUCENE-10175) Remove VectorValues#binaryValue?
[ https://issues.apache.org/jira/browse/LUCENE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459636#comment-17459636 ] Julie Tibshirani commented on LUCENE-10175: --- +1 it'd be good to avoid exposing the binary format. I'm not aware of a way the method is helpful to external users, or necessary for features we want to build. > Remove VectorValues#binaryValue? > > > Key: LUCENE-10175 > URL: https://issues.apache.org/jira/browse/LUCENE-10175 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Major > > It's unclear to me why we have VectorValues#binaryValue. This exposes > implementation details that I'd rather like to avoid exposing in our > higher-level APIs such as the Directory byte order. And I worry that it might > be in the way of upcoming features like supporting bfloat16.
[jira] [Commented] (LUCENE-10191) Optimize vector functions by precomputing magnitudes
[ https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459633#comment-17459633 ] Julie Tibshirani commented on LUCENE-10191: --- I've been pondering this more. It definitely seems possible to move VectorSimilarityFunction to Lucene90HnswVectorsFormat. Maybe we could start with something really simple, like a new method {{VectorValues#computeDistance(float[] query)}} that uses the configured distance function. I guess {{computeDistance}} could give a simple interface but do something fancy if it wanted to, since it knows how exactly its vectors are represented. My main hesitation is that VectorSimilarityFunction is a cross-cutting concept that makes sense across format implementations. In fact, I would expect all KnnVectorsFormat to support dot product, euclidean, and cosine. Could there be drift across different formats (maybe a vector function is missing, or is named something different) in a way that hurts users? > Optimize vector functions by precomputing magnitudes > > > Key: LUCENE-10191 > URL: https://issues.apache.org/jira/browse/LUCENE-10191 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > Both euclidean distance (L2 norm) and cosine similarity can be expressed in > terms of dot product and vector magnitudes: > * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2) > * cosine(a, b) = a . b / ||a|| ||b|| > We could compute and store each vector's magnitude upfront while indexing, > and compute the query vector's magnitude once per query. Then we'd calculate > the distance using our (very optimized) dot product method, plus the > precomputed values. > This is an exploratory issue: I haven't tested this out yet, so I'm not sure > how much it would help. 
I would at least expect it to help with cosine > similarity – several months ago we tried out similar ideas in Elasticsearch > and were able to get a nice boost in cosine performance.
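The identities quoted above can be checked with a small sketch. The class and helper names ({{PrecomputedMagnitudes}}, {{dot}}, {{magnitude}}) are hypothetical, not a proposed Lucene API; the point is that given a dot product plus precomputed magnitudes, both cosine and euclidean distance fall out without re-reading the vectors.

```java
// Sketch of the precomputed-magnitudes idea (illustrative names only):
// l2_norm(a, b) = sqrt(||a||^2 - 2(a . b) + ||b||^2)
// cosine(a, b)  = a . b / (||a|| ||b||)
public class PrecomputedMagnitudes {

  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static double magnitude(float[] a) {
    return Math.sqrt(dot(a, a));
  }

  /** cosine(a, b) from a precomputed dot product and magnitudes. */
  static double cosine(double dot, double magA, double magB) {
    return dot / (magA * magB);
  }

  /** euclidean distance from a precomputed dot product and magnitudes. */
  static double l2(double dot, double magA, double magB) {
    return Math.sqrt(magA * magA - 2 * dot + magB * magB);
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    double d = dot(a, b);       // the only per-pair loop needed at query time
    double magA = magnitude(a); // would be precomputed at index time
    double magB = magnitude(b); // computed once per query
    // Check against the direct definition of euclidean distance.
    double direct = 0;
    for (int i = 0; i < a.length; i++) {
      direct += (a[i] - b[i]) * (a[i] - b[i]);
    }
    direct = Math.sqrt(direct);
    if (Math.abs(l2(d, magA, magB) - direct) > 1e-6) {
      throw new AssertionError("identity does not hold");
    }
  }
}
```

The per-pair work then reduces to the dot product loop, which is the part that is already heavily optimized.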
[GitHub] [lucene] jtibshirani commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical
jtibshirani commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r769177664

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
## @@ -205,6 +215,43 @@ private FieldEntry readField(DataInput input) throws IOException { return new FieldEntry(input, similarityFunction); }
+ private void fillGraphNodesAndOffsetsByLevel() throws IOException {
+for (FieldEntry entry : fields.values()) {
+ IndexInput input =

Review comment: I like this idea. How would you like to proceed -- work on that PR first (since it seems useful on its own), or move forward with this one and follow-up with a fix soon after? To clarify, I was not thinking that `GraphLevels` would replace `FieldEntry`. It would be a second data structure.
[GitHub] [lucene-solr] madrob commented on pull request #2624: SOLR-15833
madrob commented on pull request #2624: URL: https://github.com/apache/lucene-solr/pull/2624#issuecomment-994157346

I force pushed to my fork
[GitHub] [lucene] zhaih opened a new pull request #542: LUCENE-10316: fix TestLRUQueryCache.testCachingAccountableQuery failure
zhaih opened a new pull request #542: URL: https://github.com/apache/lucene/pull/542

# Description & Solution
Please see https://issues.apache.org/jira/browse/LUCENE-10316

# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] ~~I have added tests for my changes~~ (Changing a test)
[jira] [Updated] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
[ https://issues.apache.org/jira/browse/LUCENE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haoyu Zhai updated LUCENE-10316: Description: I saw this build failure: [https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/] with the following stack trace
{code:java}
java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
	at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:577)
	at org.junit.Assert.assertEquals(Assert.java:701)
	at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
	...
NOTE: reproduce with: gradlew test --tests TestLRUQueryCache.testCachingAccountableQuery -Dtests.seed=F7826B1EB37D545A -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ckb-IR -Dtests.timezone=Africa/Dakar -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{code}
It does not reproduce on my laptop on the current main branch, but since the test is comparing an estimation with a 10% slack, it can certainly fail sometimes.
> fix TestLRUQueryCache.testCachingAccountableQuery failure > - > > Key: LUCENE-10316 > URL: https://issues.apache.org/jira/browse/LUCENE-10316 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: Haoyu Zhai >Priority: Minor > > I saw this build failure: > [https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/] > with following stack trace > {code:java} > java.lang.AssertionError: expected:<130.0> but was:<1544976.0> > at > __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:577) > at org.junit.Assert.assertEquals(Assert.java:701) > at > org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at
[jira] [Commented] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
[ https://issues.apache.org/jira/browse/LUCENE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459420#comment-17459420 ] Haoyu Zhai commented on LUCENE-10316: - So basically the test is about making sure the query cache has the right size estimation when the query has implemented the {{Accountable}} interface. When I originally wrote it I estimated the query cache size using {{(query_size + linked_hash_map_entry_size) * query_num}} with 10% slack to allow for estimation error. But apparently that is not enough sometimes (probably a larger number of cache entries wastes more?). Given that the aim of the test is to make sure the query cache reflects known big queries correctly when they are cached, I think we could change that to {{assert(query_cache_size > sum_of_all_queries_cached)}}. Then we won't depend on a slack factor to assert correctness.

> fix TestLRUQueryCache.testCachingAccountableQuery failure
> -
>
> Key: LUCENE-10316
> URL: https://issues.apache.org/jira/browse/LUCENE-10316
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
>Reporter: Haoyu Zhai
>Priority: Minor
>
> I saw this build failure:
> https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/
> with following stack trace
> {code:java}
> java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
> at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:577)
> at org.junit.Assert.assertEquals(Assert.java:701)
> at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> ...
> {code}
> It does not reproduce on my laptop on current main branch, but since the test
> is comparing an estimation with a 10% slack, it can fail for sure sometime.
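The proposed lower-bound assertion can be sketched as follows. This is a hypothetical model, not the actual test code: the class, record, and {{ENTRY_OVERHEAD_BYTES}} constant are illustrative stand-ins for Lucene's {{LRUQueryCache}} and {{Accountable}} machinery. The idea is that the cache's reported size must always be at least the sum of the cached queries' own reported sizes, since per-entry bookkeeping overhead can only add to that floor, so no slack factor is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed fix: assert a lower bound on the cache's
// reported RAM usage instead of a within-10% match against an estimate.
public class CacheSizeLowerBound {

  // Illustrative per-entry bookkeeping cost; the real value varies by JVM.
  static final long ENTRY_OVERHEAD_BYTES = 48;

  // Stand-in for a cached query that implements Accountable.
  record CachedQuery(long ramBytesUsed) {}

  // Stand-in for LRUQueryCache#ramBytesUsed: queries plus per-entry overhead.
  static long cacheRamBytesUsed(List<CachedQuery> cached) {
    long total = 0;
    for (CachedQuery q : cached) {
      total += q.ramBytesUsed() + ENTRY_OVERHEAD_BYTES;
    }
    return total;
  }

  public static void main(String[] args) {
    List<CachedQuery> cached = new ArrayList<>();
    cached.add(new CachedQuery(1 << 20)); // a "known big" query
    cached.add(new CachedQuery(130));
    long sumOfQueries = 0;
    for (CachedQuery q : cached) {
      sumOfQueries += q.ramBytesUsed();
    }
    // The lower-bound assertion holds no matter how large the overhead is,
    // unlike an equality check with a fixed slack percentage.
    if (cacheRamBytesUsed(cached) <= sumOfQueries) {
      throw new AssertionError("cache must account for at least the queries themselves");
    }
  }
}
```

The trade-off is that a lower bound is weaker than the original equality check, but it still catches the regression the test was written for: a cache that ignores a big query's {{Accountable}} size.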
[jira] [Created] (LUCENE-10316) fix TestLRUQueryCache.testCachingAccountableQuery failure
Haoyu Zhai created LUCENE-10316: --- Summary: fix TestLRUQueryCache.testCachingAccountableQuery failure Key: LUCENE-10316 URL: https://issues.apache.org/jira/browse/LUCENE-10316 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Haoyu Zhai

I saw this build failure: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/348/ with the following stack trace
{code:java}
java.lang.AssertionError: expected:<130.0> but was:<1544976.0>
	at __randomizedtesting.SeedInfo.seed([F7826B1EB37D545A:995B6ED46A95D1A0]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:577)
	at org.junit.Assert.assertEquals(Assert.java:701)
	at org.apache.lucene.search.TestLRUQueryCache.testCachingAccountableQuery(TestLRUQueryCache.java:570)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
	...
{code}
It does not reproduce on my laptop on the current main branch, but since the test is comparing an estimation with a 10% slack, it can certainly fail sometimes.
[GitHub] [lucene] gf2121 opened a new pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil
gf2121 opened a new pull request #541:
URL: https://github.com/apache/lucene/pull/541

   This approach uses a 512-int ForUtil for the BKD ids codec. I benchmarked the optimization by mocking some random LongPoint fields and querying them with PointInSetQuery.

   **Benchmark Result**

   doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage
   -- | -- | -- | -- | -- | --
   1 | 32 | 1 | 51.44 | 148.26 | 188.22%
   1 | 32 | 2 | 26.8 | 101.88 | 280.15%
   1 | 32 | 4 | 14.04 | 53.52 | 281.20%
   1 | 32 | 8 | 7.04 | 28.54 | 305.40%
   1 | 32 | 16 | 3.54 | 14.61 | 312.71%
   1 | 128 | 1 | 110.56 | 350.26 | 216.81%
   1 | 128 | 8 | 16.6 | 89.81 | 441.02%
   1 | 128 | 16 | 8.45 | 48.07 | 468.88%
   1 | 128 | 32 | 4.2 | 25.35 | 503.57%
   1 | 128 | 64 | 2.13 | 13.02 | 511.27%
   1 | 1024 | 1 | 536.19 | 843.88 | 57.38%
   1 | 1024 | 8 | 109.71 | 251.89 | 129.60%
   1 | 1024 | 32 | 33.24 | 104.11 | 213.21%
   1 | 1024 | 128 | 8.87 | 30.47 | 243.52%
   1 | 1024 | 512 | 2.24 | 8.3 | 270.54%
   1 | 8192 | 1 | .33 | 5000 | 50.00%
   1 | 8192 | 32 | 139.47 | 214.59 | 53.86%
   1 | 8192 | 128 | 54.59 | 109.23 | 100.09%
   1 | 8192 | 512 | 15.61 | 36.15 | 131.58%
   1 | 8192 | 2048 | 4.11 | 11.14 | 171.05%
   1 | 1048576 | 1 | 2597.4 | 3030.3 | 16.67%
   1 | 1048576 | 32 | 314.96 | 371.75 | 18.03%
   1 | 1048576 | 128 | 99.7 | 116.28 | 16.63%
   1 | 1048576 | 512 | 30.5 | 37.15 | 21.80%
   1 | 1048576 | 2048 | 10.38 | 12.3 | 18.50%
   1 | 8388608 | 1 | 2564.1 | 3174.6 | 23.81%
   1 | 8388608 | 32 | 196.27 | 238.95 | 21.75%
   1 | 8388608 | 128 | 55.36 | 68.03 | 22.89%
   1 | 8388608 | 512 | 15.58 | 19.24 | 23.49%
   1 | 8388608 | 2048 | 4.56 | 5.71 | 25.22%

   The index size is reduced for low cardinality fields and roughly flat for high cardinality fields.

   ```
   113M	index_1_doc_32_cardinality_baseline
   114M	index_1_doc_32_cardinality_candidate
   140M	index_1_doc_128_cardinality_baseline
   133M	index_1_doc_128_cardinality_candidate
   193M	index_1_doc_1024_cardinality_baseline
   174M	index_1_doc_1024_cardinality_candidate
   241M	index_1_doc_8192_cardinality_baseline
   233M	index_1_doc_8192_cardinality_candidate
   314M	index_1_doc_1048576_cardinality_baseline
   315M	index_1_doc_1048576_cardinality_candidate
   392M	index_1_doc_8388608_cardinality_baseline
   391M	index_1_doc_8388608_cardinality_candidate
   ```
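The compression idea behind this PR and the related LUCENE-10297 discussion - delta-code sorted doc ids, then store each block at a fixed bit width - can be sketched as follows. This is an illustrative toy, not Lucene's actual ForUtil, which packs values into longs with SIMD-friendly shifts:

```java
import java.util.Arrays;

// Toy sketch of delta-coding sorted ids plus a fixed-bit-width size estimate.
// NOT Lucene's ForUtil; just the arithmetic the approach relies on.
public class ForUtilSketch {
  /** Bits needed to represent a non-negative value v. */
  public static int bitsRequired(long v) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
  }

  /** deltas[0] = ids[0], deltas[i] = ids[i] - ids[i-1], for sorted ids. */
  public static long[] deltaEncode(long[] sortedIds) {
    long[] deltas = new long[sortedIds.length];
    deltas[0] = sortedIds[0];
    for (int i = 1; i < sortedIds.length; i++) {
      deltas[i] = sortedIds[i] - sortedIds[i - 1];
    }
    return deltas;
  }

  /** Inverse of deltaEncode: prefix-sum the deltas back into absolute ids. */
  public static long[] deltaDecode(long[] deltas) {
    long[] ids = new long[deltas.length];
    long acc = 0;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      ids[i] = acc;
    }
    return ids;
  }

  public static void main(String[] args) {
    long[] ids = {3, 7, 8, 100, 130};
    long[] deltas = deltaEncode(ids);
    long maxDelta = Arrays.stream(deltas).max().getAsLong();
    // Sorting + deltas shrink the per-value bit width: the max delta (92)
    // needs 7 bits, versus 8 bits for the raw max id (130).
    System.out.println(bitsRequired(maxDelta));                 // 7
    System.out.println(Arrays.equals(deltaDecode(deltas), ids)); // true
  }
}
```

The per-value bit width explains why low cardinality fields shrink (many ids per term, so small deltas) while high cardinality fields stay roughly flat.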
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459377#comment-17459377 ]

Dawid Weiss commented on LUCENE-10313:
--------------------------------------

Yeah, it's very similar to log4j appenders and I don't think there should be any problems with it. Please take whatever you need from that branch I created - it stops literally one step short of showing the content of the log buffer. I think the text area should only update itself when it's actually displayed, but it's been years since I wrote any Swing code and I've forgotten which listener to use to determine whether a tabbed pane is visible.

> Remove log4j from dependencies and switch to java logging (in luke)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-10313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10313
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: luke
>            Reporter: Tomoko Uchida
>            Priority: Major
>
> This seems to be a simpler solution at this moment
> https://issues.apache.org/jira/browse/LUCENE-10303
> https://issues.apache.org/jira/browse/LUCENE-10308
[jira] [Commented] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD
[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459370#comment-17459370 ]

Feng Guo commented on LUCENE-10297:
-----------------------------------

This approach could increase the index size for low cardinality fields. I raised https://issues.apache.org/jira/browse/LUCENE-10315, which looks better, so I am closing this now.

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We introduced a bitset optimization for extremely low cardinality fields in [LUCENE-10233|https://issues.apache.org/jira/browse/LUCENE-10233], but medium cardinality fields (like 32/128) can rarely trigger it. This issue tries to find a way to speed them up.
> In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to use readLELongs to speed up BKD id blocks, but did not get an obvious gain from that approach. The reason could be that we were trying to optimize the unsorted situation (typical for high cardinality fields), and the bottleneck of queries on high cardinality fields is {{visitDocValues}}, not {{readDocIds}}.
> However, medium cardinality fields are good candidates for this optimization because they need to read lots of ids for each term. The basic idea is that we can compute the deltas of the sorted ids and encode/decode them like we do in {{StoredFieldsInts}}. I benchmarked the optimization by mocking some random LongPoint fields and querying them with {{PointInSetQuery}}. As expected, the medium cardinality fields got sped up and high cardinality fields show even results.
>
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline(ms)|candidate(ms)|diff percentage|baseline(QPS)|candidate(QPS)|diff percentage|
> |1|32|1|19|16|-15.79%|52.63|62.50|18.75%|
> |1|32|2|34|14|-58.82%|29.41|71.43|142.86%|
> |1|32|4|76|22|-71.05%|13.16|45.45|245.45%|
> |1|32|8|139|42|-69.78%|7.19|23.81|230.95%|
> |1|32|16|279|82|-70.61%|3.58|12.20|240.24%|
> |1|128|1|17|11|-35.29%|58.82|90.91|54.55%|
> |1|128|8|75|23|-69.33%|13.33|43.48|226.09%|
> |1|128|16|126|25|-80.16%|7.94|40.00|404.00%|
> |1|128|32|245|50|-79.59%|4.08|20.00|390.00%|
> |1|128|64|528|97|-81.63%|1.89|10.31|444.33%|
> |1|1024|1|3|2|-33.33%|333.33|500.00|50.00%|
> |1|1024|8|13|8|-38.46%|76.92|125.00|62.50%|
> |1|1024|32|31|19|-38.71%|32.26|52.63|63.16%|
> |1|1024|128|120|67|-44.17%|8.33|14.93|79.10%|
> |1|1024|512|480|133|-72.29%|2.08|7.52|260.90%|
> |1|8192|1|3|3|0.00%|333.33|333.33|0.00%|
> |1|8192|16|18|15|-16.67%|55.56|66.67|20.00%|
> |1|8192|64|19|14|-26.32%|52.63|71.43|35.71%|
> |1|8192|512|69|43|-37.68%|14.49|23.26|60.47%|
> |1|8192|2048|236|134|-43.22%|4.24|7.46|76.12%|
> |1|1048576|1|3|2|-33.33%|333.33|500.00|50.00%|
> |1|1048576|16|18|19|5.56%|55.56|52.63|-5.26%|
> |1|1048576|64|17|17|0.00%|58.82|58.82|0.00%|
> |1|1048576|512|34|32|-5.88%|29.41|31.25|6.25%|
> |1|1048576|2048|89|93|4.49%|11.24|10.75|-4.30%|
[jira] [Resolved] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD
[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10297.
-------------------------------
    Resolution: Won't Do

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Created] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
Feng Guo created LUCENE-10315:
---------------------------------

             Summary: Speed up BKD leaf block ids codec by a 512 ints ForUtil
                 Key: LUCENE-10315
                 URL: https://issues.apache.org/jira/browse/LUCENE-10315
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Feng Guo

This issue tries to use a 512 ints {{ForUtil}} for the BKD ids codec. I benchmarked the optimization by mocking some random LongPoint fields and querying them with PointInSetQuery.

*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
|1|32|1|51.44|148.26|188.22%|
|1|32|2|26.8|101.88|280.15%|
|1|32|4|14.04|53.52|281.20%|
|1|32|8|7.04|28.54|305.40%|
|1|32|16|3.54|14.61|312.71%|
|1|128|1|110.56|350.26|216.81%|
|1|128|8|16.6|89.81|441.02%|
|1|128|16|8.45|48.07|468.88%|
|1|128|32|4.2|25.35|503.57%|
|1|128|64|2.13|13.02|511.27%|
|1|1024|1|536.19|843.88|57.38%|
|1|1024|8|109.71|251.89|129.60%|
|1|1024|32|33.24|104.11|213.21%|
|1|1024|128|8.87|30.47|243.52%|
|1|1024|512|2.24|8.3|270.54%|
|1|8192|1|.33|5000|50.00%|
|1|8192|32|139.47|214.59|53.86%|
|1|8192|128|54.59|109.23|100.09%|
|1|8192|512|15.61|36.15|131.58%|
|1|8192|2048|4.11|11.14|171.05%|
|1|1048576|1|2597.4|3030.3|16.67%|
|1|1048576|32|314.96|371.75|18.03%|
|1|1048576|128|99.7|116.28|16.63%|
|1|1048576|512|30.5|37.15|21.80%|
|1|1048576|2048|10.38|12.3|18.50%|
|1|8388608|1|2564.1|3174.6|23.81%|
|1|8388608|32|196.27|238.95|21.75%|
|1|8388608|128|55.36|68.03|22.89%|
|1|8388608|512|15.58|19.24|23.49%|
|1|8388608|2048|4.56|5.71|25.22%|

The index size is reduced for low cardinality fields and roughly flat for high cardinality fields.

{code:java}
113M	index_1_doc_32_cardinality_baseline
114M	index_1_doc_32_cardinality_candidate
140M	index_1_doc_128_cardinality_baseline
133M	index_1_doc_128_cardinality_candidate
193M	index_1_doc_1024_cardinality_baseline
174M	index_1_doc_1024_cardinality_candidate
241M	index_1_doc_8192_cardinality_baseline
233M	index_1_doc_8192_cardinality_candidate
314M	index_1_doc_1048576_cardinality_baseline
315M	index_1_doc_1048576_cardinality_candidate
392M	index_1_doc_8388608_cardinality_baseline
391M	index_1_doc_8388608_cardinality_candidate
{code}
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459344#comment-17459344 ] Dawid Weiss commented on LUCENE-10313: -- https://github.com/dweiss/lucene/tree/LUCENE-10313 It just rips the log4j api, nothing else. But I'll play with the handler a bit - it's fun. > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459322#comment-17459322 ] Dawid Weiss commented on LUCENE-10313: -- Oh. I already started - let me push what I have and you can take it from there or modify it in a way you like it. I don't think the screen buffer is a problem - if one wants it, they can use tee. But I agree the text area can have a larger circular buffer so that it reaches _a lot_ into the history if somebody wants to save the logs. This should be fairly easy to implement. > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10314) inconsistent index options when opening pre 9.0.0 index with 9.0.0
[ https://issues.apache.org/jira/browse/LUCENE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459281#comment-17459281 ]

Michael Sokolov commented on LUCENE-10314:
------------------------------------------

A small side note here; we have this comment in {{IndexOptions}}:

{code:java}
public enum IndexOptions {
  // NOTE: order is important here; FieldInfo uses this
  // order to merge two conflicting IndexOptions (always
  // "downgrades" by picking the lowest).
{code}

which is probably no longer relevant (since conflicting {{IndexOptions}} are forbidden).

> inconsistent index options when opening pre 9.0.0 index with 9.0.0
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10314
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10314
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 9.0
>            Reporter: Ian Lea
>            Priority: Major
>
> We have a long-standing index with some mandatory fields and some optional fields that has been through multiple Lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, when opening an IndexWriter we hit the exception
>
> Exception in thread "main" java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
> at org.apache.lucene.index.FieldInfo.verifySameIndexOptions(FieldInfo.java:245)
> at org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:421)
> at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:357)
> at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1263)
> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1116)
>
> Where language is one of our optional fields.
> Presumably this is at least somewhat related to "Index options can no > longer be changed dynamically" as mentioned at > https://lucene.apache.org/core/9_0_0/MIGRATE.html although it fails before > our code attempts to update the index, and we are not trying to change any > index options. > Adding some displays to IndexWriter and FieldInfos and logging rather than > throwing the exception I see > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=DOCS > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=NONE > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > language curr=NONE, other=DOCS > where there is one line per segment. It logs the exception whenever > other=DOCS. 
Subset with segment info: > segment _x8(8.2.0):c31753/-1:[diagnostics={timestamp=1565623850605, > lucene.version=8.2.0, java.vm.version=11.0.3+7, java.version=11.0.3, > mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, > java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, > java.runtime.version=11.0.3+7, > os=Linux}]:[attributes=\{Lucene50StoredFieldsFormat.mode=BEST_SPEED}] > language curr=NONE, other=NONE > segment _y9(8.7.0):c43531/-1:[diagnostics={timestamp=1604597581562, > lucene.version=8.7.0, java.vm.version=11.0.3+7, java.version=11.0.3, > mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, > java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, > java.runtime.version=11.0.3+7, > os=Linux}]:[attributes=\{Lucene87StoredFieldsFormat.mode=BEST_SPEED}] > language curr=NONE, other=DOCS > NOT throwing java.lang.IllegalArgumentException: cannot change field > "language" from index options=NONE to inconsistent index options=DOCS > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
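The legacy "downgrade by picking the lowest" behavior that the IndexOptions comment describes amounts to an ordinal comparison on the enum. A hypothetical sketch with a stand-in enum (not Lucene's code; as the exception in this report shows, 9.x rejects the conflict instead of downgrading):

```java
// Hypothetical sketch of the legacy "merge conflicting IndexOptions by
// picking the lowest" behavior, using a stand-in enum. Not Lucene's code.
public class IndexOptionsMerge {
  enum Options { NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  /** Picks the "lower" option, relying on the enum's declaration order. */
  static Options mergeByLowest(Options a, Options b) {
    return a.ordinal() <= b.ordinal() ? a : b;
  }

  public static void main(String[] args) {
    // A NONE segment merged with a DOCS segment downgrades to NONE -
    // the old behavior a pre-9.0 index may have baked into its segments.
    System.out.println(mergeByLowest(Options.NONE, Options.DOCS)); // NONE
  }
}
```

This makes the report above easier to read: older segments could legitimately carry a mix of NONE and DOCS for the same field, which 9.0's stricter schema check now refuses.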
[GitHub] [lucene-solr] iverase closed pull request #1193: LUCENE-9154: Remove encodeCeil() to encode bounding box queries
iverase closed pull request #1193: URL: https://github.com/apache/lucene-solr/pull/1193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459269#comment-17459269 ]

Tomoko Uchida commented on LUCENE-10313:
----------------------------------------

Thank you - I will take this. It shouldn't be difficult.

Regarding implementation, I would set a log handler only for the text area that has a scroll bar, and possibly add a "save" button to persist the entire log to a local file. One reason I didn't use the console log appender in the app was that, without a large scrollback buffer, it could lose important parts of the log, which can include long stack traces.

> Remove log4j from dependencies and switch to java logging (in luke)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-10313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10313
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: luke
>            Reporter: Tomoko Uchida
>            Priority: Major
>
> This seems to be a simpler solution at this moment
> https://issues.apache.org/jira/browse/LUCENE-10303
> https://issues.apache.org/jira/browse/LUCENE-10308
[GitHub] [lucene] msokolov commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
msokolov commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-993640438

   > Why do we need to exclude small segments from regular merges?

   If we selected some small segments as part of a full-flush merge, then they wouldn't be available to also be included in a regular merge, right?

   I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers `GET_READER` and `COMMIT`, but *not* for trigger `FULL_FLUSH`, which seems quite confusing. I wonder if we could find a better name for `findFullFlushMerges`. Also, given that `findMerges` and `findFullFlushMerges` are both called from the same switch statement, for different triggers, *and the trigger is passed in as an argument* -- we could get rid of `findFullFlushMerges`, *always* call `findMerges`, and let the merge policy decide what to do based on the value of `trigger`. @s1monw WDYT?
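The refactor floated here - dropping findFullFlushMerges and letting a single findMerges branch on the trigger - would look roughly like this. These are hypothetical shapes, not the real MergePolicy API:

```java
// Hypothetical sketch of the suggested refactor: one findMerges entry point
// that branches on the trigger, instead of a separate findFullFlushMerges.
// The enum constants mirror names mentioned in the thread; return values are
// just labels standing in for real merge specifications.
public class TriggerDispatchSketch {
  enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, COMMIT, GET_READER, EXPLICIT }

  static String findMerges(MergeTrigger trigger) {
    switch (trigger) {
      case COMMIT:
      case GET_READER:
        // The cases IndexWriter routes to findFullFlushMerges today.
        return "small-segment merge for full flush";
      default:
        return "regular tiered merge";
    }
  }

  public static void main(String[] args) {
    System.out.println(findMerges(MergeTrigger.COMMIT));
    System.out.println(findMerges(MergeTrigger.SEGMENT_FLUSH));
  }
}
```

The design appeal is that the policy, not IndexWriter's switch statement, owns the decision of what each trigger means.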
[jira] [Created] (LUCENE-10314) inconsistent index options when opening pre 9.0.0 index with 9.0.0
Ian Lea created LUCENE-10314:
--------------------------------

             Summary: inconsistent index options when opening pre 9.0.0 index with 9.0.0
                 Key: LUCENE-10314
                 URL: https://issues.apache.org/jira/browse/LUCENE-10314
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 9.0
            Reporter: Ian Lea

We have a long-standing index with some mandatory fields and some optional fields that has been through multiple Lucene upgrades without a full rebuild. On testing an upgrade from version 8.11.0 to 9.0.0, when opening an IndexWriter we hit the exception

Exception in thread "main" java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
	at org.apache.lucene.index.FieldInfo.verifySameIndexOptions(FieldInfo.java:245)
	at org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:421)
	at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:357)
	at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1263)
	at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1116)

Where language is one of our optional fields.

Presumably this is at least somewhat related to "Index options can no longer be changed dynamically" as mentioned at https://lucene.apache.org/core/9_0_0/MIGRATE.html, although it fails before our code attempts to update the index, and we are not trying to change any index options.

Adding some displays to IndexWriter and FieldInfos, and logging rather than throwing the exception, I see

language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=DOCS
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=NONE
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS
language curr=NONE, other=DOCS

where there is one line per segment. It logs the exception whenever other=DOCS.

Subset with segment info:

segment _x8(8.2.0):c31753/-1:[diagnostics={timestamp=1565623850605, lucene.version=8.2.0, java.vm.version=11.0.3+7, java.version=11.0.3, mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, java.runtime.version=11.0.3+7, os=Linux}]:[attributes=\{Lucene50StoredFieldsFormat.mode=BEST_SPEED}]
language curr=NONE, other=NONE

segment _y9(8.7.0):c43531/-1:[diagnostics={timestamp=1604597581562, lucene.version=8.7.0, java.vm.version=11.0.3+7, java.version=11.0.3, mergeMaxNumSegments=-1, os.version=3.1.0-1.2-desktop, java.vendor=AdoptOpenJDK, source=merge, os.arch=amd64, mergeFactor=10, java.runtime.version=11.0.3+7, os=Linux}]:[attributes=\{Lucene87StoredFieldsFormat.mode=BEST_SPEED}]
language curr=NONE, other=DOCS
NOT throwing java.lang.IllegalArgumentException: cannot change field "language" from index options=NONE to inconsistent index options=DOCS
[GitHub] [lucene] msokolov commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
msokolov commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r768748046 ## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java ## @@ -109,11 +110,15 @@ private void updateBytesUsed() { public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException { VectorValues vectorValues = new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension()); -if (sortMap != null) { - knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap)); -} else { - knnVectorsWriter.writeField(fieldInfo, vectorValues); -} +KnnVectorsReader vectorsReader = +new EmptyKnnVectorsReader() { + @Override + public VectorValues getVectorValues(String field) throws IOException { +return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues; Review comment: Oh, good catch! I think with the current usage it probably never matters since we only call this once and dispose of the reader, but it would be a trap waiting for some future usage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
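The "trap" under discussion is the general hazard of caching a consumable object and handing it out on every call: a second caller sees it already exhausted. A sketch with a plain Iterator (not the Lucene classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the review comment's trap: serving the same iterator-like object
// to every caller means the second caller finds it already consumed, while a
// supplier of fresh instances does not have this problem.
public class OneShotTrap {
  /** Drains the iterator and returns how many elements were seen. */
  static int consume(Iterator<Integer> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    List<Integer> vectors = List.of(1, 2, 3);

    Iterator<Integer> shared = vectors.iterator();
    System.out.println(consume(shared)); // 3
    System.out.println(consume(shared)); // 0: second caller gets nothing

    Supplier<Iterator<Integer>> fresh = vectors::iterator;
    System.out.println(consume(fresh.get())); // 3
    System.out.println(consume(fresh.get())); // 3
  }
}
```

With a single call site, as the comment notes, the cached instance happens to work; the bug only surfaces for a hypothetical future caller that asks twice.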
[GitHub] [lucene] jpountz commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-993590295

   Why do we need to exclude small segments from regular merges? E.g. if a merge-on-flush merges small segments and the resulting segment is small too, we don't want to exclude it from regular merges, do we?

   Because it creates merges manually, this merge policy cannot take advantage of some specifics of the wrapped merge policy, like the merge factor. I wonder if a better implementation would consist of calling `findMerges` and filtering out merges that are too large?
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459143#comment-17459143 ] Dawid Weiss commented on LUCENE-10313: -- I can take a look at this, if you don't have time, [~tomoko]. I need something more stimulating than log4j upgrades that seem to happen daily now... > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459137#comment-17459137 ] Dawid Weiss commented on LUCENE-10313: -- There is just a handful of files actually logging something... Maybe all of it could be even replaced with a luke-specific logging facade that would keep a window of logs for the text area and emit the rest via java.util.logging (without even altering any default configuration/ appenders/ etc.). > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
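The facade described here - keep a bounded window of recent records for the text area while staying on plain java.util.logging - can be sketched as a custom Handler. Class and method names are hypothetical, not Luke's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch of a bounded in-memory log window as a java.util.logging Handler.
// A UI text area could render windowSnapshot(); hypothetical names throughout.
public class LogWindowHandler extends Handler {
  private final Deque<String> window = new ArrayDeque<>();
  private final int capacity;

  public LogWindowHandler(int capacity) {
    this.capacity = capacity;
  }

  @Override
  public synchronized void publish(LogRecord record) {
    if (!isLoggable(record)) {
      return;
    }
    if (window.size() == capacity) {
      window.removeFirst(); // drop the oldest line once the window is full
    }
    window.addLast(record.getMessage());
  }

  @Override public void flush() {}
  @Override public void close() {}

  /** Current window contents, oldest first, one line per record. */
  public synchronized String windowSnapshot() {
    return String.join("\n", window);
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("luke.sketch");
    logger.setUseParentHandlers(false); // keep the sketch off the console
    LogWindowHandler handler = new LogWindowHandler(2);
    logger.addHandler(handler);
    logger.info("one");
    logger.info("two");
    logger.info("three"); // evicts "one"
    System.out.println(handler.windowSnapshot()); // prints "two" then "three"
  }
}
```

A larger capacity gives the "reaches a lot into the history" behavior discussed above, and a save-to-file action only needs to write the same snapshot out.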
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459132#comment-17459132 ] Dawid Weiss commented on LUCENE-10303: -- I don't know how difficult it would be, Tomoko. I don't think it'd be very hard. The problem with a single log4j sink in the home folder is that it's one file - if you run two Luke instances (for example, to compare indexes) one overwrites another. I'd rather have the persistent log dumped to the console. To me, it'd be more convenient than trying to look up where that log actually is. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10313) Remove log4j from dependencies and switch to java logging (in luke)
[ https://issues.apache.org/jira/browse/LUCENE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10313: --- Summary: Remove log4j from dependencies and switch to java logging (in luke) (was: Remove log4j from dependencies and switch to java logging) > Remove log4j from dependencies and switch to java logging (in luke) > --- > > Key: LUCENE-10313 > URL: https://issues.apache.org/jira/browse/LUCENE-10313 > Project: Lucene - Core > Issue Type: Task > Components: luke >Reporter: Tomoko Uchida >Priority: Major > > This seems to be a simpler solution at this moment > https://issues.apache.org/jira/browse/LUCENE-10303 > https://issues.apache.org/jira/browse/LUCENE-10308 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10313) Remove log4j from dependencies and switch to java logging
Tomoko Uchida created LUCENE-10313: -- Summary: Remove log4j from dependencies and switch to java logging Key: LUCENE-10313 URL: https://issues.apache.org/jira/browse/LUCENE-10313 Project: Lucene - Core Issue Type: Task Components: luke Reporter: Tomoko Uchida This seems to be a simpler solution at this moment https://issues.apache.org/jira/browse/LUCENE-10303 https://issues.apache.org/jira/browse/LUCENE-10308 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459120#comment-17459120 ] Tomoko Uchida commented on LUCENE-10303: Luke uses log4j mainly because I have been accustomed to it, and have never used java logging. The logger has two appenders - one for a file handler and another for a text area component named "Logs" tab. If this configuration can be seamlessly ported to java logging (I could write a custom log handler) there would not be any problems with switching the logging framework. Or we probably should remove the fancy TextArea appender - though if possible, I'd like to keep this for the convenience of daily use. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
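For the file-appender half, java.util.logging's built-in FileHandler can be set up programmatically with no configuration file at all. A hedged sketch (paths and patterns are illustrative, not Luke's real configuration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.stream.Stream;

public class FileLoggingDemo {
  public static void main(String[] args) throws IOException {
    Path logDir = Files.createTempDirectory("luke-logs");
    // %u resolves to a unique number when the target file is already in use,
    // so two concurrently running instances get separate files instead of
    // overwriting each other.
    FileHandler fileHandler =
        new FileHandler(logDir.resolve("luke-%u.log").toString(), true);
    fileHandler.setFormatter(new SimpleFormatter());
    fileHandler.setLevel(Level.INFO);

    Logger logger = Logger.getLogger("luke.demo");
    logger.addHandler(fileHandler);
    logger.info("index opened");
    fileHandler.close();

    try (Stream<Path> files = Files.list(logDir)) {
      files.forEach(p -> System.out.println("wrote " + p.getFileName()));
    }
  }
}
```

The %u token also sidesteps the two-instances problem raised on LUCENE-10303: each running process claims its own numbered file instead of fighting over a single shared one.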
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459118#comment-17459118 ] Dawid Weiss commented on LUCENE-10308: -- Thanks, I added it (again). Bugzilla is confusing like hell. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459115#comment-17459115 ] Uwe Schindler commented on LUCENE-10308: I don't see an attachment on the ECJ bug. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459100#comment-17459100 ] Dawid Weiss commented on LUCENE-10308: -- > The issue here is a bug in ecj. I know. But it doesn't help me much, does it? :) > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459096#comment-17459096 ] Dawid Weiss commented on LUCENE-10308: -- Filed a bug to ECJ. https://bugs.eclipse.org/bugs/show_bug.cgi?id=577790 > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459093#comment-17459093 ] Uwe Schindler commented on LUCENE-10308: That's explicitly intended by log4j, because it makes the module-info file visible only to Java 9+ code. It was added after complaints from users whose classpath scanners and IDEs broke when finding the file. We discussed this at the OpenJDK committer meeting at FOSDEM two years ago with the Maven people, and the outcome was a recommendation to add module-info.class only in the multi-release part of the JAR file for best compatibility. Maven does this by default, I think. The issue here is a bug in ecj. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10308: - Attachment: repro.zip > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Attachments: repro.zip > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10308) Make ecj and javadoc run with modular paths
[ https://issues.apache.org/jira/browse/LUCENE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459087#comment-17459087 ] Dawid Weiss commented on LUCENE-10308: -- The root cause of this is ECJ failing to parse the log4j JAR as a module, because log4j is a multi-release JAR and keeps the module descriptor inside the META-INF/versions folder for Java 9+. > Make ecj and javadoc run with modular paths > --- > > Key: LUCENE-10308 > URL: https://issues.apache.org/jira/browse/LUCENE-10308 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
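The layout ECJ trips over is easy to reproduce with the JDK's own tools. A self-contained sketch (all file and module names are made up for illustration) that builds a tiny multi-release JAR and lists its entries — the module descriptor lands under META-INF/versions/9/, not at the JAR root:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarFile;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

public class MultiReleaseJarDemo {

  /** Builds a tiny multi-release JAR and returns its entry names. */
  static List<String> buildAndList() throws IOException {
    Path dir = Files.createTempDirectory("mrjar");
    Path classes = Files.createDirectory(dir.resolve("classes"));
    Path classes9 = Files.createDirectory(dir.resolve("classes9"));

    // A plain class for the JAR root, plus a module descriptor for Java 9+.
    Files.writeString(dir.resolve("Hello.java"), "public class Hello {}");
    Files.writeString(dir.resolve("module-info.java"), "module demo {}");

    ToolProvider javac = ToolProvider.findFirst("javac").orElseThrow();
    javac.run(System.out, System.err,
        "-d", classes.toString(), dir.resolve("Hello.java").toString());
    javac.run(System.out, System.err,
        "-d", classes9.toString(), dir.resolve("module-info.java").toString());

    // Everything after --release 9 is stored under META-INF/versions/9/,
    // invisible to pre-Java-9 consumers of the JAR.
    ToolProvider jar = ToolProvider.findFirst("jar").orElseThrow();
    Path jarPath = dir.resolve("demo.jar");
    jar.run(System.out, System.err,
        "--create", "--file", jarPath.toString(),
        "-C", classes.toString(), ".",
        "--release", "9", "-C", classes9.toString(), "module-info.class");

    try (JarFile jf = new JarFile(jarPath.toFile())) {
      return jf.stream().map(e -> e.getName()).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    buildAndList().forEach(System.out::println);
  }
}
```

A tool that opens the JAR and looks only for a root-level module-info.class (as ECJ apparently did) would conclude the JAR is non-modular.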
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459075#comment-17459075 ] Dawid Weiss commented on LUCENE-10303: -- I remember looking at how Luke configures logs (to user's home folder) and thinking whether this is really necessary/ correct. Perhaps java logging would be entirely sufficient for Luke's needs, eventually? > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2632: SOLR-15848 BadApple failing tests in branch_8_11
janhoy merged pull request #2632: URL: https://github.com/apache/lucene-solr/pull/2632 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459049#comment-17459049 ] Tomoko Uchida commented on LUCENE-10303: {quote}We should update to 2.16.0 (came out today) in all active branches. Please also change the changelog entry, no new issue please! {quote} I'll update it. Let me wait for a while (to make sure there is no further minor update on it). {quote}No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? {quote} Only Luke uses log4j; no other module depends on it. I grepped the entire source. > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #2632: SOLR-15848 BadApple failing tests in branch_8_11
janhoy opened a new pull request #2632: URL: https://github.com/apache/lucene-solr/pull/2632 https://issues.apache.org/jira/browse/SOLR-15848 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459021#comment-17459021 ] ASF subversion and git services commented on LUCENE-10085: -- Commit 352a6b68f0dfc6847e81bcd41a4c66b86494e2b4 in lucene's branch refs/heads/branch_9x from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=352a6b6 ] LUCENE-10085: Fix flaky testQueryMatchesCount (#538) Five times every 10 000 tests, we did not index any documents with i between 0 and 10 (inclusive), which caused the deleted tests to fail. With this commit, we make sure that we always index at least one document between 0 and 10. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
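The fix follows a general pattern for randomized tests, sketched below with plain java.util.Random rather than Lucene's actual test harness (names are illustrative): pin one draw into the range the assertions rely on, and leave the remaining draws unconstrained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PinnedDrawDemo {
  // Purely random draws in [0, 1000) miss [0, 10] entirely in roughly
  // (989/1000)^n of runs — rare but nonzero, hence a flaky test.
  static List<Integer> values(Random random, int count) {
    List<Integer> values = new ArrayList<>();
    values.add(random.nextInt(11)); // pinned draw: guaranteed to land in [0, 10]
    for (int i = 1; i < count; i++) {
      values.add(random.nextInt(1000)); // unconstrained draws
    }
    return values;
  }

  public static void main(String[] args) {
    Random random = new Random();
    for (int run = 0; run < 10_000; run++) {
      if (values(random, 20).stream().noneMatch(v -> v <= 10)) {
        throw new AssertionError("range [0, 10] never covered");
      }
    }
    System.out.println("all runs covered [0, 10]");
  }
}
```

Pinning one draw keeps the rest of the randomized coverage intact while removing the one-in-thousands failure mode the commit message describes.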
[GitHub] [lucene] iverase commented on pull request #538: LUCENE-10085: Fix flaky testQueryMatchesCount
iverase commented on pull request #538: URL: https://github.com/apache/lucene/pull/538#issuecomment-993362650 thanks @pquentin! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459019#comment-17459019 ] ASF subversion and git services commented on LUCENE-10085: -- Commit 9974f6ac34ac2f17bfcdf30d6df79476579ff1e0 in lucene's branch refs/heads/main from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9974f6a ] LUCENE-10085: Fix flaky testQueryMatchesCount (#538) Five times every 10 000 tests, we did not index any documents with i between 0 and 10 (inclusive), which caused the deleted tests to fail. With this commit, we make sure that we always index at least one document between 0 and 10. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase merged pull request #538: LUCENE-10085: Fix flaky testQueryMatchesCount
iverase merged pull request #538: URL: https://github.com/apache/lucene/pull/538 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459016#comment-17459016 ] Uwe Schindler commented on LUCENE-10303: Please also change the changelog entry, no new issue please! > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459015#comment-17459015 ] Uwe Schindler edited comment on LUCENE-10303 at 12/14/21, 9:43 AM: --- No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? was (Author: thetaphi): No patch release needed for Lucene 9.0, as there's no idde for Luke. I am not sure about Lucene replicator, is it used there, too? > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10303) Upgrade log4j to 2.16.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-10303: --- Summary: Upgrade log4j to 2.16.0 (was: Upgrade log4j to 2.15.0) > Upgrade log4j to 2.16.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459015#comment-17459015 ] Uwe Schindler commented on LUCENE-10303: No patch release needed for Lucene 9.0, as there's no remote access to Luke. I am not sure about Lucene replicator, is it used there, too? > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-10303: We should update to 2.16.0 (came out today) in all active branches. > Upgrade log4j to 2.15.0 > --- > > Key: LUCENE-10303 > URL: https://issues.apache.org/jira/browse/LUCENE-10303 > Project: Lucene - Core > Issue Type: Task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: 9.1, 10.0 (main) > > Attachments: LUCENE-10303.patch > > > CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker > controlled LDAP and other JNDI related endpoints. > Versions Affected: all versions from 2.0-beta9 to 2.14.1 > [https://logging.apache.org/log4j/2.x/security.html] > > Only luke module uses log4j 2.13.2 (I grepped the entire codebase); meanwhile > the versions.props is shared by all subprojects, it may be better to upgrade > to 2.15.0 I think. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on pull request #2631: SOLR-15843 Upgrade log4j from 2.15 to 2.16
uschindler commented on pull request #2631: URL: https://github.com/apache/lucene-solr/pull/2631#issuecomment-993354358 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] raminmjj opened a new pull request #540: LUCENE-10312: Add PersianStemmer
raminmjj opened a new pull request #540: URL: https://github.com/apache/lucene/pull/540 - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [ ] I have run `./gradlew check`. - [x] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10312) Add PersianStemmer
Ramin Alirezaee created LUCENE-10312: Summary: Add PersianStemmer Key: LUCENE-10312 URL: https://issues.apache.org/jira/browse/LUCENE-10312 Project: Lucene - Core Issue Type: Wish Components: modules/analysis Affects Versions: 9.0 Reporter: Ramin Alirezaee -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] codaitya commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
codaitya commented on pull request #446: URL: https://github.com/apache/lucene/pull/446#issuecomment-993311631 Thanks for the explanation, Mikes! I have updated the PR to leave out the variable max segment size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
jpountz commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768421436 ## File path: lucene/core/src/test/org/apache/lucene/codecs/TestMinimalCodec.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomBoolean; + +import java.io.IOException; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.StoredField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +/** + * Tests to ensure that {@link Codec}s won't need to implement all formats in case where only a + * small subset of Lucene's functionality is used.
+ */ +public class TestMinimalCodec extends LuceneTestCase { + + public void testMinimalCodec() throws IOException { +runMinimalCodecTest(false, false); +runMinimalCodecTest(false, true); +runMinimalCodecTest(true, true); +runMinimalCodecTest(true, false); + } + + private void runMinimalCodecTest(boolean useCompoundFile, boolean useDeletes) throws IOException { +try (BaseDirectoryWrapper dir = newDirectory()) { + dir.setCheckIndexOnClose(false); // MinimalCodec is not registered with SPI Review comment: should we register these 4 codecs via SPI? ## File path: lucene/core/src/test/org/apache/lucene/codecs/TestMinimalCodec.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ +package org.apache.lucene.codecs; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomBoolean; + +import java.io.IOException; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.StoredField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +/** + * Tests to ensure that {@link Codec}s won't need to implement all formats in case where only a + * small subset of Lucene's functionality is used. + */ +public class TestMinimalCodec extends LuceneTestCase { + + public void testMinimalCodec() throws IOException { +runMinimalCodecTest(false, false); +runMinimalCodecTest(false, true); +runMinimalCodecTest(true, true); +runMinimalCodecTest(true, false); + } + + private void runMinimalCodecTest(boolean useCompoundFile, boolean useDeletes) throws IOException { +try (BaseDirectoryWrapper dir = newDirectory()) { + dir.setCheckIndexOnClose(false); // MinimalCodec is not registered with SPI + + IndexWriterConfig writerConfig = + newIndexWriterConfig(new MockAnalyzer(random())) + .setCodec(new MinimalCodec(useCompoundFile, useDeletes)) + .setUseCompoundFile(useCompoundFile); + if (!useCompoundFile) { +
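For context on the SPI question above: Lucene discovers codecs by name through Java's ServiceLoader mechanism, so "registering" a codec means listing its class in a provider-configuration file on the classpath. A rough sketch of what that could look like for a test codec follows; the resource path and class name are illustrative assumptions, not taken from the PR:

```
# META-INF/services/org.apache.lucene.codecs.Codec
org.apache.lucene.codecs.TestMinimalCodec$MinimalCodec
```

SPI-registered codecs are looked up by name and instantiated reflectively, which is one reason a test-only codec whose constructor takes parameters (like MinimalCodec's useCompoundFile/useDeletes flags) may be left unregistered.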
[GitHub] [lucene] ywelsch commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768409970

## File path: lucene/core/src/java/org/apache/lucene/index/SegmentCommitInfo.java

@@ -244,7 +244,9 @@ public long sizeInBytes() throws IOException {
     // updates) and then maybe even be able to remove LiveDocsFormat.files().
     // Must separately add any live docs files:
-    info.getCodec().liveDocsFormat().files(this, files);
+    if (hasDeletions()) {
+      info.getCodec().liveDocsFormat().files(this, files);

Review comment: This now allows for a minimal codec that does not have a LiveDocsFormat when no deletes are being used. It was a tiny change to make, but perhaps out of scope for this PR?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
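The guard in the diff above follows a general pattern: only consult a per-feature format when the segment actually uses that feature, so a minimal codec can leave the format unimplemented. A self-contained sketch of the idea; the class names below are simplified stand-ins, not Lucene's real API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-ins for illustration only; NOT Lucene's real classes.
interface LiveDocsFormat {
  void files(SegmentCommitInfoSketch info, Collection<String> out);
}

class MinimalCodecSketch {
  // A codec meant for delete-free indexes can refuse to provide a LiveDocsFormat.
  LiveDocsFormat liveDocsFormat() {
    throw new UnsupportedOperationException("this codec does not support deletes");
  }
}

class SegmentCommitInfoSketch {
  private final MinimalCodecSketch codec = new MinimalCodecSketch();
  private final int delCount;

  SegmentCommitInfoSketch(int delCount) {
    this.delCount = delCount;
  }

  boolean hasDeletions() {
    return delCount > 0;
  }

  Collection<String> files() {
    List<String> files = new ArrayList<>();
    files.add("_0.cfs"); // regular segment files
    // Only consult the live-docs format when there are deletions, so a
    // minimal codec never needs to implement it for delete-free segments.
    if (hasDeletions()) {
      codec.liveDocsFormat().files(this, files);
    }
    return files;
  }
}
```

With zero deletions the live-docs format is never touched, so the unsupported-operation codec works; with deletions present it would fail fast, matching ywelsch's observation that the change enables delete-free minimal codecs.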
[GitHub] [lucene] jpountz commented on a change in pull request #534: LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues
jpountz commented on a change in pull request #534: URL: https://github.com/apache/lucene/pull/534#discussion_r768405655

## File path: lucene/core/src/java/org/apache/lucene/index/EmptyKnnVectorsReader.java

@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Bits;
+
+/** Abstract base class implementing a {@link KnnVectorsReader} that has no vector values. */
+public abstract class EmptyKnnVectorsReader extends KnnVectorsReader {

Review comment: We do it for doc values because doc values support 5 different types (numeric, sorted numeric, sorted, sorted set, binary) and the empty producer helps implement only the doc values type that we care about. Since there is a single type of vectors, I don't think we need this empty producer; let's remove it and extend KnnVectorsReader directly?
## File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java

@@ -109,11 +110,15 @@ private void updateBytesUsed() {
   public void flush(Sorter.DocMap sortMap, KnnVectorsWriter knnVectorsWriter) throws IOException {
     VectorValues vectorValues =
         new BufferedVectorValues(docsWithField, vectors, fieldInfo.getVectorDimension());
-    if (sortMap != null) {
-      knnVectorsWriter.writeField(fieldInfo, new SortingVectorValues(vectorValues, sortMap));
-    } else {
-      knnVectorsWriter.writeField(fieldInfo, vectorValues);
-    }
+    KnnVectorsReader vectorsReader =
+        new EmptyKnnVectorsReader() {
+          @Override
+          public VectorValues getVectorValues(String field) throws IOException {
+            return sortMap != null ? new SortingVectorValues(vectorValues, sortMap) : vectorValues;

Review comment: This is incorrect, as it would return the same instance every time it is called. We should instantiate a new BufferedVectorValues instance every time this method is called.
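The bug jpountz points out is a general one: a getter that returns a captured, stateful iteration object hands every caller the same partially-consumed instance, whereas a correct reader builds a fresh one per call. A minimal, self-contained illustration (the class and method names here are hypothetical, not Lucene's API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class FreshVsSharedIterator {
  // Buggy shape: capture one iterator and hand it to every caller,
  // so the second caller sees an already-exhausted stream.
  static Supplier<Iterator<float[]>> shared(List<float[]> vectors) {
    Iterator<float[]> it = vectors.iterator();
    return () -> it;
  }

  // Correct shape: build a fresh iterator on every call, analogous to
  // instantiating a new BufferedVectorValues per getVectorValues call.
  static Supplier<Iterator<float[]>> fresh(List<float[]> vectors) {
    return vectors::iterator;
  }

  // Consume the iterator and count its elements.
  static int count(Iterator<float[]> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }
}
```

Calling `count` twice on the shared supplier yields the full size once and then zero, while the fresh supplier yields the full size every time, which is why the review asks for a new instance per call.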
[GitHub] [lucene] ywelsch commented on a change in pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch commented on a change in pull request #539: URL: https://github.com/apache/lucene/pull/539#discussion_r768407647

## File path: lucene/core/src/java/org/apache/lucene/index/FieldInfos.java

@@ -200,6 +204,11 @@ public boolean hasFreq() {
     return hasFreq;
   }

+  /** Returns true if any fields are indexed */
+  public boolean hasIndexed() {

Review comment: I wasn't sure about the naming here. Perhaps `hasPostings` is a better name?
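The accessor under discussion aggregates per-field metadata into a single flag: a segment needs postings files iff at least one field is indexed. A hedged sketch of that aggregation, using simplified stand-in types rather than Lucene's real FieldInfos (which computes such flags in its constructor):

```java
import java.util.List;

// Illustrative sketch only; NOT Lucene's real FieldInfos.
class FieldInfosSketch {
  enum IndexOptions { NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  record FieldInfo(String name, IndexOptions options) {}

  private final boolean hasPostings;

  FieldInfosSketch(List<FieldInfo> fields) {
    // A segment has postings iff at least one field is indexed
    // (i.e. has index options other than NONE).
    this.hasPostings = fields.stream().anyMatch(f -> f.options() != IndexOptions.NONE);
  }

  /** Returns true if any field indexes postings. */
  boolean hasPostings() {
    return hasPostings;
  }
}
```

The sketch also shows why `hasPostings` reads better than `hasIndexed`: the flag's consumer is the postings read/write path, not a generic notion of "indexed".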
[GitHub] [lucene] ywelsch opened a new pull request #539: LUCENE-10291: Only read/write postings when there is at least one indexed field
ywelsch opened a new pull request #539: URL: https://github.com/apache/lucene/pull/539

# Description

Unlike points, norms, term vectors, or doc values, which only get written to the directory when at least one field uses the data structure, postings always get written to the directory. While this isn't hurting much, it can be surprising at times: for example, if you index with SimpleText you will have a file for postings even though none of the fields indexes postings. This inconsistency is hidden with the default codec because it uses PerFieldPostingsFormat, which only delegates to the per-field formats when a field is actually indexed, so you don't get a file if none of the fields is indexed.

# Solution

Fix the situation by making reading/writing of postings conditional on whether any data is indexed. This can be determined from the metadata in FieldInfos.

# Tests

Added a new test, `TestMinimalCodec`. Existing tests pass via `./gradlew clean; ./gradlew check`.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
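The shape of the fix described in the PR can be sketched in a few lines: gate the creation of postings files on the FieldInfos-derived flag, the same way points, norms, term vectors, and doc values are already gated on their own flags. The method and file names below are illustrative assumptions, not the actual Lucene internals:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (names hypothetical): before the fix, the postings
// file was always created; after it, creation is conditional on FieldInfos
// reporting at least one indexed field, matching how points already behave.
class SegmentFlushSketch {
  static List<String> flush(boolean anyFieldIndexed, boolean anyFieldHasPoints) {
    List<String> written = new ArrayList<>();
    if (anyFieldIndexed) {       // the new conditional for postings
      written.add("_0_postings");
    }
    if (anyFieldHasPoints) {     // points were already conditional
      written.add("_0_points");
    }
    return written;
  }
}
```

With no indexed fields, no postings file is produced, which is exactly the SimpleText surprise the PR description calls out.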
[jira] [Comment Edited] (LUCENE-10303) Upgrade log4j to 2.15.0
[ https://issues.apache.org/jira/browse/LUCENE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458965#comment-17458965 ]

Tomoko Uchida edited comment on LUCENE-10303 at 12/14/21, 8:06 AM:
---
It seems the concerns and worries about this CVE keep growing. I think we don't need a security update/patch release (9.0.1) for this, since there is no substantial concern in terms of the Luke app (and just replacing the log4j jars ("api" and "core") with the latest ones is a perfectly fine solution for anyone who needs it). Please let me know if it's required, though.

was (Author: tomoko uchida):
It seems the concerns and worries about this CVE keep growing. I think we don't need a security update/patch for this, since there is no substantial concern in terms of the Luke app (and just replacing the log4j jars ("api" and "core") with the latest ones is a perfectly fine solution for anyone who needs it). Please let me know if it's required, though.

> Upgrade log4j to 2.15.0
> ---
>
> Key: LUCENE-10303
> URL: https://issues.apache.org/jira/browse/LUCENE-10303
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Minor
> Fix For: 9.1, 10.0 (main)
> Attachments: LUCENE-10303.patch
>
> CVE-2021-44228: Apache Log4j2 JNDI features do not protect against attacker-controlled LDAP and other JNDI-related endpoints.
> Versions Affected: all versions from 2.0-beta9 to 2.14.1
> [https://logging.apache.org/log4j/2.x/security.html]
>
> Only the luke module uses log4j 2.13.2 (I grepped the entire codebase); but since versions.props is shared by all subprojects, it may be better to upgrade to 2.15.0, I think.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org