date:20230112

[GitHub] [lucene] LuXugang commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

2023-01-12 Thread GitBox



LuXugang commented on code in PR #12078:
URL: https://github.com/apache/lucene/pull/12078#discussion_r1068923445


##
lucene/core/src/java/org/apache/lucene/document/LongField.java:
##
@@ -108,8 +109,9 @@ public static Query newExactQuery(String field, long value) 
{
*/
   public static Query newRangeQuery(String field, long lowerValue, long 
upperValue) {
 PointRangeQuery.checkArgs(field, lowerValue, upperValue);
+Query fallbackQuery = LongPoint.newRangeQuery(field, lowerValue, 
upperValue);
 return new IndexOrDocValuesQuery(
-LongPoint.newRangeQuery(field, lowerValue, upperValue),
+new IndexSortSortedNumericDocValuesRangeQuery(field, lowerValue, 
upperValue, fallbackQuery),
 SortedNumericDocValuesField.newSlowRangeQuery(field, lowerValue, 
upperValue));

Review Comment:
   > IndexSortSortedNumericDocValuesRangeQuery is good at both so it should 
take precedence over both the index and the doc-values query?
   
   Thanks for helping, @jpountz , addressed in 
https://github.com/apache/lucene/pull/12078/commits/7adfdd3eabaffbe8fc1cf53711a7d570d57cd175



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

2023-01-12 Thread GitBox



gsmiller commented on PR #12073:
URL: https://github.com/apache/lucene/pull/12073#issuecomment-1381092287

   Moved this into "draft" state until I'm able to come back and figure out why 
there was a benchmark improvement with this change, given the feedback that 
`ReqExclBulkScorer` would be expected to kick in. I've gotten busy with some 
other work, but plan to come back and dig into this. In the meantime though, 
moving to "draft."


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] zhaih commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-12 Thread GitBox



zhaih commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1068747982


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it 
requires that the nodes

Review Comment:
   > We need this index to query the OnHeapHnswGraph.graph to get the 
NeighborArray for a particular element 
([ref](https://github.com/jmazanec15/lucene/blob/hnsw-merge-from-graph/lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java#L87)).
   
   I think we generally are not exposing the `graph` outside so the usage is 
only inside the OnHeapHnswGraph I believe? And inside the class I think that's 
the only place we use the nodeIndex, which is used for retrieving certain node 
after binary search. I think this won't be a problem for TreeSet or TreeMap (if 
we use node id as key and neighbor array as value)? Or have I overlooked some 
other usage of retrieving via nodeIndex?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-12 Thread GitBox



jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1068562367


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it 
requires that the nodes

Review Comment:
   Thinking about this more, one issue with a TreeSet based approach would be 
that we wouldnt be able to look up the index of a particular element (unless we 
converted the set to a list each time we need to do lookup). We need this index 
to query the OnHeapHnswGraph.graph to get the NeighborArray for a particular 
element 
([ref](https://github.com/jmazanec15/lucene/blob/hnsw-merge-from-graph/lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java#L87)).
 I am not sure if there is a good way around this - are you aware of any?
   
   Alternatively, I was thinking if we wanted to avoid the expensive copy for 
inserting when position < idx, we could switch from using an int[] to an 
ArrayList to represent nodesByLevel for a particular level. This 
should avoid in most cases the extreme copy that out of order insertion would 
require.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] Brain2000 opened a new issue, #12082: LeafFieldComparator setBottom not being called before compareBottom

2023-01-12 Thread GitBox



Brain2000 opened a new issue, #12082:
URL: https://github.com/apache/lucene/issues/12082

   ### Description
   
   It looks like there's a problem in the TopFieldCollector.java where it calls 
"compareBottom" without calling "setBottom" first.
   I believe this is an issue if getLeafComparer( ) is called more than once, 
as the new class will no longer have the previous bottom slot value.
   
   I found a workaround is to rescope the variable storing the bottom slot set 
from calling "setBottom( )", to outside of the LeafFieldComparator class to 
make it persistent (just like the setTopValue( ) is called outside of the 
LeafFieldComparator class).
   
   Rescoping the bottom slot should be safe unless multiple threads are being 
run at the same time and need their own version of bottomSlot for the 
LeafFieldComparator class.
   
   The documentation here 
 
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/LeafFieldComparator.html
   states that _"[setBottom] will always be called before compareBottom(int)"_
   
   So one might think the LeafFieldComparator would be used like this:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx2);
   lc.compareBottom(yyy2);
   ```
   
   However, it is being called in this fashion:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //compareBottom... hopefully the slot is correct...
   lc.compareBottom(yyy);
   ```
   
   As you can see, the 2nd set of code does not treat the LeafComparator as if 
each instance has it's own bottomSlot variable. I believe the purpose of this 
would be to allow multiple threads to call this simultaneously, which would 
require each one to have it's own bottomSlot, otherwise it would not be thread 
safe.
   
   Also, if setBottom was meant to be outside of the scope of LeafComparator, 
then why was the time taken to put setTop outside and setBottom inside? This 
seems to have been done very deliberately according to the documentation.
   
   Here is a sample code snippet:
   ```
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.search.FieldComparator;
   import org.apache.lucene.search.LeafFieldComparator;
   import org.apache.lucene.search.Scorable;
   
   public final class MySimpleStringComparator extends FieldComparator {
private String[] values;
private final String field;
private String topValue;
   
//SCOPING bottomSlot HERE WILL CAUSE ITEMS TO RETURN PROPERLY
//private int bottomSlot;
   
public MySimpleStringComparator(int numHits, String field) {
this.values = new String[numHits];
this.field = field;
}
   
@Override
public int compare(int slot1, int slot2) {
return this.compareValues(this.values[slot1], 
this.values[slot2]);
}
   
@Override
public LeafFieldComparator getLeafComparator(LeafReaderContext lrc) 
throws IOException {
final LeafReaderContext frc = lrc;
   
return new LeafFieldComparator() {
//SCOPING bottomSlot HERE WILL CAUSE ITEMS ON A SUBSET 
TO RANDOMLY DISAPPEAR
private int bottomSlot;
   
public void setScorer(Scorable arg0) throws IOException 
{
}
   
public void setBottom(int slot) throws IOException {
bottomSlot = slot;
}
   
public void copy(int slot, int doc) throws IOException {
values[slot] = 
frc.reader().document(doc).get(field);
}
   
public int compareTop(int doc) throws IOException {
return compareValues(topValue, 
frc.reader().document(doc).get(field));
}
   
public int compareBottom(int doc) throws IOException {
return compareValues(values[bottomSlot], 
frc.reader().document(doc).get(field));
}
};
}
   
@Override
public void setTopValue(String val) {
this.topValue = val;
}
   
@Override
public String value(int slot) {
return this.values[slot];
}
   }
   ```
   
   Here is a code snippet that runs a search/sort for a single page that 
utilizes the MySimpleStringComparator above.
   ```
   ComplexPhraseQueryParser cpqp

[GitHub] [lucene] jpountz commented on pull request #11807: No need to rewrite queries in unified highlighter

2023-01-12 Thread GitBox



jpountz commented on PR #11807:
URL: https://github.com/apache/lucene/pull/11807#issuecomment-1380775336

   @romseygeek Should it be backported to branch_9x?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #11779: GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori

2023-01-12 Thread GitBox



jpountz commented on PR #11779:
URL: https://github.com/apache/lucene/pull/11779#issuecomment-1380774395

   @danmuzi Should this be backported to branch_9x?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz merged pull request #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox



jpountz merged PR #12081:
URL: https://github.com/apache/lucene/pull/12081


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz merged pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox



jpountz merged PR #12079:
URL: https://github.com/apache/lucene/pull/12079


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox



javanna commented on PR #12072:
URL: https://github.com/apache/lucene/pull/12072#issuecomment-1380596699

   Thanks @benwtrent !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] javanna closed issue #12069: Long rewrite times for deeply nested, non-scoring Boolean queries

2023-01-12 Thread GitBox



javanna closed issue #12069: Long rewrite times for deeply nested, non-scoring 
Boolean queries
URL: https://github.com/apache/lucene/issues/12069


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] javanna merged pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox



javanna merged PR #12072:
URL: https://github.com/apache/lucene/pull/12072


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

2023-01-12 Thread GitBox



jpountz commented on code in PR #12078:
URL: https://github.com/apache/lucene/pull/12078#discussion_r1068260998


##
lucene/core/src/java/org/apache/lucene/document/LongField.java:
##
@@ -108,8 +109,9 @@ public static Query newExactQuery(String field, long value) 
{
*/
   public static Query newRangeQuery(String field, long lowerValue, long 
upperValue) {
 PointRangeQuery.checkArgs(field, lowerValue, upperValue);
+Query fallbackQuery = LongPoint.newRangeQuery(field, lowerValue, 
upperValue);
 return new IndexOrDocValuesQuery(
-LongPoint.newRangeQuery(field, lowerValue, upperValue),
+new IndexSortSortedNumericDocValuesRangeQuery(field, lowerValue, 
upperValue, fallbackQuery),
 SortedNumericDocValuesField.newSlowRangeQuery(field, lowerValue, 
upperValue));

Review Comment:
   I would have wrapped the IndexOrDocValuesQuery within the 
IndexSortSortedNumericDocValuesRangeQuery instead. IndexOrDocValuesQuery is 
needed because the point query is bad at skipping and the doc values query is 
bad at iterating over all matches, but when an index sort is configured, 
`IndexSortSortedNumericDocValuesRangeQuery` is good at both so it should take 
precedence over both the index and the doc-values query?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dantuzi commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-12 Thread GitBox



dantuzi commented on PR #12048:
URL: https://github.com/apache/lucene/pull/12048#issuecomment-1380490660

   @msokolov I'm finalizing another PR that works with HnswGraph and, to create 
the graph, I use the constants `DEFAULT_MAX_CONN` and `DEFAULT_BEAM_WIDTH` 
already defined. So, in the future, these constants will also be used outside 
the class `HnswVectorsFormat`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068156278


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   and obviously, fixing this can be a followup to this PR. but we should do it 
before releasing the new APIs. These new fields are supposed to be easy to use, 
so they should support storing as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068154625


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   look at current stored fields writer as example. All these damn java 
abstractions, yet our codec writer is doing **TYPE-GUESSING?**. Let's add a new 
method, so the codec knows the type and never guesses. This "guessing" belongs 
as an impl detail behind a new method in Field.java IMO (because Field.java 
tries to be a superhero and support all types). For structured types like 
KeywordField it should just be `return STRING`.
   
   ```
   Number number = field.numericValue();
   if (number != null) {
 if (number instanceof Byte || number instanceof Short || number 
instanceof Integer) {
   bits = NUMERIC_INT;
 } else if (number instanceof Long) {
   bits = NUMERIC_LONG;
 } else if (number instanceof Float) {
   bits = NUMERIC_FLOAT;
 } else if (number instanceof Double) {
   bits = NUMERIC_DOUBLE;
 } else {
   throw new IllegalArgumentException("cannot store numeric type " + 
number.getClass());
 }
 string = null;
 bytes = null;
   } else {
 bytes = field.binaryValue();
 if (bytes != null) {
   bits = BYTE_ARR;
   string = null;
 } else {
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068145212


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   by self-created i mean, too many java abstractions / java abstractions are 
the things causing the issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068144382


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   Right, its a self-created problem though, because we know it should go into 
storedfields as a string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



jpountz commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068141063


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   I don't mind killing the BytesRef ctor, but I don't think it would be 
enough. We need this field to implement `binaryValue()` so that doc values can 
be indexed. But then stored fields are going to see a field where both 
`stringValue()` and `binaryValue()` return non-null, and it would be a problem 
for the current stored fields format which checks the binary value first, so 
this `KeywordField` would be considered as a binary field by stored fields.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox



jpountz commented on PR #12081:
URL: https://github.com/apache/lucene/pull/12081#issuecomment-1380369577

   Here are timings of the first doc value merges on the IndexTaxis benchmark 
and a sorted dense index:
   
   ```
   SM 0 [2023-01-12T13:17:47.581987785Z; Thread-0]: 564 ms to merge doc values 
[367596 docs]
   SM 0 [2023-01-12T13:17:52.428441520Z; Thread-0]: 392 ms to merge doc values 
[365886 docs]
   SM 0 [2023-01-12T13:17:57.028935759Z; Thread-0]: 377 ms to merge doc values 
[365212 docs]
   SM 0 [2023-01-12T13:18:01.642897130Z; Thread-0]: 379 ms to merge doc values 
[367056 docs]
   SM 0 [2023-01-12T13:18:06.191013759Z; Thread-0]: 368 ms to merge doc values 
[364343 docs]
   SM 0 [2023-01-12T13:18:10.726133213Z; Thread-0]: 367 ms to merge doc values 
[363438 docs]
   SM 0 [2023-01-12T13:18:15.299072784Z; Thread-0]: 374 ms to merge doc values 
[366954 docs]
   SM 0 [2023-01-12T13:18:19.907346178Z; Thread-0]: 370 ms to merge doc values 
[365804 docs]
   SM 0 [2023-01-12T13:18:24.545160224Z; Thread-0]: 371 ms to merge doc values 
[366997 docs]
   SM 0 [2023-01-12T13:18:29.163147828Z; Thread-0]: 374 ms to merge doc values 
[367039 docs]
   SM 0 [2023-01-12T13:18:34.031698013Z; Thread-0]: 3709 ms to merge doc values 
[3660325 docs]
   SM 0 [2023-01-12T13:18:41.003270469Z; Thread-0]: 369 ms to merge doc values 
[365722 docs]
   SM 0 [2023-01-12T13:18:45.598362451Z; Thread-0]: 369 ms to merge doc values 
[365728 docs]
   SM 0 [2023-01-12T13:18:50.243855116Z; Thread-0]: 376 ms to merge doc values 
[367656 docs]
   SM 0 [2023-01-12T13:18:54.862259612Z; Thread-0]: 368 ms to merge doc values 
[365076 docs]
   SM 0 [2023-01-12T13:18:59.503139644Z; Thread-0]: 372 ms to merge doc values 
[365477 docs]
   SM 0 [2023-01-12T13:19:04.151794532Z; Thread-0]: 374 ms to merge doc values 
[367064 docs]
   SM 0 [2023-01-12T13:19:08.810873192Z; Thread-0]: 373 ms to merge doc values 
[367412 docs]
   SM 0 [2023-01-12T13:19:13.460733970Z; Thread-0]: 371 ms to merge doc values 
[366363 docs]
   SM 0 [2023-01-12T13:19:18.065982734Z; Thread-0]: 384 ms to merge doc values 
[366018 docs]
   SM 0 [2023-01-12T13:19:22.840259876Z; Thread-0]: 384 ms to merge doc values 
[365759 docs]
   SM 0 [2023-01-12T13:19:27.916927516Z; Thread-0]: 3842 ms to merge doc values 
[3662275 docs]
   SM 0 [2023-01-12T13:19:35.187165246Z; Thread-0]: 386 ms to merge doc values 
[366840 docs]
   ```
   
   Now the same log with the change for the same set of merges:
   ```
   SM 0 [2023-01-12T13:27:49.359139222Z; Thread-0]: 482 ms to merge doc values 
[367596 docs]
   SM 0 [2023-01-12T13:27:54.237137785Z; Thread-0]: 380 ms to merge doc values 
[365886 docs]
   SM 0 [2023-01-12T13:27:59.001734090Z; Thread-0]: 371 ms to merge doc values 
[365212 docs]
   SM 0 [2023-01-12T13:28:03.713076713Z; Thread-0]: 365 ms to merge doc values 
[367056 docs]
   SM 0 [2023-01-12T13:28:08.355025029Z; Thread-0]: 352 ms to merge doc values 
[364343 docs]
   SM 0 [2023-01-12T13:28:13.032506866Z; Thread-0]: 352 ms to merge doc values 
[363438 docs]
   SM 0 [2023-01-12T13:28:17.722795068Z; Thread-0]: 354 ms to merge doc values 
[366954 docs]
   SM 0 [2023-01-12T13:28:22.468102730Z; Thread-0]: 367 ms to merge doc values 
[365804 docs]
   SM 0 [2023-01-12T13:28:27.314385233Z; Thread-0]: 363 ms to merge doc values 
[366997 docs]
   SM 0 [2023-01-12T13:28:32.162119724Z; Thread-0]: 366 ms to merge doc values 
[367039 docs]
   SM 0 [2023-01-12T13:28:37.027154229Z; Thread-0]: 3678 ms to merge doc values 
[3660325 docs]
   SM 0 [2023-01-12T13:28:44.367846338Z; Thread-0]: 364 ms to merge doc values 
[365722 docs]
   SM 0 [2023-01-12T13:28:49.208172350Z; Thread-0]: 365 ms to merge doc values 
[365728 docs]
   SM 0 [2023-01-12T13:28:54.096854303Z; Thread-0]: 367 ms to merge doc values 
[367656 docs]
   SM 0 [2023-01-12T13:28:58.937724069Z; Thread-0]: 355 ms to merge doc values 
[365076 docs]
   SM 0 [2023-01-12T13:29:03.645241892Z; Thread-0]: 350 ms to merge doc values 
[365477 docs]
   SM 0 [2023-01-12T13:29:08.355256074Z; Thread-0]: 355 ms to merge doc values 
[367064 docs]
   SM 0 [2023-01-12T13:29:13.091695689Z; Thread-0]: 353 ms to merge doc values 
[367412 docs]
   SM 0 [2023-01-12T13:29:17.813632707Z; Thread-0]: 353 ms to merge doc values 
[366363 docs]
   SM 0 [2023-01-12T13:29:22.522670731Z; Thread-0]: 355 ms to merge doc values 
[366018 docs]
   SM 0 [2023-01-12T13:29:27.252547913Z; Thread-0]: 357 ms to merge doc values 
[365759 docs]
   SM 0 [2023-01-12T13:29:32.005065809Z; Thread-0]: 3580 ms to merge doc values 
[3662275 docs]
   SM 0 [2023-01-12T13:29:39.224502172Z; Thread-0]: 359 ms to merge doc values 
[366840 docs]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [lucene] jpountz opened a new pull request, #12081: Speed up DocIdMerger on sorted indexes.

2023-01-12 Thread GitBox



jpountz opened a new pull request, #12081:
URL: https://github.com/apache/lucene/pull/12081

   In the case when an index is sorted on a low-cardinality field, or the index 
sort order correlates with the order in which documents get ingested, we can 
optimize `SortedDocIDMerger` by doing a single comparison with the doc ID on 
the next sub. This checks covers at the same time whether the priority queue 
needs reordering and whether the current sub reached `NO_MORE_DOCS`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mmatela opened a new issue, #12080: SynonymGraphFilter: wrong output token position when input positions overlap

2023-01-12 Thread GitBox

mmatela opened a new issue, #12080:
URL: https://github.com/apache/lucene/issues/12080

### Description

In my example, the query is 'test polskie'.
I use MorfologikFilter for Polish stemming, it turns 'polskie' into 'polski'
+ 'polskie'.
I also use SynonymGraphFilter which turns 'polski' into 'pol'. It's applied
**only for query**.
Here's what I see in quey analysis (token position in parenthesis):
```
Tokenizer: test(1) polskie(2)
MF: test(1) polskie(2) polski(2)
SGF: test(1) polskie(2) pol(3) polski(3).
```
When I search for "test polskie" with quotation marks, a document with the
same text doesn't match, because SGF changes positions of tokens in query
compared to index.

In documentation, the description for the old `SynonymFilter` says "_The
position value of the new tokens are set such they all occur at the same
position as the original token._" In `SynonymGraphFilter` instead they are set
to a position after the previous token. Is that an intentional change? Doesn't
seem so, because it doesn't work as expected in my example.

Looking at the code, it seems the problem is in
https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymGraphFilter.java#L246:
`nextNodeOut = lastNodeOut + posLenAtt.getPositionLength();`

`nextNodeOut` is always set as the position after the current token, and
that is later used as position of output token.
I tried to remove this line and instead set this field right after the call
to `input.incrementToken()`, in line 340:
`nextNodeOut = lastNodeOut + posIncrAtt.getPositionIncrement();`
This sets it to the original token's position. This way the final positions
are `SGF: test(1) polskie(2) pol(2) polski(2).` and my document does match. I
didn't experience any unexpected side effects.

Hope this helps. I'm not familiar with the project enough to easilly submit
a proper pull request, with tests and all.

### Version and environment details

lucene 9.4

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068105620


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   as far as the LongField etc, we should deal with that in another issue. I 
agree it should support Field.Store.YES/NO. The things are you speak of are not 
barriers to that. They are self-created problems that we can fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068103957


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   Whoah... this field is for strings. Kill the BytesRef constructor. It is 
enough to pass a String. and you can support Field.Store.YES/NO



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



jpountz commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068102375


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   Right. The challenge I'm seeing is that to index both points and doc values 
on numeric fields, we're creating a field that produces a binaryValue() 
consumed by points, as well as a numeric value consumed by doc values. But 
stored fields can store both binary and numeric data, so how should they know 
which value they should look at?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox



jpountz commented on PR #12079:
URL: https://github.com/apache/lucene/pull/12079#issuecomment-1380286154

   I remember thinking about it, and there are legitimate use-cases for 
`Arrays#compareUnsigned` like `BytesRef#compareTo`. Another thing is that 
`ArrayUtil#getUnsignedComparator` only helps if we expect the lengths to 
compare to be commonly 4 or 8, otherwise it just wraps 
`Arrays#compareUnsigned`. But to your point, maybe we could do another round or 
review of all call sites or `Arrays#compareUnsigned` to see if there are some 
of them that should switch to `ArrayUtil#getUnsignedComparator`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

2023-01-12 Thread GitBox



rmuir commented on code in PR #12054:
URL: https://github.com/apache/lucene/pull/12054#discussion_r1068060646


##
lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java:
##
@@ -234,8 +234,8 @@ void indexDoc(IndexWriter writer, Path file, long 
lastModified) throws IOExcepti
   // field that is indexed (i.e. searchable), but don't tokenize
   // the field into separate words and don't index term frequency
   // or positional information:
-  Field pathField = new StringField("path", file.toString(), 
Field.Store.YES);
-  doc.add(pathField);
+  doc.add(new KeywordField("path", file.toString()));
+  doc.add(new StoredField("path", file.toString()));

Review Comment:
   > If you also need to store the value, you should add a separate {@link 
StoredField} instance.
   
   Let's rethink this for the new fields we are adding. I think storing is 
quite common and offering Field.Store.YES/NO is the best choice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] iverase commented on pull request #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox



iverase commented on PR #12079:
URL: https://github.com/apache/lucene/pull/12079#issuecomment-1380193686

   I wonder if we should add `Arrays.compareUnsigned` to forbidden APIs to 
force always to use the faster comparators.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox



javanna commented on PR #12072:
URL: https://github.com/apache/lucene/pull/12072#issuecomment-1380141646

   @benwtrent could you add a changelog entry too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] romseygeek commented on issue #10458: Ordered intervals can give inaccurate hits on interleaved terms [LUCENE-9418]

2023-01-12 Thread GitBox



romseygeek commented on issue #10458:
URL: https://github.com/apache/lucene/issues/10458#issuecomment-1380139574

   Hi @Brain2000, yes please open a separate issue.  If you could include a 
reproduction then we can work out if it's an issue with TopFieldCollector 
itself, or with how you're using it and if we need to improve the API 
documentation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz opened a new pull request, #12079: Speed up 1D BKD merging.

2023-01-12 Thread GitBox



jpountz opened a new pull request, #12079:
URL: https://github.com/apache/lucene/pull/12079

   On the NYC taxis dataset on my local machine, switching from 
`Arrays#compareUnsigned` to `ArrayUtil#getUnsignedComparator` yielded a 15% 
speedup of BKD merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

2023-01-12 Thread GitBox



jpountz commented on code in PR #12072:
URL: https://github.com/apache/lucene/pull/12072#discussion_r1067925752


##
lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java:
##
@@ -203,9 +203,18 @@ BooleanQuery rewriteNoScoring(IndexSearcher indexSearcher) 
throws IOException {
 
 for (BooleanClause clause : clauses) {
   Query query = clause.getQuery();
-  Query rewritten = new ConstantScoreQuery(query).rewrite(indexSearcher);
-  if (rewritten instanceof ConstantScoreQuery) {
-rewritten = ((ConstantScoreQuery) rewritten).getQuery();
+  // NOTE: rewritingNoScoring() should not call rewrite(), otherwise this
+  // method could run in exponential time with the depth of the query as
+  // every new level would rewrite 2x more than its parent level.
+  Query rewritten = query;
+  if (query instanceof BoostQuery) {
+rewritten = ((BoostQuery) query).getQuery();
+  }
+  if (query instanceof ConstantScoreQuery) {
+rewritten = ((ConstantScoreQuery) query).getQuery();
+  }
+  if (query instanceof BooleanQuery) {
+rewritten = ((BooleanQuery) query).rewriteNoScoring(indexSearcher);

Review Comment:
   Sorry for the confusion, my suggestion should have kept the call to 
rewriteNoScoring on boolean queries, otherwise we're missing simplifications on 
inner queries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz merged pull request #12053: Allow reusing indexed binary fields.

2023-01-12 Thread GitBox



jpountz merged PR #12053:
URL: https://github.com/apache/lucene/pull/12053


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-12 Thread GitBox



jpountz commented on PR #12053:
URL: https://github.com/apache/lucene/pull/12053#issuecomment-1379974864

   Thanks @rmuir !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] LuXugang commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

[GitHub] [lucene] gsmiller commented on pull request #12073: Move ReqExclScorer exclusion checking into first-phase when the exclusion Scorer has no second-phase check

[GitHub] [lucene] zhaih commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

[GitHub] [lucene] Brain2000 opened a new issue, #12082: LeafFieldComparator setBottom not being called before compareBottom

[GitHub] [lucene] jpountz commented on pull request #11807: No need to rewrite queries in unified highlighter

[GitHub] [lucene] jpountz commented on pull request #11779: GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori

[GitHub] [lucene] jpountz merged pull request #12081: Speed up DocIdMerger on sorted indexes.

[GitHub] [lucene] jpountz merged pull request #12079: Speed up 1D BKD merging.

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

[GitHub] [lucene] javanna closed issue #12069: Long rewrite times for deeply nested, non-scoring Boolean queries

[GitHub] [lucene] javanna merged pull request #12072: Fix exponential runtime for Boolean#rewrite

[GitHub] [lucene] jpountz commented on a diff in pull request #12078: Enhance XXXField#newRangeQuery

[GitHub] [lucene] dantuzi commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] jpountz commented on pull request #12081: Speed up DocIdMerger on sorted indexes.

[GitHub] [lucene] jpountz opened a new pull request, #12081: Speed up DocIdMerger on sorted indexes.

[GitHub] [lucene] mmatela opened a new issue, #12080: SynonymGraphFilter: wrong output token position when input positions overlap

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] jpountz commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.

[GitHub] [lucene] rmuir commented on a diff in pull request #12054: Introduce a new `KeywordField`.

[GitHub] [lucene] iverase commented on pull request #12079: Speed up 1D BKD merging.

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

[GitHub] [lucene] romseygeek commented on issue #10458: Ordered intervals can give inaccurate hits on interleaved terms [LUCENE-9418]

[GitHub] [lucene] jpountz opened a new pull request, #12079: Speed up 1D BKD merging.

[GitHub] [lucene] jpountz commented on a diff in pull request #12072: Fix exponential runtime for Boolean#rewrite

[GitHub] [lucene] jpountz merged pull request #12053: Allow reusing indexed binary fields.

[GitHub] [lucene] jpountz commented on pull request #12053: Allow reusing indexed binary fields.

34 matches

Site Navigation

Mail list logo

Footer information