date:20220628

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Ignacio Vera (Jira)

Ignacio Vera commented on LUCENE-10396

Re: Automatically create sparse indexes for sort fields

 I think you are assuming that we always visit all terms in the index. The iteration might instead be driven by the result of a query that matches only a subset of the documents. In pseudo-code it would look like:

DocIdSetIterator iterator = executeQuery();
SortedDocValues sortedDocValues = getSortedDocValues();
int doc = iterator.nextDoc();
while (doc != DocIdSetIterator.NO_MORE_DOCS) {
  if (sortedDocValues.advanceExact(doc)) {
    BytesRef bytesRef = sortedDocValues.lookupOrd(sortedDocValues.ordValue());
    consume(bytesRef);
    // advance our iterator to the next document with a different ordinal
    doc = iterator.advance(sortedDocValues.advanceOrd());
  } else {
    doc = iterator.nextDoc();
  }
}

 I don't see how to do this efficiently with the inverted index only.
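
 The same idea can be sketched as a toy model in plain Java, without any Lucene classes. Everything here is an assumption made for illustration: `DistinctValuesSketch`, `distinctOrds` and `firstDocWithGreaterOrd` are hypothetical names, the per-document ordinals live in a plain array that is non-decreasing because the index is assumed to be sorted on the field, and documents without a value are not modeled. `firstDocWithGreaterOrd` stands in for the `advanceOrd()` call in the pseudo-code above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of query-driven iteration over distinct sort values; not Lucene code. */
final class DistinctValuesSketch {

  /**
   * ords[doc] is the ordinal of the sort field for each document. The index is
   * assumed to be sorted on that field, so ords is non-decreasing.
   * hits holds the doc IDs matched by the query, in ascending order.
   */
  static List<Integer> distinctOrds(int[] ords, int[] hits) {
    List<Integer> seen = new ArrayList<>();
    int i = 0;
    while (i < hits.length) {
      int ord = ords[hits[i]];
      seen.add(ord);
      // Stand-in for the hypothetical advanceOrd(): first doc with a greater ordinal.
      int nextDoc = firstDocWithGreaterOrd(ords, ord);
      // Stand-in for iterator.advance(nextDoc): skip every hit below nextDoc.
      // A real iterator could skip here instead of scanning linearly.
      while (i < hits.length && hits[i] < nextDoc) {
        i++;
      }
    }
    return seen;
  }

  /** Binary search for the first doc whose ordinal is greater than ord. */
  static int firstDocWithGreaterOrd(int[] ords, int ord) {
    int lo = 0;
    int hi = ords.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (ords[mid] <= ord) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

 In this model each distinct value costs one binary search plus one skip, however many matching documents share it.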
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[GitHub] [lucene] mocobeta commented on issue #993: Raise a test issue

2022-06-28 Thread GitBox



mocobeta commented on issue #993:
URL: https://github.com/apache/lucene/issues/993#issuecomment-1169535375

   Thanks @LuXugang for noticing this.
   
   To devs,
   I will keep issues enabled for a while; please feel free to test and play
   around with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] LuXugang commented on issue #993: Raise a test issue

2022-06-28 Thread GitBox



LuXugang commented on issue #993:
URL: https://github.com/apache/lucene/issues/993#issuecomment-1169533099

   A new start. Thanks for your great work!



[GitHub] [lucene] mocobeta commented on issue #994: Notification test

2022-06-28 Thread GitBox



mocobeta commented on issue #994:
URL: https://github.com/apache/lucene/issues/994#issuecomment-1169533110

   Confirmed that notification mails were sent to issues@lucene.apache.org when 
opening/closing an issue, and adding comments.
   Looks fine.



[GitHub] [lucene] mocobeta closed issue #994: Notification test

2022-06-28 Thread GitBox



mocobeta closed issue #994: Notification test
URL: https://github.com/apache/lucene/issues/994



[GitHub] [lucene] mocobeta opened a new issue, #994: Notification test

2022-06-28 Thread GitBox



mocobeta opened a new issue, #994:
URL: https://github.com/apache/lucene/issues/994

   Notifications should be sent to `issues@lucene.apache.org`



[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1169526849

   Here are the latest benchmark results after the update:
   ```
   Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff             p-value
   Prefix3                            196.69   (6.3%)                   189.02   (8.6%)  -3.9% (-17% -  11%)  0.102
   HighTermTitleBDVSort                33.24  (11.6%)                    32.31  (10.3%)  -2.8% (-22% -  21%)  0.421
   HighTermDayOfYearSort              137.58  (10.1%)                   135.58   (9.7%)  -1.4% (-19% -  20%)  0.644
   OrHighMedDayTaxoFacets              25.11   (6.6%)                    24.81   (8.8%)  -1.2% (-15% -  15%)  0.620
   Wildcard                           348.15   (6.8%)                   343.94   (6.3%)  -1.2% (-13% -  12%)  0.559
   TermDTSort                         188.94  (10.2%)                   187.75   (9.7%)  -0.6% (-18% -  21%)  0.841
   HighTermMonthSort                  192.52   (9.7%)                   191.37   (9.3%)  -0.6% (-17% -  20%)  0.842
   MedTerm                           2947.19   (3.0%)                  2936.13   (3.7%)  -0.4% ( -6% -   6%)  0.726
   HighTerm                          3104.91   (3.5%)                  3100.88   (5.0%)  -0.1% ( -8% -   8%)  0.925
   LowSloppyPhrase                     54.77   (0.8%)                    54.75   (2.0%)  -0.0% ( -2% -   2%)  0.935
   MedTermDayTaxoFacets                93.60   (4.1%)                    93.60   (4.6%)   0.0% ( -8% -   9%)  0.998
   OrNotHighMed                      1661.26   (2.1%)                  1661.95   (3.1%)   0.0% ( -5% -   5%)  0.960
   AndHighHigh                         63.20   (4.6%)                    63.25   (5.1%)   0.1% ( -9% -  10%)  0.956
   AndHighHighDayTaxoFacets            58.91   (1.1%)                    58.96   (1.4%)   0.1% ( -2% -   2%)  0.831
   HighPhrase                        1040.52   (2.0%)                  1041.64   (2.2%)   0.1% ( -3% -   4%)  0.869
   MedPhrase                           61.40   (2.5%)                    61.48   (3.3%)   0.1% ( -5% -   6%)  0.887
   MedSloppyPhrase                    156.44   (1.6%)                   156.80   (3.9%)   0.2% ( -5% -   5%)  0.808
   LowPhrase                          348.79   (1.1%)                   349.68   (2.1%)   0.3% ( -2% -   3%)  0.633
   Fuzzy1                             141.81   (2.3%)                   142.24   (1.6%)   0.3% ( -3% -   4%)  0.633
   OrHighNotMed                      1471.87   (2.6%)                  1476.69   (3.5%)   0.3% ( -5% -   6%)  0.737
   OrNotHighHigh                     1115.16   (2.6%)                  1119.36   (3.6%)   0.4% ( -5% -   6%)  0.704
   OrHighNotHigh                     1434.59   (3.0%)                  1440.05   (2.8%)   0.4% ( -5% -   6%)  0.679
   AndHighMedDayTaxoFacets            117.61   (2.8%)                   118.17   (3.3%)   0.5% ( -5% -   6%)  0.623
   AndHighMed                         122.71   (4.5%)                   123.41   (5.3%)   0.6% ( -8% -  10%)  0.716
   HighSloppyPhrase                    10.46   (2.6%)                    10.52   (3.7%)   0.6% ( -5% -   7%)  0.552
   Respell                             77.60   (2.9%)                    78.10   (2.4%)   0.6% ( -4% -   6%)  0.453
   LowIntervalsOrdered                 59.06   (2.0%)                    59.52   (2.4%)   0.8% ( -3% -   5%)  0.263
   BrowseDateSSDVFacets                 4.59  (32.0%)                     4.63  (32.4%)   0.8% (-48% -  96%)  0.938
   LowSpanNear                        168.24   (2.0%)                   169.63   (2.3%)   0.8% ( -3% -   5%)  0.225
   MedSpanNear                         29.72   (2.7%)                    29.98   (3.1%)   0.9% ( -4% -   6%)  0.354
   AndHighLow                        1433.34   (5.4%)                  1446.44   (5.3%)   0.9% ( -9% -  12%)  0.589
   OrHighNotLow                      1989.50   (3.0%)                  2010.48   (4.2%)   1.1% ( -5% -   8%)  0.359
   PKLookup                           284.02   (4.4%)                   287.16   (5.4%)   1.1% ( -8% -  11%)  0.476
   HighSpanNear                        11.27   (3.3%)                    11.39   (4.3%)   1.1% ( -6% -   9%)  0.346
   Fuzzy2                             161.50   (2.4%)                   163.37   (1.2%)   1.2% ( -2% -   4%)  0.051
   OrNotHighLow                      1490.10   (3.6%)                  1511.49   (3.5%)   1.4% ( -5% -   8%)  0.198
   BrowseRandomLabelSSDVFacets         20.26  (16.3%)                    20.65   (5.0%)   1.9% (-16% -  27%)  0.616
   LowTerm                           4286.35   (3.9%)                  4381.66   (5.6%)   2.2% ( -7% -  12%)  0.145
   MedIntervalsOrdered                 23.28   (4.9%)                    23.83   (6.4%)   2.4% ( -8% -  14%)  0.185
   BrowseDayOfYearSSDVFacets           26.94   (3.8%)                    27.59   (9.0%)   2.4% (-10% -  15%)  0.268
   BrowseMonthSSDVFacets               29.56   (9.1%)                    30.49  (13.4%)   3.2% (

[GitHub] [lucene] mocobeta commented on issue #993: Raise a test issue

2022-06-28 Thread GitBox



mocobeta commented on issue #993:
URL: https://github.com/apache/lucene/issues/993#issuecomment-1169525142

   Verified that external contributors cannot modify labels.



[GitHub] [lucene] mocobeta opened a new issue, #993: Raise a test issue

2022-06-28 Thread GitBox



mocobeta opened a new issue, #993:
URL: https://github.com/apache/lucene/issues/993

   It's a test issue.
   Please do not use GitHub issues until the migration is complete; issues
   should be raised in [Jira](https://issues.apache.org/jira/projects/LUCENE/).



[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread ASF subversion and git services (Jira)

ASF subversion and git services commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

 Commit 64321114e1d8579e52376a97f5eb3e4cd13338e8 in lucene's branch refs/heads/main from Tomoko Uchida [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=64321114e1d ] LUCENE-10557: temprarily enable github issue (#988)  
 

  
 
 
 
 

 
 
 

 
 

[GitHub] [lucene] mocobeta merged pull request #988: LUCENE-10557: temprarily enable github issue

2022-06-28 Thread GitBox



mocobeta merged PR #988:
URL: https://github.com/apache/lucene/pull/988



[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r909134436


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java:
##
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+
+/** Scorer implementing Block-Max Maxscore algorithm */
+public class BlockMaxMaxscoreScorer extends Scorer {
+  private final ScoreMode scoreMode;
+
+  // current doc ID of the leads
+  private int doc;
+
+  // doc id boundary that all scorers maxScore are valid
+  private int upTo = -1;
+
+  // heap of scorers ordered by doc ID
+  private final DisiPriorityQueue essentialsScorers;
+  // list of scorers ordered by maxScore
+  private final LinkedList<DisiWrapper> maxScoreSortedEssentialScorers;
+
+  private final DisiWrapper[] allScorers;
+
+  // sum of max scores of scorers in nonEssentialScorers list
+  private float nonEssentialMaxScoreSum;
+
+  private long cost;
+
+  private final MaxScoreSumPropagator maxScoreSumPropagator;
+
+  // scaled min competitive score
+  private float minCompetitiveScore = 0;
+
+  /**
+   * Constructs a Scorer
+   *
+   * @param weight The weight to be used.
+   * @param scorers The sub scorers this Scorer should iterate on for optional clauses
+   * @param scoreMode The scoreMode
+   */
+  public BlockMaxMaxscoreScorer(Weight weight, List<Scorer> scorers, ScoreMode scoreMode)
+      throws IOException {
+    super(weight);
+    assert scoreMode == ScoreMode.TOP_SCORES;
+
+    this.scoreMode = scoreMode;
+    this.doc = -1;
+
+    this.allScorers = new DisiWrapper[scorers.size()];
+    int i = 0;
+    this.essentialsScorers = new DisiPriorityQueue(scorers.size());
+    this.maxScoreSortedEssentialScorers = new LinkedList<>();
+
+    long cost = 0;
+    for (Scorer scorer : scorers) {
+      DisiWrapper w = new DisiWrapper(scorer);
+      cost += w.cost;
+      allScorers[i++] = w;
+    }
+
+    this.cost = cost;
+    maxScoreSumPropagator = new MaxScoreSumPropagator(scorers);
+  }
+
+  @Override
+  public DocIdSetIterator iterator() {
+    // twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee
+    return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator());
+  }
+
+  @Override
+  public TwoPhaseIterator twoPhaseIterator() {
+    DocIdSetIterator approximation =
+        new DocIdSetIterator() {
+
+          @Override
+          public int docID() {
+            return doc;
+          }
+
+          @Override
+          public int nextDoc() throws IOException {
+            return advance(doc + 1);
+          }
+
+          @Override
+          public int advance(int target) throws IOException {
+            while (true) {
+
+              if (target > upTo) {
+                updateMaxScoresAndLists(target);
+              } else {
+                // minCompetitiveScore might have increased,
+                // move potentially no-longer-competitive scorers from essential to non-essential
+                // list
+                movePotentiallyNonCompetitiveScorers();
+              }
+
+              assert target <= upTo;
+
+              DisiWrapper top = essentialsScorers.top();
+
+              if (top == null) {
+                // all scorers in non-essential list, skip to next boundary or return no_more_docs
+                if (upTo == NO_MORE_DOCS) {
+                  return doc = NO_MORE_DOCS;
+                } else {
+                  target = upTo + 1;
+                }
+              } else {
+                // position all scorers in essential list to on or after target
+                while (top.doc < target) {
+                  top.doc = top.iterator.advance(target);
+                  top = essentialsScorers.updateTop();
+                }
+
+                if (top.doc == NO_MORE_DOCS) {
+                  return doc = NO_MORE_DOCS;
+                } else if (top.doc > upTo) {
+                  target = upTo + 1;
+                } else {
+                  float

[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r909131493



[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r909131198



[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-105902 Build HNSW Graph on indexing

2022-06-28 Thread GitBox



mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1169371144

   I still need to run benchmarks to make sure there are no regressions, but I am
   opening a PR to get an initial review.



[GitHub] [lucene] mayya-sharipova opened a new pull request, #992: LUCENE-105902 Build HNSW Graph on indexing

2022-06-28 Thread GitBox



mayya-sharipova opened a new pull request, #992:
URL: https://github.com/apache/lucene/pull/992

   Currently, when indexing knn vectors, we buffer them in memory and
   build an HNSW graph on flush, during segment construction.
   As building an HNSW graph is very expensive, this makes the flush
   operation take a lot of time. This also makes overall indexing
   performance quite unpredictable: some indexing operations return
   almost instantly while others that trigger a flush take a lot of time.
   This happens because flushes are unpredictable and are triggered
   by memory used, presence of concurrent searches, etc.
   
   Building an HNSW graph as we index vectors avoids these problems,
   as the load of HNSW graph construction is spread evenly during indexing.



[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-06-28 Thread GitBox



gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r908942536


##
lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java:
##
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }
 
   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {

Review Comment:
   OK thanks. I'm not opposed to demoing a `getTopChildren` example for what 
it's worth, but if we want to do that, let's try to come up with a real-world 
type use-case where a user might care about the "top" values?




[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-06-28 Thread GitBox



gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r908941811


##
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));

Review Comment:
   Hmm, good point. I don't think we should change the behavior of
   `getAllChildren`. I think users will expect to get back an entry for each
   range they provided, and since that's the existing behavior anyway, I'd
   prefer not to change it. So now I'm wondering whether the `getAllChildren`
   API should provide _all_ children in all of the implementations, whether or
   not the count is zero. I think it's OK that `getAllChildren` does not behave
   identically to `getTopChildren` with a huge top-n value. I think it's OK if
   we ignore entries with a zero count in the "top children" case, but now I'm
   not so sure about the "all children" case. What do you think users would
   most likely expect here? Should we change `getAllChildren` in all `Facets`
   implementations to return children with a zero count?
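
   The distinction under discussion can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the PR: `RangeTopChildrenSketch`, `LabelCount`, `allChildren` and `topChildren` are hypothetical names, and it uses `java.util.PriorityQueue` rather than Lucene's own `PriorityQueue`. "All children" returns one entry per user-supplied range, zero counts included; "top children" skips zero counts and keeps the `topN` highest, breaking ties by label.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of "all children" vs "top children" for range facets; not Lucene code. */
final class RangeTopChildrenSketch {

  static final class LabelCount {
    final String label;
    final int count;

    LabelCount(String label, int count) {
      this.label = label;
      this.count = count;
    }
  }

  /** One entry per range, in the order the ranges were supplied, zero counts included. */
  static List<LabelCount> allChildren(String[] labels, int[] counts) {
    List<LabelCount> result = new ArrayList<>();
    for (int i = 0; i < counts.length; i++) {
      result.add(new LabelCount(labels[i], counts[i]));
    }
    return result;
  }

  /** Up to topN ranges with a non-zero count, highest count first, ties by ascending label. */
  static List<LabelCount> topChildren(int topN, String[] labels, int[] counts) {
    // Min-heap: the head is always the weakest entry kept so far.
    PriorityQueue<LabelCount> pq =
        new PriorityQueue<>(
            (a, b) -> {
              int cmp = Integer.compare(a.count, b.count);
              return cmp != 0 ? cmp : b.label.compareTo(a.label);
            });
    for (int i = 0; i < counts.length; i++) {
      if (counts[i] == 0) {
        continue; // zero-count ranges never qualify as "top"
      }
      pq.add(new LabelCount(labels[i], counts[i]));
      if (pq.size() > topN) {
        pq.poll(); // evict the weakest entry
      }
    }
    List<LabelCount> result = new ArrayList<>();
    while (!pq.isEmpty()) {
      result.add(pq.poll());
    }
    Collections.reverse(result); // weakest-first -> strongest-first
    return result;
  }
}
```

   With this split, `topChildren` with a very large `topN` still differs from `allChildren`: the former drops empty ranges and reorders by count, the latter preserves every range in the caller's order.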




[GitHub] [lucene] jtibshirani commented on a diff in pull request #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester

2022-06-28 Thread GitBox



jtibshirani commented on code in PR #932:
URL: https://github.com/apache/lucene/pull/932#discussion_r908941633


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -225,6 +225,11 @@ public BitSetIterator getIterator(int contextOrd) {
   return new BitSetIterator(bitSets[contextOrd], cost[contextOrd]);
 }
 
+public void setBitSet(BitSet bitSet, int cost) {
+  bitSets[ord] = bitSet;

Review Comment:
   Sounds good! Feel free to ping me for a review once it's updated.




[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

 Robert Muir commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 In fact, I don't see any need to involve docvalues at all for this feature. You can go through the terms dict and just read the first docID() of the first posting for each term, and then you know how the terms map to these docid "ranges". No need to read the whole postings list, just get the first doc, since the index is sorted. No need to consult docvalues at all. This could be another way to implement whatever it is you are doing; that's what I referred to as an "XY" problem. It seems to me, for mapping terms to ranges of documents for a sorted index, that the inverted index is the correct data structure. But maybe again, we need to optimize something here so that getting that first doc is even faster for dense terms for this use case: e.g. inlining SingletonDocID (delta) for terms when the index is sorted on the field, so that it's all super-efficient solely from the terms dictionary.  
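
 The idea above can be sketched without Lucene. This is only a model under stated assumptions: the index is sorted on the field, every document has exactly one value, and a `SortedMap` of term to ascending doc IDs stands in for the terms dictionary and postings. `TermRangesSketch` and `termRanges` are hypothetical names. In Lucene itself the corresponding calls would be `TermsEnum.next()`, `TermsEnum.postings(reuse, PostingsEnum.NONE)` and a single `nextDoc()`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of "read only the first posting of each term".
final class TermRangesSketch {

  // Returns term -> {firstDoc, endDocExclusive}. Assumes the index is sorted
  // on this field and every doc has exactly one term, so each term owns one
  // contiguous block of doc IDs.
  static Map<String, int[]> termRanges(SortedMap<String, int[]> postings, int maxDoc) {
    Map<String, int[]> ranges = new LinkedHashMap<>();
    int[] previous = null;
    for (Map.Entry<String, int[]> e : postings.entrySet()) {
      int firstDoc = e.getValue()[0]; // only the first posting is read
      if (previous != null) {
        previous[1] = firstDoc; // the previous term's block ends here
      }
      int[] range = {firstDoc, maxDoc};
      ranges.put(e.getKey(), range);
      previous = range;
    }
    return ranges;
  }

  public static void main(String[] args) {
    SortedMap<String, int[]> postings = new TreeMap<>();
    postings.put("a", new int[] {0, 1, 2});
    postings.put("b", new int[] {3});
    postings.put("c", new int[] {4, 5});
    for (Map.Entry<String, int[]> e : termRanges(postings, 6).entrySet()) {
      System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
    }
  }
}
```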
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

 Robert Muir commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 I don't understand why you are starting from an ordinal at all. It seems a bit of an XY problem. Doesn't the task start with a term (e.g. bytes or text)? The two terms dictionaries are "aligned". If you are just going to next() through the terms of the terms dict, you can make int ordinal = 0; and just do ordinal++ after processing each term. Then you don't need to do any ordinal lookups anywhere.  
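
 A tiny sketch of the "aligned dictionaries" point, assuming both dictionaries hold the same terms in the same sorted order. `AlignedOrdinalsSketch` is a hypothetical name, and a `SortedSet` stands in for iterating a terms enum with next().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model: the ordinal of a term is just its rank in sorted order,
// so a counter kept while iterating replaces any ordinal lookup.
final class AlignedOrdinalsSketch {

  static Map<String, Integer> ordinals(SortedSet<String> terms) {
    Map<String, Integer> out = new LinkedHashMap<>();
    int ordinal = 0;
    for (String term : terms) { // plays the role of termsEnum.next()
      out.put(term, ordinal);
      ordinal++;
    }
    return out;
  }

  public static void main(String[] args) {
    SortedSet<String> terms = new TreeSet<>();
    terms.add("b");
    terms.add("a");
    terms.add("c");
    System.out.println(ordinals(terms));
  }
}
```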
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[GitHub] [lucene] kaivalnp commented on a diff in pull request #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester

2022-06-28 Thread GitBox



kaivalnp commented on code in PR #932:
URL: https://github.com/apache/lucene/pull/932#discussion_r908869810


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -225,6 +225,11 @@ public BitSetIterator getIterator(int contextOrd) {
   return new BitSetIterator(bitSets[contextOrd], cost[contextOrd]);
 }
 
+public void setBitSet(BitSet bitSet, int cost) {
+  bitSets[ord] = bitSet;

Review Comment:
   > What would you think of this plan?
   > 
   > * Spin off a separate issue around removing overhead from copying `BitSet` 
when the query is cached or precomputed. Maybe we'll end up with something 
similar to your change where we access the iterator directly.
   > * Either hold off on this PR until that overhead is addressed, or merge it 
but without a special workaround to prevent copying. To unblock any testing you 
could fork `KnnGraphTester` locally or `KnnVectorQuery` to add a workaround?
   
   Now that the issue for the overhead is addressed, should we look into this 
PR again?




[GitHub] [lucene] kaivalnp commented on a diff in pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection

2022-06-28 Thread GitBox



kaivalnp commented on code in PR #951:
URL: https://github.com/apache/lucene/pull/951#discussion_r908867024


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -92,20 +91,40 @@ public KnnVectorQuery(String field, float[] target, int k, Query filter) {
   public Query rewrite(IndexReader reader) throws IOException {
 TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
 
-BitSetCollector filterCollector = null;
+Weight filterWeight = null;
 if (filter != null) {
-  filterCollector = new BitSetCollector(reader.leaves().size());
   IndexSearcher indexSearcher = new IndexSearcher(reader);
   BooleanQuery booleanQuery =
   new BooleanQuery.Builder()
   .add(filter, BooleanClause.Occur.FILTER)
   .add(new FieldExistsQuery(field), BooleanClause.Occur.FILTER)
   .build();
-  indexSearcher.search(booleanQuery, filterCollector);
+  Query rewritten = indexSearcher.rewrite(booleanQuery);
+  filterWeight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f);
 }
 
 for (LeafReaderContext ctx : reader.leaves()) {
-  TopDocs results = searchLeaf(ctx, filterCollector);
+  Bits acceptDocs;
+  int cost;
+  if (filterWeight != null) {
+Scorer scorer = filterWeight.scorer(ctx);
+if (scorer != null) {
+  DocIdSetIterator iterator = scorer.iterator();
+  if (iterator instanceof BitSetIterator) {
+acceptDocs = ((BitSetIterator) iterator).getBitSet();
+  } else {
+acceptDocs = BitSet.of(iterator, ctx.reader().maxDoc());
+  }
+  cost = (int) iterator.cost();

Review Comment:
   Yes, I think we should add another case where, if both are `FixedBitSet`s, we 
can do a `FixedBitSet.and` to compute live + matching docs. It might make the 
optimization more common.
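
   A minimal sketch of that intersection, using `java.util.BitSet` as a stand-in for Lucene's `FixedBitSet` so it runs on its own. `AcceptDocsSketch` and `acceptDocs` are hypothetical names, and treating a `null` live-docs set as "no deletions" is an assumption that follows Lucene's usual convention.

```java
import java.util.BitSet;

// Hypothetical model of "live docs AND filter matches" for one segment.
final class AcceptDocsSketch {

  // Returns the docs that both match the filter and are still live.
  // A null liveDocs means the segment has no deletions.
  static BitSet acceptDocs(BitSet liveDocs, BitSet matching) {
    BitSet result = (BitSet) matching.clone(); // leave the inputs untouched
    if (liveDocs != null) {
      result.and(liveDocs); // one bulk operation instead of per-doc checks
    }
    return result;
  }

  public static void main(String[] args) {
    BitSet live = new BitSet();
    live.set(0);
    live.set(1);
    live.set(3);
    BitSet matching = new BitSet();
    matching.set(1, 4); // docs 1, 2, 3
    System.out.println(acceptDocs(live, matching));
  }
}
```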




[jira] [Created] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-06-28 Thread Marc D'Mello (Jira)

 Marc D'Mello created an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Lucene - Core /  LUCENE-10629  
 
 
  Add fastMatchQuery param to MatchingFacetSetCounts   
 

  
 
 
 
 

 
Issue Type: 
  Improvement  
 
 
Assignee: 
 Unassigned  
 
 
Created: 
 28/Jun/22 17:32  
 
 
Priority: 
  Minor  
 
 
Reporter: 
 Marc D'Mello  
 

  
 
 
 
 

 
 Some facet counters, like RangeFacetCounts, allow the user to pass in a fastMatchQuery parameter to quickly filter out documents in the passed-in match set. We should add the same parameter to MatchingFacetSetCounts as well.  
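
 A rough sketch of what a fast-match prefilter buys, in plain Java rather than Lucene types. `FastMatchSketch` and `count` are hypothetical names; the cheap predicate is assumed to accept a superset of the documents that the exact matcher accepts, so it may only reject documents that could not have matched anyway.

```java
import java.util.function.IntPredicate;

// Hypothetical model: a cheap predicate runs before the expensive matcher.
final class FastMatchSketch {

  // Returns {matchCount, numberOfExpensiveChecks}. fastMatch may be null,
  // in which case every hit goes through the expensive check.
  static int[] count(int[] hits, IntPredicate fastMatch, IntPredicate exactMatch) {
    int count = 0;
    int expensiveChecks = 0;
    for (int doc : hits) {
      if (fastMatch != null && !fastMatch.test(doc)) {
        continue; // rejected cheaply, exact matcher never runs
      }
      expensiveChecks++;
      if (exactMatch.test(doc)) {
        count++;
      }
    }
    return new int[] {count, expensiveChecks};
  }

  public static void main(String[] args) {
    int[] hits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int[] withFilter = count(hits, d -> d % 2 == 0, d -> d % 4 == 0);
    int[] without = count(hits, null, d -> d % 4 == 0);
    System.out.println(withFilter[0] + " matches, " + withFilter[1] + " exact checks");
    System.out.println(without[0] + " matches, " + without[1] + " exact checks");
  }
}
```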
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Created] (LUCENE-10628) Enable MatchingFacetSetCounts to use space partitioning data structures

2022-06-28 Thread Marc D'Mello (Jira)

 Marc D'Mello created an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Lucene - Core /  LUCENE-10628  
 
 
  Enable MatchingFacetSetCounts to use space partitioning data structures   
 

  
 
 
 
 

 
Issue Type: 
  Improvement  
 
 
Assignee: 
 Unassigned  
 
 
Created: 
 28/Jun/22 17:25  
 
 
Priority: 
  Minor  
 
 
Reporter: 
 Marc D'Mello  
 

  
 
 
 
 

 
 Currently, MatchingFacetSetCounts iterates linearly over the FacetSetMatcher instances passed into it. While this is fine in some cases, it can be inefficient if we have a large number of FacetSetMatchers. We should give users the option to enable space-partitioning data structures (namely R-trees and KD-trees) so we can potentially scan over these FacetSetMatchers in sub-linear time.  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by

[jira] [Resolved] (LUCENE-10274) Implement "hyperrectangle" faceting

2022-06-28 Thread Greg Miller (Jira)

 Greg Miller resolved as Fixed  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Very excited to see this shipped! Thanks Shai Erera and Marc D'Mello for all the PR iterations and conversation. Great example of shipping something much stronger than the original idea after rounds of discussion and iteration. Thanks again!  
 

  
 
 
 
 

 
 Lucene - Core /  LUCENE-10274  
 
 
  Implement "hyperrectangle" faceting   
 

  
 
 
 
 

 
Change By: 
 Greg Miller  
 
 
Fix Version/s: 
 9.3  
 
 
Resolution: 
 Fixed  
 
 
Status: 
 Open -> Resolved  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Ignacio Vera (Jira)

 Ignacio Vera commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 More exactly, the SortedDocValues iterator might have been advanced to a different ordinal.  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Ignacio Vera (Jira)

 Ignacio Vera commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 > The slowness is probably the lookupOrd? Can you avoid this? Just next() the termsenum to move on to the next ord. 
 Not really, because the call to advance can move the SortedDocValues iterator to a different ordinal. We need to position our TermsEnum in each call. That's one of the advantages of my proposal, as everything advances together. 
 > I'd also modify the call to termsEnum.postings() to be termsEnum.postings(postingsEnum, PostingsEnum.NONE) 
 I will try that and report back.  
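
 The query-driven loop can be modelled without Lucene to show why ordinals get skipped. This is a sketch under assumptions: `ords[doc]` is ascending because the index is sorted on the field, `hits` are the ascending doc IDs matched by the query, and binary search stands in for both the sparse index lookup and `iterator.advance()`. `DistinctOrdsSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "visit one doc per ordinal, driven by query hits".
final class DistinctOrdsSketch {

  // First doc whose ordinal is greater than ord (ords is ascending).
  static int firstDocWithOrdGreaterThan(int[] ords, int ord) {
    int lo = 0;
    int hi = ords.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (ords[mid] <= ord) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Index of the first hit at or after target, searching from index "from".
  static int firstHitAtOrAfter(int[] hits, int from, int target) {
    int lo = from;
    int hi = hits.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (hits[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Collects each ordinal that occurs among the hits exactly once.
  static List<Integer> distinctOrds(int[] ords, int[] hits) {
    List<Integer> out = new ArrayList<>();
    int i = 0;
    while (i < hits.length) {
      int ord = ords[hits[i]];
      out.add(ord);
      int nextDoc = firstDocWithOrdGreaterThan(ords, ord);
      // The hit iterator may land several ordinals further on, not on ord + 1.
      i = firstHitAtOrAfter(hits, i + 1, nextDoc);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] ords = {0, 0, 0, 1, 1, 2, 2, 2, 3};
    int[] hits = {1, 2, 6, 7, 8};
    System.out.println(distinctOrds(ords, hits));
  }
}
```

 In the sample data no hit carries ordinal 1, so the loop jumps from ordinal 0 straight to ordinal 2, which is the case where a plain next() on the terms enum would fall out of step.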
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

 Robert Muir commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 I'd also modify the call to termsEnum.postings() to be termsEnum.postings(postingsEnum, PostingsEnum.NONE). Depending on your data, it might not do anything, but you don't need frequencies, so it is OK to skip over them rather than decode them.  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

 Robert Muir commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 The slowness is probably the lookupOrd? Can you avoid this? Just next() the termsenum to move on to the next ord.   
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[GitHub] [lucene] madrob commented on pull request #989: Add back-compat indices for 8.11.2

2022-06-28 Thread GitBox



madrob commented on PR #989:
URL: https://github.com/apache/lucene/pull/989#issuecomment-1168939508

   Both the old index in d64900e and the regenerated index in f158be9 came off 
of branch_8_11, and more specifically the 8.11.2 release tag. I don't 
understand why they are different, though. The old index was generated using 
commands in the release wizard and copied over from the commit in 963814fc454; 
the regenerated index was built by running `ant` directly, then zipped.
   
   The error that we get with the "bad" index can be reproduced on `d64900e` 
with `./gradlew test --tests TestBackwardsCompatibility.testSortedIndex`
   
   ```
   org.apache.lucene.backward_index.TestBackwardsCompatibility > testSortedIndex FAILED
   java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([80EE95A3F6D9343B:F771F355CE204DEF]:0)
   at junit@4.13.1/org.junit.Assert.fail(Assert.java:87)
   at junit@4.13.1/org.junit.Assert.assertTrue(Assert.java:42)
   at junit@4.13.1/org.junit.Assert.assertTrue(Assert.java:53)
   at org.apache.lucene.backward_index.TestBackwardsCompatibility.searchExampleIndex(TestBackwardsCompatibility.java:2179)
   at org.apache.lucene.backward_index.TestBackwardsCompatibility.testSortedIndex(TestBackwardsCompatibility.java:2148)
   ```
   
   The test code in question is these lines:
   
   ```
   topDocs = searcher.search(new TermQuery(new Term("body", "the")), 5);
   assertTrue(topDocs.totalHits.value > 0);
   ```



[GitHub] [lucene] jpountz commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



jpountz commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r908672681


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java:
##
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+
+/** Scorer implementing Block-Max Maxscore algorithm */
+public class BlockMaxMaxscoreScorer extends Scorer {
+  private final ScoreMode scoreMode;
+
+  // current doc ID of the leads
+  private int doc;
+
+  // doc id boundary that all scorers maxScore are valid
+  private int upTo = -1;
+
+  // heap of scorers ordered by doc ID
+  private final DisiPriorityQueue essentialsScorers;
+  // list of scorers ordered by maxScore
+  private final LinkedList<DisiWrapper> maxScoreSortedEssentialScorers;
+
+  private final DisiWrapper[] allScorers;
+
+  // sum of max scores of scorers in nonEssentialScorers list
+  private float nonEssentialMaxScoreSum;
+
+  private long cost;
+
+  private final MaxScoreSumPropagator maxScoreSumPropagator;
+
+  // scaled min competitive score
+  private float minCompetitiveScore = 0;
+
+  /**
+   * Constructs a Scorer
+   *
+   * @param weight The weight to be used.
+   * @param scorers The sub scorers this Scorer should iterate on for optional clauses
+   * @param scoreMode The scoreMode
+   */
+  public BlockMaxMaxscoreScorer(Weight weight, List<Scorer> scorers, ScoreMode scoreMode)
+  throws IOException {
+super(weight);
+assert scoreMode == ScoreMode.TOP_SCORES;
+
+this.scoreMode = scoreMode;
+this.doc = -1;
+
+this.allScorers = new DisiWrapper[scorers.size()];
+int i = 0;
+this.essentialsScorers = new DisiPriorityQueue(scorers.size());
+this.maxScoreSortedEssentialScorers = new LinkedList<>();
+
+long cost = 0;
+for (Scorer scorer : scorers) {
+  DisiWrapper w = new DisiWrapper(scorer);
+  cost += w.cost;
+  allScorers[i++] = w;
+}
+
+this.cost = cost;
+maxScoreSumPropagator = new MaxScoreSumPropagator(scorers);
+  }
+
+  @Override
+  public DocIdSetIterator iterator() {
+// twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee
+return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator());
+  }
+
+  @Override
+  public TwoPhaseIterator twoPhaseIterator() {
+DocIdSetIterator approximation =
+new DocIdSetIterator() {
+
+  @Override
+  public int docID() {
+return doc;
+  }
+
+  @Override
+  public int nextDoc() throws IOException {
+return advance(doc + 1);
+  }
+
+  @Override
+  public int advance(int target) throws IOException {
+while (true) {
+
+  if (target > upTo) {
+updateMaxScoresAndLists(target);
+  } else {
+// minCompetitiveScore might have increased,
+// move potentially no-longer-competitive scorers from essential to non-essential
+// list
+movePotentiallyNonCompetitiveScorers();
+  }
+
+  assert target <= upTo;
+
+  DisiWrapper top = essentialsScorers.top();
+
+  if (top == null) {
+// all scorers in non-essential list, skip to next boundary or return no_more_docs
+if (upTo == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else {
+  target = upTo + 1;
+}
+  } else {
+// position all scorers in essential list to on or after target
+while (top.doc < target) {
+  top.doc = top.iterator.advance(target);
+  top = essentialsScorers.updateTop();
+}
+
+if (top.doc == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else if (top.doc > upTo) {
+  target = upTo + 1;
+} else {
+  float

[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r908663130


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java:
##
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+
+/** Scorer implementing Block-Max Maxscore algorithm */
+public class BlockMaxMaxscoreScorer extends Scorer {
+  private final ScoreMode scoreMode;
+
+  // current doc ID of the leads
+  private int doc;
+
+  // doc id boundary that all scorers maxScore are valid
+  private int upTo = -1;
+
+  // heap of scorers ordered by doc ID
+  private final DisiPriorityQueue essentialsScorers;
+  // list of scorers ordered by maxScore
+  private final LinkedList<DisiWrapper> maxScoreSortedEssentialScorers;
+
+  private final DisiWrapper[] allScorers;
+
+  // sum of max scores of scorers in nonEssentialScorers list
+  private float nonEssentialMaxScoreSum;
+
+  private long cost;
+
+  private final MaxScoreSumPropagator maxScoreSumPropagator;
+
+  // scaled min competitive score
+  private float minCompetitiveScore = 0;
+
+  /**
+   * Constructs a Scorer
+   *
+   * @param weight The weight to be used.
+   * @param scorers The sub scorers this Scorer should iterate on for optional clauses
+   * @param scoreMode The scoreMode
+   */
+  public BlockMaxMaxscoreScorer(Weight weight, List<Scorer> scorers, ScoreMode scoreMode)
+  throws IOException {
+super(weight);
+assert scoreMode == ScoreMode.TOP_SCORES;
+
+this.scoreMode = scoreMode;
+this.doc = -1;
+
+this.allScorers = new DisiWrapper[scorers.size()];
+int i = 0;
+this.essentialsScorers = new DisiPriorityQueue(scorers.size());
+this.maxScoreSortedEssentialScorers = new LinkedList<>();
+
+long cost = 0;
+for (Scorer scorer : scorers) {
+  DisiWrapper w = new DisiWrapper(scorer);
+  cost += w.cost;
+  allScorers[i++] = w;
+}
+
+this.cost = cost;
+maxScoreSumPropagator = new MaxScoreSumPropagator(scorers);
+  }
+
+  @Override
+  public DocIdSetIterator iterator() {
+// twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee
+return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator());
+  }
+
+  @Override
+  public TwoPhaseIterator twoPhaseIterator() {
+DocIdSetIterator approximation =
+new DocIdSetIterator() {
+
+  @Override
+  public int docID() {
+return doc;
+  }
+
+  @Override
+  public int nextDoc() throws IOException {
+return advance(doc + 1);
+  }
+
+  @Override
+  public int advance(int target) throws IOException {
+while (true) {
+
+  if (target > upTo) {
+updateMaxScoresAndLists(target);
+  } else {
+// minCompetitiveScore might have increased,
+// move potentially no-longer-competitive scorers from essential to non-essential
+// list
+movePotentiallyNonCompetitiveScorers();
+  }
+
+  assert target <= upTo;
+
+  DisiWrapper top = essentialsScorers.top();
+
+  if (top == null) {
+// all scorers in non-essential list, skip to next boundary or return no_more_docs
+if (upTo == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else {
+  target = upTo + 1;
+}
+  } else {
+// position all scorers in essential list to on or after target
+while (top.doc < target) {
+  top.doc = top.iterator.advance(target);
+  top = essentialsScorers.updateTop();
+}
+
+if (top.doc == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else if (top.doc > upTo) {
+  target = upTo + 1;
+} else {
+  float

[GitHub] [lucene] msokolov commented on pull request #926: VectorSimilarityFunction reverse removal

2022-06-28 Thread GitBox



msokolov commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1168906824

   Yes please go ahead and backport the codec version upgrade to 9.x



[GitHub] [lucene] msokolov commented on pull request #924: Create Lucene93 Codec and move Lucene92 to backwards_codecs

2022-06-28 Thread GitBox



msokolov commented on PR #924:
URL: https://github.com/apache/lucene/pull/924#issuecomment-1168905880

   Well, maybe we should wait until there is some actual change to be applied?
   Otherwise ... no, it should be fine to cherry-pick
   
   
   On Tue, Jun 28, 2022 at 9:52 AM Alessandro Benedetti <
   ***@***.***> wrote:
   
   > Any reason this has not been cherry-picked to branch_9x
   >  ?



[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

 Tomoko Uchida edited a comment on  LUCENE-10557  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Migrate to GitHub issue from Jira   
 

  
 
 
 
 

 
 I think I have addressed attachments. The links in comments are also replaced with the correct URLs that point to the attachment files (in another repo). 
 
https://github.com/mocobeta/migration-test-2/issues/126 
https://github.com/mocobeta/migration-test-2/issues/127 (pictures should be directly rendered in the comment areas; there are conversion errors that should be fixed.) 
 
 > I would only print the "(versions: 1)" if it's > 1. 
 Fixed this.  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

 Tomoko Uchida commented on  LUCENE-10557  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Migrate to GitHub issue from Jira   
 

  
 
 
 
 

 
 I think I have addressed attachments. The links in comments are also replaced with the correct urls that point to the attachment files (in another repo). 
 
https://github.com/mocobeta/migration-test-2/issues/126 
https://github.com/mocobeta/migration-test-2/issues/127 (pictures should be directly rendered in the comment areas; there are conversion errors and should be fixed.) 
  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Ignacio Vera (Jira)

 Ignacio Vera commented on  LUCENE-10396  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Automatically create sparse indexes for sort fields   
 

  
 
 
 
 

 
 Here is the code I am using when using postings/TermsEnum from the inverted index which might be totally wrong / inefficient as unfortunately I am not an expert in this area: 

 

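// position the terms enum on the current term
termsEnum.seekExact(sortedDocValues.lookupOrd(sortedDocValues.ordValue()));
// advance to the next term, if there is one
if (termsEnum.next() == null) {
  // no more terms: exhaust the doc values iterator
  doc = sortedDocValues.advance(DocIdSetIterator.NO_MORE_DOCS);
} else {
  // keep the returned enum: postings() may return a new instance instead of reusing the argument
  postingsEnum = termsEnum.postings(postingsEnum);
  // the first posting of the next term is the next doc with a different ordinal
  doc = sortedDocValues.advance(postingsEnum.nextDoc());
}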

 

 This code performs OK for lower cardinality, but it becomes slow for high cardinality. Similar to what I have done in the linked PR, I have indexed 50 million documents in a sorted index. The documents contain a SortedDocValues field with a 10-byte term, and the term is indexed using a StringField as well. I checked the index size and the speed of retrieving the first document per term with different cardinalities, and the results look like this: 
 
 1000 cardinality: INDEX SIZE: 6.110762596130371 MB, Average: 0.00264192705 seconds 
 1 cardinality: INDEX SIZE: 22.32368278503418 MB, Average: 0.01891452695003 seconds 
 10 cardinality: INDEX SIZE: 86.16338920593262 MB, Average: 0.1449143188 seconds 
 50 cardinality: INDEX SIZE: 108.61055660247803 MB, Average: 0.4431338875005 seconds  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal

2022-06-28 Thread GitBox



alessandrobenedetti commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1168758957

   I was planning to cherry-pick this to 
[branch_9x](https://github.com/apache/lucene/tree/branch_9x), but I found a 
conflict caused by this commit not having been cherry-picked: 
https://github.com/apache/lucene/commit/1b105f0eebecb3efd8f9e677bcb1caa82c928950
 
   
   @msokolov was that done on purpose? Should I cherry-pick yours first and then 
this commit?



[GitHub] [lucene] alessandrobenedetti commented on pull request #924: Create Lucene93 Codec and move Lucene92 to backwards_codecs

2022-06-28 Thread GitBox



alessandrobenedetti commented on PR #924:
URL: https://github.com/apache/lucene/pull/924#issuecomment-1168755985

   Any reason this has not been cherry-picked to 
[branch_9x](https://github.com/apache/lucene/tree/branch_9x) ?



[GitHub] [lucene] dweiss merged pull request #991: Update randomizedtesting to 2.8.0, hppc to 0.9.1, morfologik to 2.1.9.

2022-06-28 Thread GitBox



dweiss merged PR #991:
URL: https://github.com/apache/lucene/pull/991



[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Dawid Weiss (Jira)

 Dawid Weiss commented on  LUCENE-10557  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Migrate to GitHub issue from Jira   
 

  
 
 
 
 

 
 I would only print the "(versions: 1)" if it's > 1.  
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Commented] (LUCENE-10593) VectorSimilarityFunction reverse removal

2022-06-28 Thread ASF subversion and git services (Jira)

ASF subversion and git services commented on LUCENE-10593

Commit 8cf694fed2131c71679c24277fbb76e0d981d564 in lucene's branch refs/heads/main from Alessandro Benedetti [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8cf694fed21 ]

LUCENE-10593: VectorSimilarityFunction reverse removal (#926)

- Vector Similarity Function reverse property removed
- NeighborQueue tie-breaking fixed (node id + node score encoding)
- NeighborQueue readability refactor
- BoundChecker removal (now it's only in backward-codecs)

[GitHub] [lucene] alessandrobenedetti merged pull request #926: VectorSimilarityFunction reverse removal

2022-06-28 Thread GitBox



alessandrobenedetti merged PR #926:
URL: https://github.com/apache/lucene/pull/926



[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida edited a comment on LUCENE-10557

I would do this for attachments.
- Keep the latest version only.
- Next to the attachment link, add the count of the files with the same name.

Attachments
- LUCENE-.patch (4 versions)

I don't see much merit in keeping all attachments with duplicate filenames if users cannot download them anyway. It's good for completeness to keep all versions, though. e.g. https://github.com/mocobeta/migration-test-2/issues/123

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

Robert Muir commented on LUCENE-10396

Also if performance changes are minor between the two solutions, perhaps we could speed terms/postings up for the sorted case to close the gap. For example, in the sorted case we could consider always writing SingletonDocID, and because of sorting, it could be delta-encoded.
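
To illustrate the delta-encoding idea, here is a minimal sketch in plain Java. It assumes that, in a sorted index, the single doc ID stored per term increases with term order, so only the gaps need to be written. The class and method names (DeltaCodec, encode, decode) are hypothetical and are not part of Lucene's postings format.

```java
final class DeltaCodec {
  /** Turns strictly increasing doc IDs into gaps; the first gap is relative to 0. */
  static int[] encode(int[] sortedDocIds) {
    int[] deltas = new int[sortedDocIds.length];
    int previous = 0;
    for (int i = 0; i < sortedDocIds.length; i++) {
      deltas[i] = sortedDocIds[i] - previous;
      previous = sortedDocIds[i];
    }
    return deltas;
  }

  /** Rebuilds the original doc IDs with a running sum over the gaps. */
  static int[] decode(int[] deltas) {
    int[] docIds = new int[deltas.length];
    int sum = 0;
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      docIds[i] = sum;
    }
    return docIds;
  }
}
```

The gaps are usually much smaller than the absolute doc IDs, which is what would make a variable-length integer encoding of them compact.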
 

  
 
 
 
 

 
 
 

 
 

[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-28 Thread Greg Miller (Jira)

Greg Miller commented on LUCENE-10603

Thanks Lu Xugang for letting me know! As I have some free time, I'll try to migrate a few more modules over (and will update here as I put out PRs for the modules).

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

Robert Muir commented on LUCENE-10396

Do we have any idea of the comparison? I'm just curious because it seems like doing TermsEnum.next() and getting the first doc ID should be relatively optimized. The nice thing about it is that it can already be done today without adding additional data structures and APIs. The docvalues advanceOrd is a bit of a mismatch for a column data structure; it seems like an inverted structure is more appropriate for this.

[GitHub] [lucene] msokolov commented on a diff in pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection

2022-06-28 Thread GitBox



msokolov commented on code in PR #951:
URL: https://github.com/apache/lucene/pull/951#discussion_r908461540


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -92,20 +91,40 @@ public KnnVectorQuery(String field, float[] target, int k, 
Query filter) {
   public Query rewrite(IndexReader reader) throws IOException {
 TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
 
-BitSetCollector filterCollector = null;
+Weight filterWeight = null;
 if (filter != null) {
-  filterCollector = new BitSetCollector(reader.leaves().size());
   IndexSearcher indexSearcher = new IndexSearcher(reader);
   BooleanQuery booleanQuery =
   new BooleanQuery.Builder()
   .add(filter, BooleanClause.Occur.FILTER)
   .add(new FieldExistsQuery(field), BooleanClause.Occur.FILTER)
   .build();
-  indexSearcher.search(booleanQuery, filterCollector);
+  Query rewritten = indexSearcher.rewrite(booleanQuery);
+  filterWeight = indexSearcher.createWeight(rewritten, 
ScoreMode.COMPLETE_NO_SCORES, 1f);
 }
 
 for (LeafReaderContext ctx : reader.leaves()) {
-  TopDocs results = searchLeaf(ctx, filterCollector);
+  Bits acceptDocs;
+  int cost;
+  if (filterWeight != null) {
+Scorer scorer = filterWeight.scorer(ctx);
+if (scorer != null) {
+  DocIdSetIterator iterator = scorer.iterator();
+  if (iterator instanceof BitSetIterator) {
+acceptDocs = ((BitSetIterator) iterator).getBitSet();
+  } else {
+acceptDocs = BitSet.of(iterator, ctx.reader().maxDoc());
+  }
+  cost = (int) iterator.cost();

Review Comment:
   Could we apply the optimization using liveDocs in the case that it *is* a 
FixedBitSet? Can we tell?




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida commented on LUCENE-10557

I would do this for attachments.
- Keep the latest version only.
- Next to the attachment link, add the count of the files with the same name.

Attachments
- LUCENE-.patch (4 versions)

I don't see much merit in keeping all attachments with duplicate filenames if users cannot download them anyway. It's good for completeness to keep all versions, though.
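
The attachment listing described above could be produced roughly as follows. This is only a sketch in plain Java, not code from the actual migration script; the class name AttachmentList and the sample filenames in the test are hypothetical. It also omits the count when a file has a single version, as suggested elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttachmentList {
  /** Takes filenames in upload order and returns one line per distinct filename. */
  static List<String> render(List<String> uploadedFilenames) {
    // LinkedHashMap keeps the order in which each filename was first seen.
    Map<String, Integer> versions = new LinkedHashMap<>();
    for (String name : uploadedFilenames) {
      versions.merge(name, 1, Integer::sum);
    }
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : versions.entrySet()) {
      int count = entry.getValue();
      // Only mention the count when there is more than one version.
      lines.add(count > 1
          ? "- " + entry.getKey() + " (" + count + " versions)"
          : "- " + entry.getKey());
    }
    return lines;
  }
}
```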
 

  
 
 
 
 

 
 
 

 
 

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Ignacio Vera (Jira)

Ignacio Vera commented on LUCENE-10396

If I understand you correctly, you mean leveraging the inverted index to get the first document per term. I tried that, and my conclusion was that it was slower than manually iterating the doc values.

[jira] [Commented] (LUCENE-10396) Automatically create sparse indexes for sort fields

2022-06-28 Thread Robert Muir (Jira)

Robert Muir commented on LUCENE-10396

If you just need the first document with each value, why not use postings/TermsEnum?

[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908355653


##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -4775,14 +4775,14 @@ private static void setDiagnostics(SegmentInfo info, 
String source, Map

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler edited a comment on LUCENE-10557

> We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue).

The older versions have a different database id, so the link is different. The filename is just for human consumption. Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

This was also a long-standing issue in JIRA and thousands of people complained (I maintain a huuuge JIRA instance). They "solved" it in later versions by appending numbers to the filename, but only during upload. Internally, the filename is still not a unique key. In addition, you can still create duplicates using JIRA's API.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler commented on LUCENE-10557

Yes, that's exactly the problem of JIRA: https://jira.atlassian.com/browse/JRASERVER-2169

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida commented on LUCENE-10557

> Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

Yes, so it'd be more confusing if we allow downloading old files...? There will be inconsistencies between the comments and the attachment. While we can keep all versions of each attachment, perhaps we cannot (or shouldn't) create links to old ones, I think.

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida edited a comment on LUCENE-10557

> Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

Yes, so it'd be more confusing if we allow downloading old files...? There will be inconsistencies between the comments and the attachment (in timestamp).

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida commented on LUCENE-10557

> Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

Yes, so it'd be more confusing if we allow downloading old files...? There will be inconsistencies between the comments and the attachment.

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler edited a comment on LUCENE-10557

> We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue).

The older versions have a different database id, so the link is different. The filename is just for human consumption. Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

This was also a long-standing issue in JIRA and thousands of people complained (I maintain a huuuge JIRA instance). They "solved" it in later versions by appending numbers to the filename, but only during upload. Internally, the filename is still not a unique key.

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida edited a comment on LUCENE-10557

> I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished.

We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue). Jira itself creates only one link (to the latest one) for a filename in the issue's attachment list. I think we can archive only the latest file per filename and safely omit old files; as far as I can see it would not practically harm anything, though I might be missing something.

If we try to allow downloading multiple versions of the files with the same name via GitHub (unlike Jira itself), the ambiguity of the references will immediately come up. I think we can't cope with it...

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler edited a comment on LUCENE-10557

> We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue).

The older versions have a different database id, so the link is different. The filename is just for human consumption. Where it is ambiguous is inside comments. The comments referring to a file are always pointing to the latest version.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler commented on LUCENE-10557

> We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue).

The older versions have a different database id, so the link is different. The filename is just for human consumption.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida commented on LUCENE-10557

> I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished.

We'll be able to save multiple attachments with the same filename, but I think the reference disambiguation is almost impossible (filename is the unique key for an attachment in an issue). Jira itself creates only one link (to the latest one) for a filename in the issue's attachment list. I think we can archive only the latest file per filename and safely omit old files; as far as I can see it would not practically harm anything, though I might be missing something.

[GitHub] [lucene] iverase commented on pull request #979: LUCENE-10396: Add capability to jump to the next document with different ord in SortedDocValues

2022-06-28 Thread GitBox



iverase commented on PR #979:
URL: https://github.com/apache/lucene/pull/979#issuecomment-1168539694

   I made a quick check of this patch by indexing 50 million documents in a 
sorted index. The documents just contain a SortedDocValues field with a 
10-byte term. I checked the index size and the speed of retrieving the first 
document per term with different cardinalities, and the results look like this:
   
   Cardinality ~1000
   ```
                           |  without patch       |  with patch
   Index Size (MB)         |  2.800084114074707   |  2.8039379119873047
   average advanceOrd (ms) |  0.3925505353495     |  0.00110124379
   ```
   
   Cardinality ~1
   ```
                           |  without patch       |  with patch
   Index Size (MB)         |  16.125946044921875  |  16.164132118225098
   average advanceOrd (ms) |  0.52939177705       |  0.01008831655
   ```
   
   Cardinality ~1
   ```
                           |  without patch       |  with patch
   Index Size (MB)         |  49.320682525634766  |  49.57721138000488
   average advanceOrd (ms) |  0.547911470999      |  0.03804306865
   ```
   
   Cardinality ~5
   ```
                           |  without patch       |  with patch
   Index Size (MB)         |  52.81498718261719   |  53.66002082824707
   average advanceOrd (ms) |  0.651533527099      |  0.0689882125501
   ```
   
   The new jump table is tiny compared to the size of the doc values, while this 
new way of navigation is roughly 9x to 350x faster in these runs.
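
   To make the comparison concrete, here is a rough in-memory sketch of the 
two navigation strategies. It is not the on-disk structure used by this PR. It 
assumes a sorted index where every document has a value, so ordinals never 
decrease as doc IDs grow. The class OrdJumpTable and its methods are 
hypothetical names.

```java
import java.util.Arrays;

final class OrdJumpTable {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // firstDocOfOrd[ord] is the smallest doc ID whose ordinal is ord.
  private final int[] firstDocOfOrd;

  OrdJumpTable(int[] ordPerDoc, int valueCount) {
    firstDocOfOrd = new int[valueCount];
    Arrays.fill(firstDocOfOrd, NO_MORE_DOCS);
    // Iterating backwards leaves the smallest doc ID in each slot.
    for (int doc = ordPerDoc.length - 1; doc >= 0; doc--) {
      firstDocOfOrd[ordPerDoc[doc]] = doc;
    }
  }

  /** Jump-table lookup: first doc whose ordinal is greater than currentOrd. */
  int advanceOrd(int currentOrd) {
    int next = currentOrd + 1;
    return next < firstDocOfOrd.length ? firstDocOfOrd[next] : NO_MORE_DOCS;
  }

  /** Baseline: scan forward doc by doc until the ordinal changes. */
  static int linearAdvanceOrd(int[] ordPerDoc, int doc) {
    int ord = ordPerDoc[doc];
    int next = doc + 1;
    while (next < ordPerDoc.length && ordPerDoc[next] == ord) {
      next++;
    }
    return next < ordPerDoc.length ? next : NO_MORE_DOCS;
  }
}
```

   The table needs one entry per distinct value, which is consistent with the 
small index size increase above; the lookup is constant time, whereas the scan 
grows with the number of documents sharing an ordinal.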



[jira] [Updated] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida updated an issue

Lucene - Core / LUCENE-10622
Prepare complete migration script to GitHub issue from Jira (best effort)

Change By: Tomoko Uchida
Attachment: test-1.txt

[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida commented on LUCENE-10622

test.txt test attachment

[jira] [Updated] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-28 Thread Tomoko Uchida (Jira)

Tomoko Uchida updated an issue

Lucene - Core / LUCENE-10622
Prepare complete migration script to GitHub issue from Jira (best effort)

Change By: Tomoko Uchida
Attachment: test.txt

[jira] [Commented] (LUCENE-10616) Moving to dictionaries has made stored fields slower at skipping

2022-06-28 Thread fang hou (Jira)

fang hou commented on LUCENE-10616

Hi Adrien Grand, if no one takes it, may I give it a try?

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

I would leave those issue numbers as they were - like I said, these issue numbers are widely mentioned everywhere (mailing list archives, etc.) and I don't think they should be replaced. Spring redirects Jira URLs to their corresponding ported github issues - this is a much better resolution, I think.

[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908184454


##
lucene/core/src/java/org/apache/lucene/util/Constants.java:
##
@@ -84,10 +97,27 @@ private Constants() {} // can't construct
 JRE_IS_64BIT = is64Bit;
   }
 
-  public static final boolean JRE_IS_MINIMUM_JAVA8 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
8);
-  public static final boolean JRE_IS_MINIMUM_JAVA9 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
9);
-  public static final boolean JRE_IS_MINIMUM_JAVA11 =

Review Comment:
   This code for Java 11 was wrong anyway. Luckily, it was not used at all.
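
   For reference, a minimal sketch of a version check that does not parse 
version strings by hand. It relies on `Runtime.version().feature()`, which is 
available from Java 10 onwards. The class name JavaVersionCheck is 
hypothetical and is not part of `Constants`.

```java
final class JavaVersionCheck {
  /** Returns true if the running JVM's feature release is at least the given one. */
  static boolean isAtLeast(int featureRelease) {
    // Runtime.Version#feature() returns the feature-release number, e.g. 11 or 17.
    return Runtime.version().feature() >= featureRelease;
  }
}
```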




[jira] [Commented] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler commented on LUCENE-10627

> I wonder if we could reduce this complexity by reusing some existing abstractions like ByteBuffersDataInput instead of this new CompositeByteBuf, and have a single Compressor#compress API instead of two.

+1

[jira] [Commented] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy

2022-06-28 Thread Adrien Grand (Jira)

Adrien Grand commented on LUCENE-10627

I understand how the change helps, but overall based on the benchmark result that you shared, this is only a 0.3% (BEST_COMPRESSION) or 1.4% (BEST_SPEED) improvement while the change adds some complexity. I wonder if we could reduce this complexity by reusing some existing abstractions like ByteBuffersDataInput instead of this new CompositeByteBuf, and have a single Compressor#compress API instead of two.
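
As a rough illustration of the underlying idea (compressing several buffers without first copying them into one contiguous array), here is a sketch built only on java.util.zip. It is not Lucene's Compressor API and does not use ByteBuffersDataInput or CompositeByteBuf; the class ChunkedCompress and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ChunkedCompress {
  /** Feeds each chunk to one Deflater in turn, so the chunks are never concatenated. */
  static byte[] compress(List<byte[]> chunks) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    try {
      for (byte[] chunk : chunks) {
        deflater.setInput(chunk);
        // Drain output until the deflater asks for more input.
        while (!deflater.needsInput()) {
          int n = deflater.deflate(buffer);
          out.write(buffer, 0, n);
        }
      }
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  /** Inflates the whole stream; used here only to check the round trip. */
  static byte[] decompress(byte[] data) {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("truncated input");
        }
        out.write(buffer, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

Whether this saves anything measurable depends on the workload; the benchmark numbers quoted above suggest the gain from avoiding the copy is small.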
 

  
 
 
 
 

 
 
 

 
 

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Uwe Schindler (Jira)

Uwe Schindler commented on LUCENE-10557

> Once we have done this: Should we rewrite CHANGES.txt and replace all LUCENE- links to GITHUB# links?

> I'm not sure if it should be done. Just for your information the current changes2html.pl supports only Pull Requests, so the script should be changed if we want to mention GitHub issues in CHANGES. (I have little experience with perl, but I'll take a look if it's needed. Maybe we should also support issues in the near future.)

Actually we do not need extra code for this. As GitHub issue numbers and pull requests share the same increasing integer space, GITHUB#1234 is always either a PR or an issue. So there is no problem in converting those. If we decide to move all historic issues to GitHub, we should also update the file. GitHub actually does the right thing when you create a link to an issue or PR: it will redirect to the canonical URL. I would prefer to use "issues/" in the URL. If it is a PR, then GitHub redirects.
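
A small sketch of that conversion, in Java rather than the Perl of changes2html.pl: every GITHUB#N reference becomes an "issues/" URL, relying on GitHub's redirect when N is actually a pull request. The class name ChangesLinks is hypothetical, and emitting a bare URL (rather than an HTML anchor) is an assumption made to keep the example short.

```java
import java.util.regex.Pattern;

final class ChangesLinks {
  private static final Pattern GITHUB_REF = Pattern.compile("GITHUB#(\\d+)");

  /** Replaces each GITHUB#N reference with the corresponding issues URL. */
  static String linkify(String line) {
    // $1 is the captured issue or pull request number.
    return GITHUB_REF.matcher(line)
        .replaceAll("https://github.com/apache/lucene/issues/$1");
  }
}
```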
 

  
 
 
 
 

 
 
 

 
 

[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908210561


##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -4775,14 +4775,14 @@ private static void setDiagnostics(SegmentInfo info, 
String source, Map

[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908185028


##
lucene/core/src/java/org/apache/lucene/util/Constants.java:
##
@@ -84,10 +97,27 @@ private Constants() {} // can't construct
 JRE_IS_64BIT = is64Bit;
   }
 
-  public static final boolean JRE_IS_MINIMUM_JAVA8 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
8);
-  public static final boolean JRE_IS_MINIMUM_JAVA9 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
9);
-  public static final boolean JRE_IS_MINIMUM_JAVA11 =

Review Comment:
   I tend to patch it in 8.11 branch...




[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908184454


##
lucene/core/src/java/org/apache/lucene/util/Constants.java:
##
@@ -84,10 +97,27 @@ private Constants() {} // can't construct
 JRE_IS_64BIT = is64Bit;
   }
 
-  public static final boolean JRE_IS_MINIMUM_JAVA8 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
8);
-  public static final boolean JRE_IS_MINIMUM_JAVA9 =
-  JVM_MAJOR_VERSION > 1 || (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION >= 
9);
-  public static final boolean JRE_IS_MINIMUM_JAVA11 =

Review Comment:
   This code was wrong anyway. Luckily, it was not used at all.




[GitHub] [lucene] uschindler commented on a diff in pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9

2022-06-28 Thread GitBox



uschindler commented on code in PR #978:
URL: https://github.com/apache/lucene/pull/978#discussion_r908181282


##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -4775,14 +4775,14 @@ private static void setDiagnostics(SegmentInfo info, 
String source, Map

[GitHub] [lucene] jpountz commented on pull request #989: Add back-compat indices for 8.11.2

2022-06-28 Thread GitBox



jpountz commented on PR #989:
URL: https://github.com/apache/lucene/pull/989#issuecomment-1168391661

   What sort of failures did you get? I wonder if it would make things easier 
to copy indexes that have been produced on `branch_8_11` instead of 
regenerating them from the `main` branch.



[GitHub] [lucene] jpountz commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-28 Thread GitBox



jpountz commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r908179203


##
lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
##
@@ -415,34 +419,45 @@ public BytesRef lookupOrd(long ord) throws IOException {
 public long getValueCount() {
   return in.getValueCount();
 }
+
+private void initCount() {
+  assert docID >= 0;
+  ordUpto = ords.offsets[docID] - 1;
+  count = (int) ords.docValueCounts.get(docID);
+  limit = ordUpto + count;
+}
   }
 
   static final class DocOrds {
 final long[] offsets;
 final PackedLongValues ords;
+final GrowableWriter docValueCounts;
+
+public static final int START_BITS_PER_VALUE = 2;
 
 DocOrds(
 int maxDoc,
 Sorter.DocMap sortMap,
 SortedSetDocValues oldValues,
-float acceptableOverheadRatio)
+float acceptableOverheadRatio,
+int bitsPerValue)
 throws IOException {
   offsets = new long[maxDoc];
   PackedLongValues.Builder builder = PackedLongValues.packedBuilder(acceptableOverheadRatio);
-  long ordOffset = 1; // 0 marks docs with no values
+  docValueCounts = new GrowableWriter(bitsPerValue, maxDoc, acceptableOverheadRatio);
+  long ordOffset = 1;

Review Comment:
   +1




[GitHub] [lucene] jpountz commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-28 Thread GitBox



jpountz commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r908131031


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java:
##
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+
+/** Scorer implementing Block-Max Maxscore algorithm */
+public class BlockMaxMaxscoreScorer extends Scorer {
+  private final ScoreMode scoreMode;
+
+  // current doc ID of the leads
+  private int doc;
+
+  // doc id boundary that all scorers maxScore are valid
+  private int upTo = -1;
+
+  // heap of scorers ordered by doc ID
+  private final DisiPriorityQueue essentialsScorers;
+  // list of scorers ordered by maxScore
+  private final LinkedList<DisiWrapper> maxScoreSortedEssentialScorers;
+
+  private final DisiWrapper[] allScorers;
+
+  // sum of max scores of scorers in nonEssentialScorers list
+  private float nonEssentialMaxScoreSum;
+
+  private long cost;
+
+  private final MaxScoreSumPropagator maxScoreSumPropagator;
+
+  // scaled min competitive score
+  private float minCompetitiveScore = 0;
+
+  /**
+   * Constructs a Scorer
+   *
+   * @param weight The weight to be used.
+   * @param scorers The sub scorers this Scorer should iterate on for optional clauses
+   * @param scoreMode The scoreMode
+   */
+  public BlockMaxMaxscoreScorer(Weight weight, List<Scorer> scorers, ScoreMode scoreMode)
+  throws IOException {
+super(weight);
+assert scoreMode == ScoreMode.TOP_SCORES;
+
+this.scoreMode = scoreMode;
+this.doc = -1;
+
+this.allScorers = new DisiWrapper[scorers.size()];
+int i = 0;
+this.essentialsScorers = new DisiPriorityQueue(scorers.size());
+this.maxScoreSortedEssentialScorers = new LinkedList<>();
+
+long cost = 0;
+for (Scorer scorer : scorers) {
+  DisiWrapper w = new DisiWrapper(scorer);
+  cost += w.cost;
+  allScorers[i++] = w;
+}
+
+this.cost = cost;
+maxScoreSumPropagator = new MaxScoreSumPropagator(scorers);
+  }
+
+  @Override
+  public DocIdSetIterator iterator() {
+// twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee
+return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator());
+  }
+
+  @Override
+  public TwoPhaseIterator twoPhaseIterator() {
+DocIdSetIterator approximation =
+new DocIdSetIterator() {
+
+  @Override
+  public int docID() {
+return doc;
+  }
+
+  @Override
+  public int nextDoc() throws IOException {
+return advance(doc + 1);
+  }
+
+  @Override
+  public int advance(int target) throws IOException {
+while (true) {
+
+  if (target > upTo) {
+updateMaxScoresAndLists(target);
+  } else {
+// minCompetitiveScore might have increased,
+// move potentially no-longer-competitive scorers from essential to non-essential
+// list
+movePotentiallyNonCompetitiveScorers();
+  }
+
+  assert target <= upTo;
+
+  DisiWrapper top = essentialsScorers.top();
+
+  if (top == null) {
+// all scorers in non-essential list, skip to next boundary or return no_more_docs
+if (upTo == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else {
+  target = upTo + 1;
+}
+  } else {
+// position all scorers in essential list to on or after target
+while (top.doc < target) {
+  top.doc = top.iterator.advance(target);
+  top = essentialsScorers.updateTop();
+}
+
+if (top.doc == NO_MORE_DOCS) {
+  return doc = NO_MORE_DOCS;
+} else if (top.doc > upTo) {
+  target = upTo + 1;
+} else {
+  float

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

> Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one?

My hope was that these attachments could be stored in the primary git repository for convenience - keeping the historical artifacts together and having them served for free via GitHub's infrastructure. It's also just convenient as it can be modified/updated by multiple people (and those same people can freeze the repository for updates, once the migration is complete). Having those artifacts elsewhere (on home.apache.org) lacks some of these conveniences but it's fine too, of course.

Also, I don't think infra will have any problem in adding a repository called "lucene-archives" or something like this. I can ask if we decide to push in this direction.

[GitHub] [lucene] Yuti-G commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-06-28 Thread GitBox



Yuti-G commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r908084991


##
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
 return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
 validateTopN(topN);
-return getAllChildren(dim, path);
+validateDimAndPathForGetChildren(dim, path);
+
+int resultSize = Math.min(topN, counts.length);
+PriorityQueue<LabelAndValue> pq =
+new PriorityQueue<>(resultSize) {
+  @Override
+  protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+if (cmp == 0) {
+  cmp = b.label.compareTo(a.label);
+}
+return cmp < 0;
+  }
+};
+
+for (int i = 0; i < counts.length; i++) {
+  if (pq.size() < resultSize) {
+pq.add(new LabelAndValue(ranges[i].label, counts[i]));

Review Comment:
   In this case, I propose we also change the `getAllChildren` functionality in 
RangeFacetCounts to populate LabelAndValue only when the count is > 0, to be 
consistent with `getAllChildren` in other Facet implementations. If top-N 
equals the total number of ranges, `getAllChildren` and `getTopChildren` should 
return the same results. Please let me know what you think. Thanks!




[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal

2022-06-28 Thread GitBox



alessandrobenedetti commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1168280355

   > "I was also wondering if you have addressed the previous Mike S.'s 
https://github.com/apache/lucene/pull/926#issuecomment-1164418508. I assume 
that your train files (e.g.sift-128-euclidean.hdf5-test ) are not in hdf5 
format, but just called like this"
   
   Yes @mayya-sharipova, the latest reported benchmarks used the 
pre-processing @msokolov suggested.
   That's just the name of the file that's automatically generated by that 
script :)
   


