[PR] Remove concatenation in String.format() calls [lucene]

2024-01-28 Thread via GitHub


sabi0 opened a new pull request, #13038:
URL: https://github.com/apache/lucene/pull/13038

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Make use of null-checked variable [lucene]

2024-01-28 Thread via GitHub


sabi0 opened a new pull request, #13040:
URL: https://github.com/apache/lucene/pull/13040

   (no comment)





Re: [PR] Cleanup TokenizedPhraseQueryNode code [lucene]

2024-01-28 Thread via GitHub


sabi0 commented on code in PR #13041:
URL: https://github.com/apache/lucene/pull/13041#discussion_r1468923263


##
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/core/nodes/TokenizedPhraseQueryNode.java:
##
@@ -70,10 +70,8 @@ public QueryNode cloneTree() throws CloneNotSupportedException {
   @Override
   public CharSequence getField() {
 List children = getChildren();
-
-if (children == null || children.size() == 0) {
+if (children == null || children.isEmpty()) {
   return null;
-
 } else {
   return ((FieldableNode) children.get(0)).getField();

Review Comment:
   According to the `setField(CharSequence)` method below, the children are not 
necessarily `FieldableNode` instances.
   Could this cast throw a `ClassCastException`?
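
A minimal sketch of the kind of guard this comment hints at, assuming that returning `null` for a non-fieldable first child is acceptable. `Node`, `Fieldable` and `FieldLookup` are hypothetical stand-ins for illustration, not the actual Lucene query-parser classes, and this is not the change made in the PR.

```java
import java.util.List;

// Hypothetical stand-ins for the query-parser node types (not the real Lucene classes).
interface Node {}

interface Fieldable extends Node {
  CharSequence getField();
}

final class FieldLookup {
  // Returns the first child's field, or null when there are no children
  // or the first child does not expose a field (assumed fallback behavior).
  static CharSequence firstField(List<? extends Node> children) {
    if (children == null || children.isEmpty()) {
      return null;
    }
    Node first = children.get(0);
    if (first instanceof Fieldable) {
      // The cast is safe here because of the instanceof check above.
      return ((Fieldable) first).getField();
    }
    return null;
  }
}
```

Whether `null` is the right result for a non-fieldable child depends on what callers of `getField()` expect; the point is only that the type check removes the possibility of a `ClassCastException`.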






Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-28 Thread via GitHub


jfreden commented on code in PR #13036:
URL: https://github.com/apache/lucene/pull/13036#discussion_r1468906825


##
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##
@@ -249,10 +249,74 @@ BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
   return optional.get(0);
 }
 
+// Calculate count(clause1 OR clause2) as count(clause1) + count(clause2) - count(clause1 AND
+// clause2)
+if (scoreMode == ScoreMode.COMPLETE_NO_SCORES
+&& context.reader().hasDeletions() == false
+&& query.isTwoClauseDisjunctionWithTerms()) {
+  return twoClauseTermDisjunctionOptimizedScorer(context);
+}
+
 return new BooleanScorer(
 this, optional, Math.max(1, query.getMinimumNumberShouldMatch()), scoreMode.needsScores());
   }
 
+  private BulkScorer twoClauseTermDisjunctionOptimizedScorer(LeafReaderContext context)
+  throws IOException {
+List optionalScorers = new ArrayList<>();
+final int[] clauseDocFreqSum = new int[1];
+for (WeightedBooleanClause wc : weightedClauses) {
+  clauseDocFreqSum[0] += wc.weight.count(context);
+  ScorerSupplier scorerSupplier = wc.weight.scorerSupplier(context);
+  if (scorerSupplier != null) {
+optionalScorers.add(scorerSupplier.get(Long.MAX_VALUE));
+  }
+}
+
+final ConjunctionBulkScorer conjunctionBulkScorer =
+optionalScorers.size() == 2 ? new ConjunctionBulkScorer(List.of(), optionalScorers) : null;
+return new BulkScorer() {
+  @Override
+  public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+  throws IOException {
+final int[] intersectionScore = new int[1];
+LeafCollector intersectionCollector =
+new LeafCollector() {
+  @Override
+  public void setScorer(Scorable scorer) {}
+
+  @Override
+  public void collect(int doc) {
+intersectionScore[0]++;
+  }
+
+  @Override
+  public void collect(DocIdStream stream) throws IOException {
+intersectionScore[0] += stream.count();
+  }
+};
+
+int leadDocId = 0;
+if (conjunctionBulkScorer != null) {
+  leadDocId = conjunctionBulkScorer.score(intersectionCollector, acceptDocs, min, max);
+}
+
+for (int i = 1; i <= clauseDocFreqSum[0] - intersectionScore[0]; i++) {
+  collector.collect(i);

Review Comment:
   Thanks for looking at this! That helps a lot. I wasn't sure how to proceed, 
since I couldn't come up with a clean way to do this without either modifying 
`IndexSearcher#count` (I was unsure about that because I couldn't find any 
similar optimizations in `IndexSearcher`) or breaking the contract of 
`LeafCollector` (which is what I ended up doing, and it only works for the 
count case, where the doc ids are discarded).
   
   I've pushed a change to do this in `IndexSearcher#count` instead.
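
For readers following the thread, the identity behind the optimization, count(clause1 OR clause2) = count(clause1) + count(clause2) - count(clause1 AND clause2), can be sketched on plain sorted doc-id arrays. The class and method names below are hypothetical and do not use any Lucene API; the sketch assumes each array is sorted and free of duplicates, like a postings list.

```java
final class DisjunctionCount {
  // Counts the doc ids that appear in both sorted, duplicate-free arrays
  // by walking the two arrays in lockstep.
  static int intersectionCount(int[] a, int[] b) {
    int i = 0;
    int j = 0;
    int count = 0;
    while (i < a.length && j < b.length) {
      if (a[i] < b[j]) {
        i++;
      } else if (a[i] > b[j]) {
        j++;
      } else {
        count++;
        i++;
        j++;
      }
    }
    return count;
  }

  // Inclusion-exclusion: |A OR B| = |A| + |B| - |A AND B|.
  static int unionCount(int[] a, int[] b) {
    return a.length + b.length - intersectionCount(a, b);
  }
}
```

The appeal of this form is that the two per-clause counts are often available cheaply (for terms, from index statistics when there are no deletions), so only the intersection has to be computed by iterating documents.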






[PR] Cleanup TokenizedPhraseQueryNode code [lucene]

2024-01-28 Thread via GitHub


sabi0 opened a new pull request, #13041:
URL: https://github.com/apache/lucene/pull/13041

   (no comment)





[PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-28 Thread via GitHub


mrkm4ntr opened a new pull request, #13043:
URL: https://github.com/apache/lucene/pull/13043

   ### Description
   After the introduction of topLevelScoringClause, a ConjunctionScorer with 
multiple scorers can be used, instead of BlockMaxConjunctionScorer, for 
conjunctions that are not the top-level scoring clause, even when 
requiredScorers is empty. In that case, ConjunctionScorer returns Infinity as 
maxScore, which defeats optimizations such as those in a parent WANDScorer.
   
https://github.com/apache/lucene/blob/7d35ae485807147460f63ea58ae495124e972e13/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L218C77-L218C98
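
A rough sketch of why an infinite maxScore hurts, assuming a conjunction's score is the sum of its clause scores and that a WAND-style parent skips a block only when the block's score upper bound is below the current minimum competitive score. `MaxScoreSketch` and its methods are hypothetical names for illustration; they are not the code in this PR or Lucene's actual scorer API.

```java
final class MaxScoreSketch {
  // Upper bound for a conjunction whose score is the sum of its clause scores:
  // the sum of the per-clause upper bounds. The sum is accumulated in double and
  // nudged up by one ulp so that float rounding cannot make the bound too small.
  static float conjunctionMaxScore(float[] clauseMaxScores) {
    double sum = 0;
    for (float clauseMax : clauseMaxScores) {
      sum += clauseMax;
    }
    return Math.nextUp((float) sum);
  }

  // A block can be skipped only if even its best possible score is not competitive.
  static boolean canSkip(float maxScore, float minCompetitiveScore) {
    return maxScore < minCompetitiveScore;
  }
}
```

With a finite bound such as `conjunctionMaxScore(new float[] {1.5f, 2.0f})`, `canSkip` can return true once the minimum competitive score rises above it; with `Float.POSITIVE_INFINITY` it never can, so the parent has to visit every block.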
   
   





Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-01-28 Thread via GitHub


github-actions[bot] commented on PR #12517:
URL: https://github.com/apache/lucene/pull/12517#issuecomment-1913774063

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





[PR] [LUCENE-13044][replicator] NRT add configurable commitData for Custom… [lucene]

2024-01-28 Thread via GitHub


dianjifzm opened a new pull request, #13045:
URL: https://github.com/apache/lucene/pull/13045

   … security verification
   
   ### Description
   
   
   
   Open up commitData for modification so that a custom security mechanism for 
primary-replica synchronization can be implemented.





Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-01-28 Thread via GitHub


vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1914149013

   @jpountz @mikemccand 
   I resolved the conflicts and moved the test case for a target greater than 
the last entry of the matched block from `TestLucene90PostingsFormat` to 
`TestLucene99PostingsFormat`.
   Please take a look when you get a chance!
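
For context, the idea named in the PR title can be sketched as follows, assuming the suffixes of a leaf block are sorted and stored back to back as fixed-length byte runs. `FixedLengthSuffixSearch` is a hypothetical illustration and does not reflect the actual block-tree terms reader code. The return convention mirrors `Arrays.binarySearch`, which also covers the "target greater than the last entry" case mentioned above.

```java
import java.util.Arrays;

final class FixedLengthSuffixSearch {
  // Binary-searches `count` sorted entries of `len` bytes each, stored back to back
  // in `block`. Returns the entry index if found, otherwise -(insertionPoint) - 1.
  static int search(byte[] block, int len, int count, byte[] target) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int start = mid * len;
      // Unsigned lexicographic comparison of the mid entry against the target.
      int cmp = Arrays.compareUnsigned(block, start, start + len, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

Because every entry has the same length, the offset of entry `mid` is simply `mid * len`, which is what makes random access, and therefore binary search, possible without scanning the preceding entries.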





Re: [PR] Propagate topLevelScoringClause from QueryProfiler [lucene]

2024-01-28 Thread via GitHub


mrkm4ntr commented on PR #13031:
URL: https://github.com/apache/lucene/pull/13031#issuecomment-1914149588

   @jpountz Sure. Added.





Re: [PR] Propagate topLevelScoringClause from QueryProfiler [lucene]

2024-01-28 Thread via GitHub


jpountz commented on PR #13031:
URL: https://github.com/apache/lucene/pull/13031#issuecomment-1914131295

   @mrkm4ntr 9.10 please

