[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mocobeta commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171959478 I think pre- or post-processing to tweak the converter's result is risky since any text processing requires context (which block element the character sequence resides in

[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-30 Thread GitBox
zacharymorn commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r911590359 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[jira] [Created] (LUCENE-10636) Could the partial score sum from essential list scores be cached?

2022-06-30 Thread Zach Chen (Jira)
Zach Chen created LUCENE-10636: -- Summary: Could the partial score sum from essential list scores be cached? Key: LUCENE-10636 URL: https://issues.apache.org/jira/browse/LUCENE-10636 Project: Lucene -

[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-30 Thread GitBox
zacharymorn commented on PR #972: URL: https://github.com/apache/lucene/pull/972#issuecomment-1171893517 > I like the idea of creating WANDScorer more explicitly in tests. It doesn't look easy though and this change is already great so I wonder if we should keep it for a follow-up.

[jira] [Created] (LUCENE-10635) Ensure test coverage for WANDScorer after additional scorers get added

2022-06-30 Thread Zach Chen (Jira)
Zach Chen created LUCENE-10635: -- Summary: Ensure test coverage for WANDScorer after additional scorers get added Key: LUCENE-10635 URL: https://issues.apache.org/jira/browse/LUCENE-10635 Project: Lucene

[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-30 Thread GitBox
zacharymorn commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r911579468 ## lucene/core/src/java/org/apache/lucene/search/DisiWrapper.java: ## @@ -39,6 +39,9 @@ public class DisiWrapper { // For WANDScorer long maxScore; + // For

[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-30 Thread GitBox
zacharymorn commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r911579245 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[jira] [Comment Edited] (LUCENE-10246) Support getting counts from "association" facets

2022-06-30 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561222#comment-17561222 ] Greg Miller edited comment on LUCENE-10246 at 7/1/22 12:27 AM: ---

[jira] [Commented] (LUCENE-10246) Support getting counts from "association" facets

2022-06-30 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561222#comment-17561222 ] Greg Miller commented on LUCENE-10246: -- [~shahrs87] I'd start by becoming familiar with the

[jira] [Commented] (LUCENE-10246) Support getting counts from "association" facets

2022-06-30 Thread Rushabh Shah (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561207#comment-17561207 ] Rushabh Shah commented on LUCENE-10246: --- [~gsmiller] [~sokolov] I am pretty new to the Lucene project

[jira] [Commented] (LUCENE-10345) remove non-NRT replication support

2022-06-30 Thread Rushabh Shah (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561205#comment-17561205 ] Rushabh Shah commented on LUCENE-10345: --- Hi [~rcmuir] Can you please help me scope the changes

[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-30 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561203#comment-17561203 ] Greg Miller commented on LUCENE-10603: -- I pushed another commit that takes care of the remaining

[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

2022-06-30 Thread GitBox
shahrs87 commented on PR #907: URL: https://github.com/apache/lucene/pull/907#issuecomment-1171746326 > could you now try to remove all instances of if (terms == Terms.EMPTY)? @jpountz, I tried to remove all the instances of `if (terms == Terms.EMPTY)` but couldn't remove the

[jira] [Commented] (LUCENE-10628) Enable MatchingFacetSetCounts to use space partitioning data structures

2022-06-30 Thread Marc D'Mello (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561200#comment-17561200 ] Marc D'Mello commented on LUCENE-10628: --- I am planning on working on this. > Enable

[jira] [Commented] (LUCENE-10546) Update Faceting user guide

2022-06-30 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561199#comment-17561199 ] Greg Miller commented on LUCENE-10546: -- Great, thanks [~epotiom]! I'm not aware of anyone else

[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-30 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561189#comment-17561189 ] ASF subversion and git services commented on LUCENE-10603: -- Commit

[GitHub] [lucene] gsmiller merged pull request #1000: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code

2022-06-30 Thread GitBox
gsmiller merged PR #1000: URL: https://github.com/apache/lucene/pull/1000 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [lucene] gsmiller opened a new pull request, #1000: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code

2022-06-30 Thread GitBox
gsmiller opened a new pull request, #1000: URL: https://github.com/apache/lucene/pull/1000 PR only for backport. No review requested.

[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-30 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561187#comment-17561187 ] ASF subversion and git services commented on LUCENE-10603: -- Commit

[GitHub] [lucene] gsmiller merged pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code

2022-06-30 Thread GitBox
gsmiller merged PR #995: URL: https://github.com/apache/lucene/pull/995

[GitHub] [lucene] gsmiller commented on pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code

2022-06-30 Thread GitBox
gsmiller commented on PR #995: URL: https://github.com/apache/lucene/pull/995#issuecomment-1171671380 Merging as I've addressed the outstanding feedback and the change is otherwise straight-forward. Thanks @jpountz for the suggestions and review!

[jira] [Comment Edited] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock

2022-06-30 Thread Weiming Wu (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561166#comment-17561166 ] Weiming Wu edited comment on LUCENE-10624 at 6/30/22 7:55 PM: -- I started a

[GitHub] [lucene] wuwm commented on pull request #968: [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc…

2022-06-30 Thread GitBox
wuwm commented on PR #968: URL: https://github.com/apache/lucene/pull/968#issuecomment-1171617717 @yuzhoujianxia There is some discussion about whether binary or exponential search can cause a performance regression in some use cases. We need to address the concerns before merging.

[jira] [Commented] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock

2022-06-30 Thread Weiming Wu (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561166#comment-17561166 ] Weiming Wu commented on LUCENE-10624: - I started a new AWS EC2 host and reran the test. The

[jira] [Updated] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock

2022-06-30 Thread Weiming Wu (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiming Wu updated LUCENE-10624: Description: h3. Problem Statement We noticed DocValue read performance regression with the

[GitHub] [lucene] yuzhoujianxia commented on pull request #968: [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc…

2022-06-30 Thread GitBox
yuzhoujianxia commented on PR #968: URL: https://github.com/apache/lucene/pull/968#issuecomment-1171575344 Can we get this merged?

[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mocobeta commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171540949 Unfortunately, it was not so trivial. I forgot code blocks. In code blocks, spaces and line feed characters in the original text should be preserved and my solution breaks

[GitHub] [lucene-jira-archive] mikemccand commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mikemccand commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171490096 > Looks like the converter library does not support Carriage Return `\r` and succeeding spaces after Line Feed `\n` Sigh, will our species ever get past the

[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mocobeta commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171467919 The conversion tool seems to erase consecutive LFs (`\n\n`); this causes indent errors in Markdown. Removed LFs would be recovered by this regex (hack). ```

[jira] [Commented] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy

2022-06-30 Thread LuYunCheng (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561107#comment-17561107 ] LuYunCheng commented on LUCENE-10627: - It is a nice suggestion; I'll try to combine it > Using

[GitHub] [lucene] jpountz commented on a diff in pull request #992: LUCENE-10592 Build HNSW Graph on indexing

2022-06-30 Thread GitBox
jpountz commented on code in PR #992: URL: https://github.com/apache/lucene/pull/992#discussion_r911190808 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -24,28 +24,40 @@ import org.apache.lucene.index.DocIDMerger; import

[jira] [Updated] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy

2022-06-30 Thread LuYunCheng (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LuYunCheng updated LUCENE-10627: Description: Code: [https://github.com/apache/lucene/pull/987] I see that when Lucene does a flush and

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-30 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561089#comment-17561089 ] Tomoko Uchida commented on LUCENE-10557: I'm sorry for the noise - Jira's special emojis should

[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mocobeta commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171325947 ![Screenshot from 2022-06-30 23-54-46](https://user-images.githubusercontent.com/1825333/176709310-dbe249df-5f86-439d-95ec-cbe932905d16.png) Indents are still not

[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error

2022-06-30 Thread GitBox
mocobeta commented on issue #1: URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1171316604 Looks like the converter library does not support Carriage Return `\r` and succeeding spaces after Line Feed `\n` and that causes the conversion errors. This quick fix in
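
The kind of input clean-up described in the comment above can be sketched roughly as follows. This is only an illustration of the idea (dropping Carriage Returns and the spaces that follow a Line Feed before the text reaches the converter); it is not the quick fix from the thread, which is cut off in this preview, and the function name `normalize_newlines` is hypothetical.

```python
import re


def normalize_newlines(text: str) -> str:
    """Hypothetical pre-processing step, not the project's actual fix."""
    # Turn Windows (CRLF) and bare CR line endings into plain LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop spaces and tabs that directly follow a Line Feed.
    return re.sub(r"\n[ \t]+", "\n", text)


if __name__ == "__main__":
    sample = "first line\r\n   second line\rthird line"
    print(repr(normalize_newlines(sample)))
```

A blanket rewrite like this also strips leading whitespace inside code blocks, so it is only safe where indentation carries no meaning.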

[GitHub] [lucene] mayya-sharipova commented on pull request #992: LUCENE-10592 Build HNSW Graph on indexing

2022-06-30 Thread GitBox
mayya-sharipova commented on PR #992: URL: https://github.com/apache/lucene/pull/992#issuecomment-1171306876 > I am a bit surprised about the benchmark results. In [LUCENE-10375](https://issues.apache.org/jira/browse/LUCENE-10375), we found that writing all vectors to disk before building

[jira] [Resolved] (LUCENE-10581) Optimize stored fields merges on the first segment

2022-06-30 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10581. --- Resolution: Won't Fix > Optimize stored fields merges on the first segment >

[GitHub] [lucene] jpountz commented on pull request #892: LUCENE-10581: Optimize stored fields bulk merges on the first segment

2022-06-30 Thread GitBox
jpountz commented on PR #892: URL: https://github.com/apache/lucene/pull/892#issuecomment-1171282676 Thinking more about it, I'm thinking of not merging this change. In the normal case when merges are balanced, it doesn't help because the first segment would generally have a dirty block

[GitHub] [lucene] jpountz closed pull request #892: LUCENE-10581: Optimize stored fields bulk merges on the first segment

2022-06-30 Thread GitBox
jpountz closed pull request #892: LUCENE-10581: Optimize stored fields bulk merges on the first segment URL: https://github.com/apache/lucene/pull/892

[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-30 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead

[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-30 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead

[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #7: Make a detailed migration plan

2022-06-30 Thread GitBox
mocobeta opened a new issue, #7: URL: https://github.com/apache/lucene-jira-archive/issues/7 It will take at least a few days and there will be some moratorium time where GitHub issue is not lifted yet but a Jira issues snapshot was already taken. We need a detailed migration plan to avoid

[GitHub] [lucene] jpountz opened a new pull request, #999: LUCENE-10634: Speed up WANDScorer.

2022-06-30 Thread GitBox
jpountz opened a new pull request, #999: URL: https://github.com/apache/lucene/pull/999 This speeds up WANDScorer by computing scores of docs that are positioned on the next candidate competitive document in order to potentially detect that no further match is possible, before

[jira] [Created] (LUCENE-10634) Speed up WANDScorer by computing scores before advancing tail scorers

2022-06-30 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10634: - Summary: Speed up WANDScorer by computing scores before advancing tail scorers Key: LUCENE-10634 URL: https://issues.apache.org/jira/browse/LUCENE-10634 Project:

[jira] [Created] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-06-30 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10633: - Summary: Dynamic pruning for queries sorted by SORTED(_SET) field Key: LUCENE-10633 URL: https://issues.apache.org/jira/browse/LUCENE-10633 Project: Lucene - Core

[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #6: Document issue label / template management policy

2022-06-30 Thread GitBox
mocobeta opened a new issue, #6: URL: https://github.com/apache/lucene-jira-archive/issues/6 - Explicitly define label families (e.g., `type:xxx`, `fixVersion:x.x.x`) - Clarify the mapping between labels and index templates - Write documentation and make it accessible to developers

[GitHub] [lucene] jtibshirani commented on a diff in pull request #992: LUCENE-10592 Build HNSW Graph on indexing

2022-06-30 Thread GitBox
jtibshirani commented on code in PR #992: URL: https://github.com/apache/lucene/pull/992#discussion_r910836731 ## lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java: ## @@ -26,233 +26,153 @@ import org.apache.lucene.codecs.KnnVectorsWriter; import

[GitHub] [lucene] jtibshirani opened a new pull request, #998: LUCENE-10577: Add vectors format unit test and fix toString

2022-06-30 Thread GitBox
jtibshirani opened a new pull request, #998: URL: https://github.com/apache/lucene/pull/998 We forgot to add this unit test when introducing the new 9.3 vectors format. This commit adds the test and fixes issues it uncovered in toString.

[jira] [Commented] (LUCENE-10546) Update Faceting user guide

2022-06-30 Thread Egor Potemkin (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560917#comment-17560917 ] Egor Potemkin commented on LUCENE-10546: I will work on this if no one else is already doing

[GitHub] [lucene] jpountz commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-30 Thread GitBox
jpountz commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r910683270 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[jira] [Created] (LUCENE-10632) Change getAllChildren to return all children regardless of the count

2022-06-30 Thread Yuting Gan (Jira)
Yuting Gan created LUCENE-10632: --- Summary: Change getAllChildren to return all children regardless of the count Key: LUCENE-10632 URL: https://issues.apache.org/jira/browse/LUCENE-10632 Project: Lucene