[GitHub] [lucene] zacharymorn commented on pull request #81: LUCENE-9335: [WIP] Speed up pure disjunction with BMM

2021-04-14 Thread GitBox
zacharymorn commented on pull request #81: URL: https://github.com/apache/lucene/pull/81#issuecomment-820100925 Ahhh yes I did assume `DisjunctionSumScorer` implements BMM...I thought BMM was used previously in Lucene and was later replaced by BMW `WANDScorer` for better performance, and

[jira] [Assigned] (LUCENE-9908) Move VectorValues#search to VectorReader and LeafReader

2021-04-14 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani reassigned LUCENE-9908: Assignee: Julie Tibshirani > Move VectorValues#search to VectorReader and

[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis

2021-04-14 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321833#comment-17321833 ] Julie Tibshirani commented on LUCENE-9334: -- I noticed a test failure from this commit (I

[GitHub] [lucene] rmuir commented on a change in pull request #84: LUCENE-9929 Make ScandinavianNormalizationFilter configurable wrt fol…

2021-04-14 Thread GitBox
rmuir commented on a change in pull request #84: URL: https://github.com/apache/lucene/pull/84#discussion_r613628276 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.java ## @@ -33,14 +34,45 @@ *

[jira] [Commented] (LUCENE-9843) Remove compression option on doc values

2021-04-14 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321328#comment-17321328 ] Robert Muir commented on LUCENE-9843: - Great example of why it's important: for a while Lucene's

[jira] [Commented] (LUCENE-9843) Remove compression option on doc values

2021-04-14 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321327#comment-17321327 ] Robert Muir commented on LUCENE-9843: - I moved this issue to a blocker for 9.0 because I've already

[GitHub] [lucene] janhoy commented on a change in pull request #84: LUCENE-9929 Make ScandinavianNormalizationFilter configurable wrt fol…

2021-04-14 Thread GitBox
janhoy commented on a change in pull request #84: URL: https://github.com/apache/lucene/pull/84#discussion_r613595880 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.java ## @@ -33,14 +34,45 @@ *

[jira] [Updated] (LUCENE-9843) Remove compression option on doc values

2021-04-14 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9843: Priority: Blocker (was: Minor) > Remove compression option on doc values >

[GitHub] [lucene] rmuir commented on a change in pull request #84: LUCENE-9929 Make ScandinavianNormalizationFilter configurable wrt fol…

2021-04-14 Thread GitBox
rmuir commented on a change in pull request #84: URL: https://github.com/apache/lucene/pull/84#discussion_r613583132 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.java ## @@ -33,14 +34,45 @@ *

[jira] [Commented] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Jira
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321313#comment-17321313 ] Jan Høydahl commented on LUCENE-9929: - We have a customer who reported a bug that the "oo" and "ao"

[GitHub] [lucene] msokolov commented on a change in pull request #83: LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable

2021-04-14 Thread GitBox
msokolov commented on a change in pull request #83: URL: https://github.com/apache/lucene/pull/83#discussion_r613192059 ## File path: lucene/test-framework/src/java/org/apache/lucene/util/FullKnn.java ## @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis

2021-04-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321255#comment-17321255 ] ASF subversion and git services commented on LUCENE-9334: - Commit

[GitHub] [lucene] mayya-sharipova merged pull request #11: LUCENE-9334 Consistency of field data structures

2021-04-14 Thread GitBox
mayya-sharipova merged pull request #11: URL: https://github.com/apache/lucene/pull/11 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[jira] [Resolved] (LUCENE-9844) Document the disk layout of Lucene90VectorFormat

2021-04-14 Thread Michael Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-9844. - Resolution: Fixed > Document the disk layout of Lucene90VectorFormat >

[jira] [Commented] (LUCENE-9850) Explore PFOR for Doc ID delta encoding (instead of FOR)

2021-04-14 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321173#comment-17321173 ] Greg Miller commented on LUCENE-9850: - Thanks [~jpountz] and [~rcmuir] for the guidance and reviews!

[jira] [Commented] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321157#comment-17321157 ] Robert Muir commented on LUCENE-9929: - Adding new tokenfilters isn't a breaking change for anyone.

[GitHub] [lucene] janhoy opened a new pull request #84: LUCENE-9929 Make ScandinavianNormalizationFilter configurable wrt fol…

2021-04-14 Thread GitBox
janhoy opened a new pull request #84: URL: https://github.com/apache/lucene/pull/84 See https://issues.apache.org/jira/browse/LUCENE-9929

[jira] [Commented] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Jira
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321102#comment-17321102 ] Jan Høydahl commented on LUCENE-9929: - We could, but it would be a breaking change. The language

[jira] [Commented] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321068#comment-17321068 ] Robert Muir commented on LUCENE-9929: - Can we just add filters for each of the 3 languages instead?

[jira] [Commented] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Jira
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321066#comment-17321066 ] Jan Høydahl commented on LUCENE-9929: - [~rcmuir] any comments? > Make

[jira] [Updated] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Jira
[ https://issues.apache.org/jira/browse/LUCENE-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated LUCENE-9929: Description: The ScandinavianNormalizationFilter applies foldings for aa, ao, ae, oe and oo. But

[jira] [Created] (LUCENE-9929) Make ScandinavianNormalizationFilter configurable wrt foldings

2021-04-14 Thread Jira
Jan Høydahl created LUCENE-9929: --- Summary: Make ScandinavianNormalizationFilter configurable wrt foldings Key: LUCENE-9929 URL: https://issues.apache.org/jira/browse/LUCENE-9929 Project: Lucene - Core

[GitHub] [lucene] rmuir commented on pull request #82: LUCENE-9928: speed up analysis/icu regeneration

2021-04-14 Thread GitBox
rmuir commented on pull request #82: URL: https://github.com/apache/lucene/pull/82#issuecomment-819533286 @dweiss, thanks for the assistance. using the additional cores shaved another 10s off on my mac. bench of latest patch on mac (uses LLVM/clang vs GCC, has 6 cores): ```

[jira] [Resolved] (LUCENE-9850) Explore PFOR for Doc ID delta encoding (instead of FOR)

2021-04-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9850. -- Fix Version/s: main (9.0) Resolution: Fixed Thanks [~gsmiller]! > Explore PFOR for

[jira] [Resolved] (LUCENE-9387) Remove RAM accounting from LeafReader

2021-04-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9387. -- Resolution: Fixed > Remove RAM accounting from LeafReader >

[jira] [Commented] (LUCENE-9387) Remove RAM accounting from LeafReader

2021-04-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320973#comment-17320973 ] ASF subversion and git services commented on LUCENE-9387: - Commit

[GitHub] [lucene] jpountz merged pull request #79: LUCENE-9387: Remove CodecReader#ramBytesUsed.

2021-04-14 Thread GitBox
jpountz merged pull request #79: URL: https://github.com/apache/lucene/pull/79

[jira] [Commented] (LUCENE-9850) Explore PFOR for Doc ID delta encoding (instead of FOR)

2021-04-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320970#comment-17320970 ] ASF subversion and git services commented on LUCENE-9850: - Commit

[GitHub] [lucene] jpountz merged pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)

2021-04-14 Thread GitBox
jpountz merged pull request #69: URL: https://github.com/apache/lucene/pull/69

[GitHub] [lucene] rmuir commented on pull request #83: LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable

2021-04-14 Thread GitBox
rmuir commented on pull request #83: URL: https://github.com/apache/lucene/pull/83#issuecomment-819477655 > Fixed the bug and also made the code to execute parallelly, so as to take less time for large document vector files. please, these need to be 2 separate issues.

[GitHub] [lucene] jpountz commented on pull request #81: LUCENE-9335: [WIP] Speed up pure disjunction with BMM

2021-04-14 Thread GitBox
jpountz commented on pull request #81: URL: https://github.com/apache/lucene/pull/81#issuecomment-819463702 @zacharymorn I'm a bit confused as you seem to be assuming that `DisjunctionSumScorer` implements BMM when in fact it doesn't? Lucene has no support for BMM currently.

[jira] [Commented] (LUCENE-9798) Fix looping bug when calculating full KNN results in KnnGraphTester

2021-04-14 Thread Nitiraj Rathore (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320794#comment-17320794 ] Nitiraj Rathore commented on LUCENE-9798: - Hi Michael, Sorry, my bad. I think I got confused

[GitHub] [lucene] nitirajrathore opened a new pull request #83: LUCENE-9798 : Fix looping bug and made Full Knn calculation parallelizable

2021-04-14 Thread GitBox
nitirajrathore opened a new pull request #83: URL: https://github.com/apache/lucene/pull/83 # Description There was a bug in the KNN Tester wherein, for large document vector files, only the first set of documents would be considered for the Full Knn calculation # Solution Fixed

[GitHub] [lucene] zacharymorn edited a comment on pull request #81: LUCENE-9335: [WIP] Speed up pure disjunction with BMM

2021-04-14 Thread GitBox
zacharymorn edited a comment on pull request #81: URL: https://github.com/apache/lucene/pull/81#issuecomment-819252139 Hi Adrien, I've pushed up two additional commits with different changes, and run luceneutil to get multiple benchmark results: --- Commit :

[GitHub] [lucene] zacharymorn commented on pull request #81: LUCENE-9335: [WIP] Speed up pure disjunction with BMM

2021-04-14 Thread GitBox
zacharymorn commented on pull request #81: URL: https://github.com/apache/lucene/pull/81#issuecomment-819252139 Hi Adrien, I've pushed up two additional commits with different changes, and run luceneutil to get multiple benchmark results: --- Commit :