from:"Dawid Weiss \(Jira\)"

[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-08-18 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581246#comment-17581246
 ] 

Dawid Weiss commented on LUCENE-10662:
--

Yep, closed it just now, thanks.

> Make LuceneTestCase to not extend from org.junit.Assert
> ---
>
> Key: LUCENE-10662
> URL: https://issues.apache.org/jira/browse/LUCENE-10662
> Project: Lucene - Core
>  Issue Type: Test
>  Components: general/test
>Reporter: Marios Trivyzas
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since *LuceneTestCase* is a very useful abstract class that can be extended 
> and used by many projects, having it extend *org.junit.Assert* limits all 
> users to the static methods of {*}org.junit.Assert{*}. In our 
> project we want to use [https://joel-costigliola.github.io/assertj], where the 
> main method to call is *org.assertj.core.api.Assertions.assertThat*, which 
> conflicts with the deprecated {*}org.junit.Assert.assertThat{*} that the 
> compiler picks up by default. So one can only use assertj by writing the 
> fully qualified name of the *assertThat* method on every call, i.e.
>  
> {code:java}
> org.assertj.core.api.Assertions.assertThat(myObj.name()).isEqualTo(expectedName)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-08-18 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10662.
--
Resolution: Won't Do


[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-08-15 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579703#comment-17579703
 ] 

Dawid Weiss commented on LUCENE-10662:
--

Hi [~matriv]. I don't think we'll integrate this change. You may have to 
prefix your assertj static methods in your code or derive your own base class 
based on LuceneTestCase. Thanks for bringing the problem to our attention 
though. I agree assertj outputs are much nicer to read (especially for 
collections).
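
The "derive your own base class" option could look roughly like the sketch below. It is self-contained, so every class in it is a hypothetical stand-in: JUnitStyleAssert plays the role of org.junit.Assert, FrameworkTestCase the role of LuceneTestCase, and FluentAssertions the role of assertj's Assertions. The point it illustrates is that a one-argument assertThat declared in a project-owned base class becomes an ordinary overload next to the inherited two-argument one, so no fully qualified call is needed.

```java
import java.util.Objects;
import java.util.function.Predicate;

public class AssertThatDelegationSketch {

  /** Stand-in for org.junit.Assert: exposes a two-argument static assertThat. */
  public static class JUnitStyleAssert {
    public static void assertThat(Object actual, Predicate<Object> matcher) {
      if (!matcher.test(actual)) {
        throw new AssertionError("no match for: " + actual);
      }
    }
  }

  /** Stand-in for the object a fluent assertion library returns. */
  public static final class FluentAssert {
    private final Object actual;

    FluentAssert(Object actual) {
      this.actual = actual;
    }

    public FluentAssert isEqualTo(Object expected) {
      if (!Objects.equals(actual, expected)) {
        throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
      }
      return this;
    }
  }

  /** Stand-in for the fluent library's entry point. */
  public static final class FluentAssertions {
    public static FluentAssert assertThat(Object actual) {
      return new FluentAssert(actual);
    }
  }

  /** Stand-in for LuceneTestCase: inherits the JUnit-style static methods. */
  public abstract static class FrameworkTestCase extends JUnitStyleAssert {}

  /** Project-owned base class: adds a one-argument overload that delegates. */
  public abstract static class ProjectTestCase extends FrameworkTestCase {
    public static FluentAssert assertThat(Object actual) {
      return FluentAssertions.assertThat(actual);
    }
  }

  /** A test class can now call both variants without qualification. */
  public static class MyTest extends ProjectTestCase {
    public void testName() {
      assertThat("lucene").isEqualTo("lucene");        // project-declared overload
      assertThat("lucene", o -> o instanceof String);  // inherited two-argument overload
    }
  }

  public static void main(String[] args) {
    new MyTest().testName();
    System.out.println("both assertThat variants resolved");
  }
}
```

With the real assertj, Assertions.assertThat is heavily overloaded by argument type, so a single Object-typed delegate like this one would lose the type-specific assertions; a real base class would probably need one delegate per overload it cares about.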


[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale

2022-08-10 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577859#comment-17577859
 ] 

Dawid Weiss commented on LUCENE-10677:
--

> It looks like we might be able to intercept the relevant calls to 
> `DataInput#readString` ourselves, although adding support for compound 
> segments introduces an enormous amount of extra complexity to that approach. 

With the right tools it shouldn't be a problem. A hot-mode aspectj aspect that 
would deduplicate those strings selectively, where it matters, comes to mind.

This said, perhaps there are cleaner solutions to solve this elegantly. Feel 
free to propose a patch (but no String.intern, please...).
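
One way to deduplicate without String.intern is a plain map that hands back the first instance seen for each distinct value. The sketch below is only an illustration of that idea, not a proposed Lucene patch: the class name, method names and sample attribute strings are all made up, and it says nothing about where such a cache would live or how long it should be retained.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeStringDeduplicator {

  // Maps each distinct string value to the first instance seen with that value.
  // Not thread-safe, and entries are held strongly for the cache's lifetime.
  private final Map<String, String> canonical = new HashMap<>();

  /** Returns a previously seen equal instance if there is one, otherwise s itself. */
  public String dedup(String s) {
    if (s == null) {
      return null;
    }
    String existing = canonical.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }

  /** Returns a copy of the map whose keys and values are canonical instances. */
  public Map<String, String> dedupAttributes(Map<String, String> attributes) {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      result.put(dedup(e.getKey()), dedup(e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    AttributeStringDeduplicator d = new AttributeStringDeduplicator();
    // new String(...) forces two distinct heap objects with equal content,
    // which is what repeated reads of the same attribute would produce.
    String a = new String("sample.attribute.key");
    String b = new String("sample.attribute.key");
    System.out.println(a == b);                    // false
    System.out.println(d.dedup(a) == d.dedup(b));  // true
  }
}
```

Because the cache is an ordinary object, its scope can be bounded (for example one instance per reader), which is one of the things a JVM-wide String.intern does not offer.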

> Duplicate strings in FieldInfo#attributes contribute significantly to heap 
> usage at scale
> -
>
> Key: LUCENE-10677
> URL: https://issues.apache.org/jira/browse/LUCENE-10677
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/codecs
>Affects Versions: 9.3
>Reporter: Armin Braun
>Priority: Minor
>  Labels: heap, scalability
> Attachments: lucene_duplicate_fields.png
>
>
> This has the same origin as issue LUCENE-10676. Running a single process 
> with thousands of fields across many indexes will lead to a lot of duplicate 
> strings retained as keys and values in the `attributes` map. This can amount 
> to GBs of heap for thousands of fields across a few thousand segments. The 
> strings in the heap dump analysis below account for more than half of the 
> duplicate strings from `FieldInfo` instances (roughly 2/3; the field names 
> are somewhat unusually long in this example).
> If we could deduplicate these obvious known strings when reading `FieldInfo` 
> we could save GBs of heap for use cases like this.
>  




[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale

2022-08-09 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577468#comment-17577468
 ] 

Dawid Weiss commented on LUCENE-10677:
--

String.intern is evil for many reasons and your use case is indeed, ahem, 
atypical. I don't think adding "a few known strings" is an elegant solution 
since hacks like this one tend to become stale quickly... You could try the 
JVM's UseStringDeduplication option - an ugly but easy workaround - but I 
think you'll run into other problems soon enough with this number of concurrent 
indices/segments/fields. If you have to live with this then it's likely that 
you'll have to follow Rob's advice sooner or later.


[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale

2022-08-09 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577408#comment-17577408
 ] 

Dawid Weiss commented on LUCENE-10677:
--

It would help a lot if you could provide an example of how you ended up with 25 
million FieldInfo objects that cannot be garbage collected. This is weird and 
unexpected, I'd say.


[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-08-07 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576342#comment-17576342
 ] 

Dawid Weiss commented on LUCENE-10386:
--

I'm for closing this.

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: general/build
>Affects Versions: 9.0, 8.4, 8.11.1
>Reporter: Petr Portnov
>Priority: Trivial
>  Labels: BOM, Dependencies
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to 
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects 
> provide BOMs in order to simplify dependency management. This allows 
> dependent projects to specify only the version of the BOM module and to 
> declare the dependencies without versions (as they will be provided by the BOM).
> For example:
> {code:groovy}
> dependencies {
>     // Only specify the version of the BOM
>     implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
>     // Don't specify dependency versions as they are provided by the BOM
>     implementation "com.fasterxml.jackson.core:jackson-annotations"
>     implementation "com.fasterxml.jackson.core:jackson-core"
>     implementation "com.fasterxml.jackson.core:jackson-databind"
>     implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
>     implementation "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }
> {code}
>  
> Not only is this approach "popular" but it also has the following pros:
>  * there is no need to declare a variable (via Maven properties or Gradle 
> ext) to hold the version
>  * this is more automation-friendly because tools like Dependabot only have 
> to update the single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that the 
> projects which currently rely on older Lucene versions (such as 8.4) can 
> migrate to the BOM approach without migrating to Lucene 9.0.




[jira] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)



[ https://issues.apache.org/jira/browse/LUCENE-10671 ]


Dawid Weiss deleted comment on LUCENE-10671:
--

was (Author: JIRAUSER293699):
https://allnewcracksoftwares.com/typing-master-pro-11-crack-with-serial-keys-download/

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]




[jira] [Commented] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573638#comment-17573638
 ] 

Dawid Weiss commented on LUCENE-10671:
--

Spammer.


[jira] [Updated] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10671:
-
Environment: (was: 
https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/)


[jira] [Closed] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss closed LUCENE-10671.


> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
> Environment: 
> https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]




[jira] [Resolved] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10671.
--
Resolution: Invalid


[jira] [Resolved] (LUCENE-10669) The build should be more helpful when generated resources are touched

2022-07-30 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10669.
--
Fix Version/s: 9.4
   Resolution: Fixed

> The build should be more helpful when generated resources are touched
> -
>
> Key: LUCENE-10669
> URL: https://issues.apache.org/jira/browse/LUCENE-10669
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 10.0 (main)
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per discussion at [https://github.com/apache/lucene/pull/1016], it'd be 
> good if a build failure could point at the sources and generated files of the 
> task for which checksums are mismatched (signaling either modified templates 
> or accidentally modified generated files).




[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-07-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573181#comment-17573181
 ] 

Dawid Weiss commented on LUCENE-10662:
--

> Why is it necessary to break the inheritance in order to achieve what is 
> wanted here? 

I think it's because static imports won't be resolved properly in a subclass if 
there's an "assertThat" method in a superclass, which would require the kind of 
delegation trickery you mentioned, Mike. 

I'm a bit torn on this one, actually. I like assertj but it does seem like 
changing LuceneTestCase's inheritance may be too invasive for both Lucene and 
existing downstream projects that rely on it.


[jira] [Commented] (LUCENE-10669) The build should be more helpful when generated resources are touched

2022-07-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573059#comment-17573059
 ] 

Dawid Weiss commented on LUCENE-10669:
--

PR is at: [https://github.com/apache/lucene/pull/1053]

Essentially it prints the inputs and outputs of the regeneration task (from 
which checksums are computed). It won't help if the sources for generation are 
non-files (only properties) but it's better than before?

{code}
> Task :lucene:core:utilGenPackedChecksumCheck FAILED

FAILURE: Build failed with an exception.

* Where:
Script 'C:\Work\apache\lucene\main\gradle\generation\regenerate.gradle' line: 186

* What went wrong:
Execution failed for task ':lucene:core:utilGenPackedChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated 
> resource (regenerate task: utilGenPacked):
  Current:

lucene/core/src/java/org/apache/lucene/util/packed/Packed64SingleBlock.java=14326081c8c6a281051f9ffe94695d2a467f3db8

  Expected:

lucene/core/src/java/org/apache/lucene/util/packed/Packed64SingleBlock.java=2680e0a7c7207ddf615f50fd22465c809904ac42

  Input files for this task are:
C:\Work\apache\lucene\main\gradle\generation\moman\gen_BulkOperation.py

C:\Work\apache\lucene\main\gradle\generation\moman\gen_Packed64SingleBlock.py

  Files generated by this task are:

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperation.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked1.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked10.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked11.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked12.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked13.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked14.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked15.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked16.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked17.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked18.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked19.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked2.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked20.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked21.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked22.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked23.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked24.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked3.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked4.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked5.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked6.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked7.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked8.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked9.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPackedSingleBlock.java

C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\Packed64SingleBlock.java
{code}
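
The kind of check behind this output can be sketched in a few lines. The version below is a standalone Java illustration and not the code in regenerate.gradle: the class and method names are invented, and SHA-1 is an assumption based only on the 40-hex-digit values in the message above. It hashes the current content of each derived file, compares it with the stored value, and reports every path that differs together with both checksums.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChecksumCheckSketch {

  /** Hex-encoded SHA-1 of the given bytes (the algorithm is an assumption). */
  public static String sha1Hex(byte[] content) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(content)) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 not available", e);
    }
  }

  /** One report line per path whose current checksum differs from the expected one. */
  public static List<String> mismatches(Map<String, byte[]> current, Map<String, String> expected) {
    List<String> report = new ArrayList<>();
    // TreeMap gives a stable, sorted order so the report is reproducible.
    for (Map.Entry<String, byte[]> e : new TreeMap<>(current).entrySet()) {
      String actual = sha1Hex(e.getValue());
      String want = expected.get(e.getKey());
      if (!actual.equals(want)) {
        report.add(e.getKey() + ": current=" + actual + " expected=" + want);
      }
    }
    return report;
  }

  public static void main(String[] args) {
    // Hypothetical derived file whose content no longer matches the stored checksum.
    Map<String, byte[]> current =
        Map.of("Generated.java", "edited by hand".getBytes(StandardCharsets.UTF_8));
    Map<String, String> expected =
        Map.of("Generated.java", sha1Hex("generated".getBytes(StandardCharsets.UTF_8)));
    mismatches(current, expected).forEach(System.out::println);
  }
}
```

A real task would read the file contents from disk and would also list the task's inputs and outputs, as the message above does.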

> The build should be more helpful when generated resources are touched
> -
>
> Key: LUCENE-10669
> URL: https://issues.apache.org/jira/browse/LUCENE-10669
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 10.0 (main)
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> As per discussion at

[jira] [Created] (LUCENE-10669) The build should be more helpful when generated resources are touched

2022-07-29 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10669:


 Summary: The build should be more helpful when generated resources 
are touched
 Key: LUCENE-10669
 URL: https://issues.apache.org/jira/browse/LUCENE-10669
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 10.0 (main)
Reporter: Dawid Weiss
Assignee: Dawid Weiss


As per discussion at [https://github.com/apache/lucene/pull/1016], it'd be good 
if a build failure could point at the sources and generated files of the task 
for which checksums are mismatched (signaling either modified templates or 
accidentally modified generated files).




[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-07-28 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572733#comment-17572733
 ] 

Dawid Weiss commented on LUCENE-10662:
--

I guess so. But you'd still have to go through the code and change the 
inheritance hierarchy.


[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-07-28 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572636#comment-17572636
 ] 

Dawid Weiss commented on LUCENE-10662:
--

I'll send a heads up email to the mailing list so that this issue gets some 
attention.


[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert

2022-07-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571469#comment-17571469
 ] 

Dawid Weiss commented on LUCENE-10662:
--

I think the compiler should be able to pick the most specific variant based on 
argument types, unless there really is ambiguity - I admit I haven't checked 
whether this is the case, for example here:

https://github.com/apache/lucene/pull/1049/files#diff-334836e7b61b74a76eec5aa18eacec6b14c1496f5595b684842ce05583a6df22L209-R213
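
For reference, this is how javac picks the most specific applicable overload in the simple case. The sketch uses a hypothetical check method with three overloads and does not reproduce the call site in the PR linked above, so it does not settle whether that particular call is ambiguous.

```java
public class OverloadResolutionSketch {

  public static String check(Object actual) {
    return "Object";
  }

  public static String check(CharSequence actual) {
    return "CharSequence";
  }

  public static String check(String actual) {
    return "String";
  }

  public static void main(String[] args) {
    // All three overloads apply to a String argument; the most specific one wins.
    System.out.println(check("text"));                     // String
    // StringBuilder is a CharSequence but not a String.
    System.out.println(check(new StringBuilder("text")));  // CharSequence
    // An int is boxed to Integer, which only matches the Object overload.
    System.out.println(check(42));                         // Object
    // Resolution uses the static type of the expression, not the runtime type.
    Object o = "text";
    System.out.println(check(o));                          // Object
  }
}
```

Ambiguity arises only when no single overload is more specific than all the others, for example two overloads taking unrelated interfaces that the argument type both implements; in that case the call is a compile-time error.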


[jira] [Commented] (LUCENE-10662) Make LuceneTestCase not extending from org.junit.Assert

2022-07-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571418#comment-17571418
 ] 

Dawid Weiss commented on LUCENE-10662:
--

Changing these methods will require a huge follow-up and cleanup in any other 
project that uses LuceneTestCase (and there are many). I don't think people 
will be happy with it (even though my heart is with you on assertj - I also 
prefer it to what's in hamcrest/junit). 

Even if people agree to change it, looking at the patch, I wouldn't rename any 
methods (assertEquals becomes assertEquality) - this will be even more 
confusing for downstream users. I'd remove the extends clause and the 
assertEquals* methods from LuceneTestCase and move those methods into a 
separate class (like LuceneAssertions or something) - then the upgrade would 
be about importing them statically from junit's Assert or LuceneAssertions.

Again, I'm not convinced this is a necessary improvement. I've lived with an 
explicit Assertions.* call from assertj - this is fine and explicit. And even 
used within Lucene code itself:

[https://github.com/apache/lucene/blob/main/lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestModularLayer.java#L117]


[jira] [Commented] (LUCENE-10643) Lucene Jenkins CI - s390x support

2022-07-07 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563795#comment-17563795
 ] 

Dawid Weiss commented on LUCENE-10643:
--

The timeout is caused by a hard limit in Jenkins that should be configurable 
via system properties:

[https://www.jenkins.io/doc/book/managing/system-properties/#hudson-filepath-validate_ant_file_mask_bound]

We never got around to working out how this can be done, though.

> Lucene Jenkins CI - s390x support 
> --
>
> Key: LUCENE-10643
> URL: https://issues.apache.org/jira/browse/LUCENE-10643
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nayana Thorat
>Assignee: Uwe Schindler
>Priority: Major
>  Labels: jenkins
>
> This issue adds Lucene builds on ASF Jenkins with S390x architecture (big 
> endian).
>  




[jira] [Created] (LUCENE-10631) Consolidate java version numbers in one place and reuse them across build parts

2022-06-29 Thread Dawid Weiss (Jira)

Dawid Weiss created an issue

Lucene - Core / LUCENE-10631
Consolidate java version numbers in one place and reuse them across build parts

Issue Type: Sub-task
Assignee: Unassigned
Created: 29/Jun/22 12:43
Priority: Minor
Reporter: Dawid Weiss

[R. Muir/ mailing list discussions] Ideally we could consolidate a lot of them in a simple .properties file that contains the min/max major version numbers. It could then be read by:

 * gradle logic
 * java logic such as checks done in WrapperDownloader
 * bash logic such as error messaging in ./gradlew.sh
 * python smoketester logic?
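
On the Java side, reading such a shared file could look roughly like the sketch below. Everything named here is an assumption made for illustration: the class JavaVersionBounds, the keys java.min.major and java.max.major, and the values 17 and 19 are placeholders, not the project's actual file layout or version bounds, and this is not the code in WrapperDownloader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class JavaVersionBounds {
  final int minMajor;
  final int maxMajor;

  JavaVersionBounds(int minMajor, int maxMajor) {
    this.minMajor = minMajor;
    this.maxMajor = maxMajor;
  }

  // Parses the text of a .properties file; the key names are hypothetical.
  static JavaVersionBounds parse(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new JavaVersionBounds(
        Integer.parseInt(props.getProperty("java.min.major").trim()),
        Integer.parseInt(props.getProperty("java.max.major").trim()));
  }

  // True if the given major version lies within the configured bounds.
  boolean accepts(int major) {
    return major >= minMajor && major <= maxMajor;
  }

  public static void main(String[] args) {
    // Placeholder values; a real build would read the shared file instead.
    JavaVersionBounds bounds = parse("java.min.major=17\njava.max.major=19\n");
    int running = Runtime.version().feature();
    System.out.println("Java " + running + " accepted: " + bounds.accepts(running));
  }
}
```

The same two keys could then be read by the gradle, bash and python parts, since a flat key=value file is easy to parse in each of them.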

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-29 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

https://gitbox.apache.org/schemes.cgi?lucene-jira-archive

Something seems wrong. According to https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features, the update should be approved via an e-mail sent to the private mailing list - I don't see any such e-mail yet.

[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-29 Thread Dawid Weiss (Jira)

Dawid Weiss updated an issue

Lucene - Core / LUCENE-10557
Migrate to GitHub issue from Jira

Change By: Dawid Weiss
Attachment: image-2022-06-29-13-36-57-365.png

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-29 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-29 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

Done.

> Your repository has been created and will be available for use within a few minutes.
> Your project is available on gitbox at: https://gitbox.apache.org/repos/asf/lucene-jira-archive.git
> Your project is available on GitHub at: https://github.com/apache/lucene-jira-archive.git
> User permissions should be set up within the next five minutes. If not, please let us know at: us...@infra.apache.org

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

I would only print the "(versions: 1)" if it's > 1.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

I would leave those issue numbers as they were - like I said, these issue numbers are widely mentioned everywhere (mailing list archives, etc.) and I don't think they should be replaced. Spring redirects Jira URLs to their corresponding ported github issues - this is a much better resolution, I think.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-27 Thread Dawid Weiss (Jira)

Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

> Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one?

My hope was that these attachments could be stored in the primary git repository for convenience - keeping the historical artifacts together and having them served for free via github's infrastructure. It's also just convenient as it can be modified/ updated by multiple people (and those same people can freeze the repository for updates, once the migration is complete). Having those artifacts elsewhere (on home.apache.org) lacks some of these conveniences but it's fine too, of course.

Also, I don't think infra will have any problem in adding a repository called "lucene-archives" or something like this. I can ask if we decide to push in this direction.

[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-27 Thread Dawid Weiss (Jira)

Dawid Weiss edited a comment on LUCENE-10557

Re: Migrate to GitHub issue from Jira

I toyed with attachments a bit.

 * I've modified Tomoko's code a bit so that it fetches attachments for each issue and places them under attachments/LUCENE-xyz/blob.ext.
 * I fetched about half of the attachments from Jira and they total ~350MB. So they're quite large but not unbearably large.
 * I created a separate test repository (https://github.com/dweiss/lucene-jira-migration), with a subset of attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. Seems to work (mime types, etc.).
 * The test repository has an orphaned (separate root) branch for just the attachment blobs but they're still downloaded when you clone the master branch (which I kind of hoped could be avoided). This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).
 * I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished.
 * A mapping of original attachment URLs and new attachment URLs could also be preserved/ written.
 * Since the attachments are a git repository, they should be searchable but for some reason it didn't work for me (maybe needs time to update the index).

This is just an experiment, I don't mean to imply it has to be done (or should). I was just curious as to what's possible.

[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-27 Thread Dawid Weiss (Jira)


Dawid Weiss commented on LUCENE-10557

Re: Migrate to GitHub issue from Jira

I toyed with attachments a bit.

 * I've modified Tomoko's code a bit so that it fetches attachments for each issue and places them under attachments/LUCENE-xyz/blob.ext.
 * I fetched about half of the attachments from Jira and they total ~350MB. So they're quite large but not unbearably large.
 * I created a separate test repository (https://github.com/dweiss/lucene-jira-migration), with a subset of attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. Seems to work (mime types, etc.).
 * The test repository has an orphaned (separate root) branch for just the attachment blobs but they're still downloaded when you clone the master branch (which I kind of hoped could be avoided). This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).
 * I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished.
 * A mapping of original attachment URLs and new attachment URLs could also be preserved/ written.
 * Since the attachments are a git repository, they should be searchable but for some reason it didn't work for me (maybe needs time to update the index).
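
The attachments/LUCENE-xyz/blob.ext layout and the two open points above (attachments sharing a name, and a mapping from original to new URLs) could be handled along the lines of the sketch below. It is only an illustration: AttachmentLayout, its method names and the numbered-subfolder scheme for repeated names are assumptions, not what the migration script actually does, and the URLs in main are placeholders.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class AttachmentLayout {
  // How many times each (issue, file name) pair has been seen so far.
  private final Map<String, Integer> seen = new HashMap<>();
  // Original attachment URL -> relative path in the archive, in insertion order.
  private final Map<String, String> urlMapping = new LinkedHashMap<>();

  // Returns attachments/<issue>/<name> for the first attachment with that
  // name, and attachments/<issue>/<n>/<name> for the n-th repeat (n >= 2).
  String place(String issueKey, String fileName, String originalUrl) {
    int count = seen.merge(issueKey + "/" + fileName, 1, Integer::sum);
    String path =
        count == 1
            ? "attachments/" + issueKey + "/" + fileName
            : "attachments/" + issueKey + "/" + count + "/" + fileName;
    urlMapping.put(originalUrl, path);
    return path;
  }

  Map<String, String> mapping() {
    return urlMapping;
  }

  public static void main(String[] args) {
    AttachmentLayout layout = new AttachmentLayout();
    layout.place("LUCENE-1", "patch.txt", "https://example.org/attachment/1");
    layout.place("LUCENE-1", "patch.txt", "https://example.org/attachment/2");
    layout.mapping().forEach((from, to) -> System.out.println(from + " -> " + to));
  }
}
```

Writing the mapping out as a plain file next to the blobs would keep the old URLs resolvable by whatever redirect or lookup is set up later.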


[jira] [Resolved] (LUCENE-10607) NRTSuggesterBuilder overflows when expanding input

2022-06-22 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10607.
--
Fix Version/s: 9.3
   Resolution: Fixed

> NRTSuggesterBuilder overflows when expanding input
> -
>
> Key: LUCENE-10607
> URL: https://issues.apache.org/jira/browse/LUCENE-10607
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/FSTs
>Affects Versions: 9.2
>Reporter: chaseny
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When building an index, the suggest module calls NRTSuggestBuilder's finishTerm to write the suggest index.
> This calls maxNumArcsForDedupByte to extend analyzed, growing it by 3, 5, 7, ..., 255.
> When entries is very long (900), the expansion via maxNumArcsForDedupByte overflows:
>  
> private static int maxNumArcsForDedupByte(int currentNumDedupBytes) {
>   int maxArcs = 1 + (2 * currentNumDedupBytes);
>   if (currentNumDedupBytes > 5) {
>     maxArcs *= currentNumDedupBytes;
>     // when currentNumDedupBytes is 32768 or more, the int product exceeds Integer.MAX_VALUE
>   }
>   return Math.min(maxArcs, 255);
> }
>  
> Also, when extending, could a fixed 4 bytes be used for ordered expansion, instead of the 3, 5, 7, ..., 255 scheme?
>  
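
The overflow described in the report can be reproduced in isolation. In the sketch below, maxNumArcsInt repeats the arithmetic of the quoted method, while maxNumArcsLong is a hypothetical variant that does the arithmetic in long and clamps before narrowing; it is one possible way to avoid the wrap-around and is not claimed to be the change that went into 9.3. The class name DedupArcsSketch is made up for this example.

```java
public class DedupArcsSketch {
  // Same arithmetic as the method quoted in the issue.
  static int maxNumArcsInt(int currentNumDedupBytes) {
    int maxArcs = 1 + (2 * currentNumDedupBytes);
    if (currentNumDedupBytes > 5) {
      // For 32768 this is 65537 * 32768, which does not fit in an int
      // and wraps around to a negative value.
      maxArcs *= currentNumDedupBytes;
    }
    return Math.min(maxArcs, 255);
  }

  // Hypothetical variant: compute in long, clamp to 255, then narrow.
  static int maxNumArcsLong(int currentNumDedupBytes) {
    long maxArcs = 1L + (2L * currentNumDedupBytes);
    if (currentNumDedupBytes > 5) {
      maxArcs *= currentNumDedupBytes;
    }
    return (int) Math.min(maxArcs, 255L);
  }

  public static void main(String[] args) {
    System.out.println(maxNumArcsInt(32767));
    System.out.println(maxNumArcsInt(32768));
    System.out.println(maxNumArcsLong(32768));
  }
}
```

With 32767 the int product (65535 * 32767) still fits, so the result is clamped to 255; from 32768 on, the int version returns a negative number because Math.min is applied after the wrap-around.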




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-20 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556306#comment-17556306
 ] 

Dawid Weiss commented on LUCENE-10557:
--

I've verified that searches for old issue numbers seem to work:
https://github.com/mocobeta/sandbox-lucene-10557/search?q=%22LUCENE-1%22+in%3Atitle&type=issues

I'm more familiar with the "hierarchical" tags like "affects/xyz" or "type/bug" 
but I can live with the comma version. Good to have some of the metadata 
transferred as well, even as a plain text content in the issue description.

> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have 
> little knowledge of how to proceed with it, I'd like to discuss whether we 
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
>  * (/) Get a consensus about the migration among committers
>  * Choose issues that should be moved to GitHub
>  ** Discussion thread 
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
>  ** -Conclusion for now: We don't migrate any issues. Only new issues should 
> be opened on GitHub.-
>  ** Write a prototype migration script - the decision could be made on that. 
> Things to consider:
>  *** version numbers - labels or milestones?
>  *** add a comment/ prepend a link to the source Jira issue on github side,
>  *** add a comment/ prepend a link on the jira side to the new issue on 
> github side (for people who access jira from blogs, mailing list archives and 
> other sources that will have stale links),
>  *** convert cross-issue automatic links in comments/ descriptions (as 
> suggested by Robert),
>  *** strategy to deal with sub-issues (hierarchies),
>  *** maybe prefix (or postfix) the issue title on github side with the 
> original LUCENE-XYZ key so that it is easier to search for a particular issue 
> there?
>  *** how to deal with user IDs (author, reporter, commenters)? Do they have 
> to be github users? Will information about people not registered on github be 
> lost?
>  *** create an extra mapping file of old-issue-new-issue URLs for any 
> potential future uses. 
>  *** what to do with issue numbers in git/svn commits? These could be 
> rewritten but it'd change the entire git history tree - I don't think this is 
> practical, while doable.
>  * Build the convention for issue label/milestone management
>  ** Do some experiments on a sandbox repository 
> [https://github.com/mocobeta/sandbox-lucene-10557]
>  ** Make documentation for metadata (label/milestone) management 
>  * Enable Github issue on the lucene's repository
>  ** Raise an issue on INFRA
>  ** (Create an issue-only private repository for sensitive issues if it's 
> needed and allowed)
>  ** Set a mail hook to 
> [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to 
> the general mail group name)
>  * Set a schedule for migration
>  ** Give some time to committers to play around with issues/labels/milestones 
> before the actual migration
>  ** Make an announcement on the mail lists
>  ** Show some text messages when opening a new Jira issue (in issue template?)




[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-17 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10557:
-
Description: 
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment/ prepend a link to the source Jira issue on github side,
 *** add a comment/ prepend a link on the jira side to the new issue on github 
side (for people who access jira from blogs, mailing list archives and other 
sources that will have stale links),
 *** convert cross-issue automatic links in comments/ descriptions (as 
suggested by Robert),
 *** strategy to deal with sub-issues (hierarchies),
 *** maybe prefix (or postfix) the issue title on github side with the original 
LUCENE-XYZ key so that it is easier to search for a particular issue there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be github users? Will information about people not registered on github be lost?
 *** create an extra mapping file of old-issue-new-issue URLs for any potential 
future uses. 
 *** what to do with issue numbers in git/svn commits? These could be rewritten 
but it'd change the entire git history tree - I don't think this is practical, 
while doable.
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)

  was:
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment/ prepend a link to the source Jira issue on github side,
 *** add a comment/ prepend a link on the jira side to the new issue on github 
side (for people who access jira from blogs, mailing list archives and other 
sources that will have stale links),
 *** convert cross-issue automatic links in comments/ descriptions (as 
suggested by Robert),
 *** maybe prefix (or postfix) the issue title on github side with the original 
LUCENE-XYZ key so that it is easier to search for a particular issue there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be github users? Will information about people not registered on github be lost?
 *** create an extra mapping file of old-issue-new-issue URLs for any potential 
future uses. 
 *** what to do with issue numbers in git/svn commits? These could be rewritten 
but it'd change the entire git history tree - I don't think this is practical, 
while doable.
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milesto

[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-17 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10557:
-
Description: 
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment/ prepend a link to the source Jira issue on github side,
 *** add a comment/ prepend a link on the jira side to the new issue on github 
side (for people who access jira from blogs, mailing list archives and other 
sources that will have stale links),
 *** convert cross-issue automatic links in comments/ descriptions (as 
suggested by Robert),
 *** maybe prefix (or postfix) the issue title on github side with the original 
LUCENE-XYZ key so that it is easier to search for a particular issue there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be github users? Will information about people not registered on github be lost?
 *** create an extra mapping file of old-issue-new-issue URLs for any potential 
future uses. 
 *** what to do with issue numbers in git/svn commits? These could be rewritten 
but it'd change the entire git history tree - I don't think this is practical, 
while doable.
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)

  was:
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)


> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead 
> of Jira. For example,
> Airflow: [https://github.co

[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt

2022-06-14 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554039#comment-17554039
 ] 

Dawid Weiss commented on LUCENE-10615:
--

I think the reference you're looking for is here:
https://github.com/apache/lucene/blob/main/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.java#L44-L45

although these web sites and their associated resources vanish over time.

> Add license information for SmartChineseAnalyzer to NOTICE.txt
> --
>
> Key: LUCENE-10615
> URL: https://issues.apache.org/jira/browse/LUCENE-10615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Jan Dornseifer
>Priority: Trivial
>
> The Lucene NOTICE file contains the statement
> The SmartChineseAnalyzer source code (smartcn) was
> provided by Xiaoping Gao and copyright 2009 by 
> [www.imdict.net.|http://www.imdict.net./]
> without providing license information. Can this information be supplemented 
> or is it even outdated?
> We are using Apache Lucene v8.4.1. We are currently subject to a license 
> audit of our software, where also 3rd party FOSS components are checked for 
> usage. Among other things, this part came to our attention. I would be very 
> grateful for information.




[jira] [Resolved] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik

2022-06-13 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10613.
--
  Assignee: Dawid Weiss
Resolution: Fixed

> Clean up outdated NOTICE.txt information concerning morfologik
> --
>
> Key: LUCENE-10613
> URL: https://issues.apache.org/jira/browse/LUCENE-10613
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 9.3
>
>
> It's been pointed out to me that NOTICE.txt contains information about 
> licensing terms that are outdated with regard to what Lucene uses nowadays. 
> It's a trivial update.




[jira] [Updated] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik

2022-06-13 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10613:
-
Fix Version/s: 9.3

> Clean up outdated NOTICE.txt information concerning morfologik
> --
>
> Key: LUCENE-10613
> URL: https://issues.apache.org/jira/browse/LUCENE-10613
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Trivial
> Fix For: 9.3
>
>
> It's been pointed out to me that NOTICE.txt contains information about 
> licensing terms that are outdated with regard to what Lucene uses nowadays. 
> It's a trivial update.




[jira] [Updated] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik

2022-06-13 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10613:
-
Description: It's been pointed out to me that NOTICE.txt contains 
information about licensing terms that are outdated with regard to what Lucene 
uses nowadays. It's a trivial update.

> Clean up outdated NOTICE.txt information concerning morfologik
> --
>
> Key: LUCENE-10613
> URL: https://issues.apache.org/jira/browse/LUCENE-10613
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Trivial
>
> It's been pointed out to me that NOTICE.txt contains information about 
> licensing terms that are outdated with regard to what Lucene uses nowadays. 
> It's a trivial update.




[jira] [Created] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik

2022-06-13 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10613:


 Summary: Clean up outdated NOTICE.txt information concerning 
morfologik
 Key: LUCENE-10613
 URL: https://issues.apache.org/jira/browse/LUCENE-10613
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss







[jira] [Commented] (LUCENE-10610) RunAutomaton#hashCode() can easily cause hash collision for different Automatons

2022-06-11 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553133#comment-17553133
 ] 

Dawid Weiss commented on LUCENE-10610:
--

> I do not think we need to discuss if equals/hashCode ensures that two 
> automatons are semantically equal (describe state machine with same behaviour)

This is, in general, a hard problem.

> RunAutomaton#hashCode() can easily cause hash collision for different 
> Automatons
> 
>
> Key: LUCENE-10610
> URL: https://issues.apache.org/jira/browse/LUCENE-10610
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Tomoko Uchida
>Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take account of the contents of the {{points}} array, this 
> returns the same value for different automatons when their alphabet size and 
> state size are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I think it's not a serious one: all callers of this {{hashCode()}} 
> take additional information into account when calculating their own hash 
> value, so there seems to be no substantial impact on higher-level APIs.
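
The collision described above can be reproduced outside Lucene. The sketch below is an illustration only, not the change that was committed: RunAutomatonLike is a hypothetical stand-in that models just the three fields named in the report, and contentHash() shows one possible way (java.util.Arrays.hashCode) to fold the contents of points into the result.

```java
import java.util.Arrays;

/** Hypothetical stand-in for RunAutomaton; only the fields named in the report are modelled. */
final class RunAutomatonLike {
  final int alphabetSize;
  final int size;
  final int[] points;

  RunAutomatonLike(int alphabetSize, int size, int[] points) {
    this.alphabetSize = alphabetSize;
    this.size = size;
    this.points = points;
  }

  /** Same arithmetic as the hashCode() quoted in the report: only sizes and lengths. */
  int lengthOnlyHash() {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + points.length;
    result = prime * result + size;
    return result;
  }

  /** Assumed alternative: mix in the contents of points, not just its length. */
  int contentHash() {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + Arrays.hashCode(points);
    result = prime * result + size;
    return result;
  }
}
```

Two instances with equal sizes but different points share lengthOnlyHash() by construction, while contentHash() can tell them apart. Neither says anything about semantic equivalence of the automata, which is a separate and much harder question.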




[jira] [Commented] (LUCENE-10607) NRTSuggesterBuilder input expansion

2022-06-09 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552125#comment-17552125
 ] 

Dawid Weiss commented on LUCENE-10607:
--

Could you provide a github pull request (or a patch), [~ChasenY]?

> NRTSuggesterBuilder input expansion
> --
>
> Key: LUCENE-10607
> URL: https://issues.apache.org/jira/browse/LUCENE-10607
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/FSTs
>Affects Versions: 9.2
>Reporter: ChasenYang
>Priority: Major
>
> When the suggest module builds an index, it calls NRTSuggestBuilder's 
> finishTerm to write the suggest index.
> This calls the maxNumArcsForDedupByte function to extend analyzed, expanding 
> it by 3, 5, 7, up to 255.
> When entries is too long (900) and maxNumArcsForDedupByte is called to 
> extend it:
>
> private static int maxNumArcsForDedupByte(int currentNumDedupBytes) {
>   int maxArcs = 1 + (2 * currentNumDedupBytes);
>   if (currentNumDedupBytes > 5) {
>     // When currentNumDedupBytes is >= 32768, the int multiplication
>     // exceeds the maximum int value.
>     maxArcs *= currentNumDedupBytes;
>   }
>   return Math.min(maxArcs, 255);
> }
>  




[jira] [Commented] (LUCENE-10607) NRTSuggesterBuilder input expansion

2022-06-09 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552124#comment-17552124
 ] 

Dawid Weiss commented on LUCENE-10607:
--

Thank you, ChasenYang (and Google Translate...). I think the message is about 
integer overflow in the maxArcs computation. Since the result is capped at 255, 
we should use a long or change the logic so that overflow doesn't occur.
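
A minimal sketch of the overflow and of the "use a long" option, assuming the method body quoted in the issue. The class name MaxArcsSketch and the method names original/widened are made up for illustration; this is not necessarily the change that ended up in Lucene.

```java
/** Illustration only; names are hypothetical and not part of Lucene. */
final class MaxArcsSketch {
  /** The logic as quoted in the issue: the int multiplication can wrap around. */
  static int original(int currentNumDedupBytes) {
    int maxArcs = 1 + (2 * currentNumDedupBytes);
    if (currentNumDedupBytes > 5) {
      maxArcs *= currentNumDedupBytes;
    }
    return Math.min(maxArcs, 255);
  }

  /** Same logic with the intermediate value widened to long before capping. */
  static int widened(int currentNumDedupBytes) {
    long maxArcs = 1L + (2L * currentNumDedupBytes);
    if (currentNumDedupBytes > 5) {
      maxArcs *= currentNumDedupBytes;
    }
    // The cap keeps the value within int range, so the narrowing cast is safe.
    return (int) Math.min(maxArcs, 255L);
  }
}
```

For an input of 32768 the product (1 + 2 * 32768) * 32768 no longer fits in an int, so original() returns a negative number, while widened() returns the intended cap of 255. For small inputs both variants agree.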

> NRTSuggesterBuilder input expansion
> --
>
> Key: LUCENE-10607
> URL: https://issues.apache.org/jira/browse/LUCENE-10607
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/FSTs
>Affects Versions: 9.2
>Reporter: ChasenYang
>Priority: Major
>
> When the suggest module builds an index, it calls NRTSuggestBuilder's 
> finishTerm to write the suggest index.
> This calls the maxNumArcsForDedupByte function to extend analyzed, expanding 
> it by 3, 5, 7, up to 255.
> When entries is too long (900) and maxNumArcsForDedupByte is called to 
> extend it:
>
> private static int maxNumArcsForDedupByte(int currentNumDedupBytes) {
>   int maxArcs = 1 + (2 * currentNumDedupBytes);
>   if (currentNumDedupBytes > 5) {
>     // When currentNumDedupBytes is >= 32768, the int multiplication
>     // exceeds the maximum int value.
>     maxArcs *= currentNumDedupBytes;
>   }
>   return Math.min(maxArcs, 255);
> }
>  




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543987#comment-17543987
 ] 

Dawid Weiss commented on LUCENE-10557:
--

I don't think this is a problem. You just create a description with a bullet 
list and reference related issues; they do show up in mentions, which I think 
is sufficient.

> Migrate to GitHub issue from Jira?
> --
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use GitHub issues instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible for us to move to GitHub issues. I 
> have little knowledge of how to proceed with it; I'd like to discuss whether 
> we should migrate, and if so, how to handle the migration smoothly.
> The major tasks would be:
>  * Get a consensus about the migration among committers
>  * Enable GitHub issues on Lucene's repository (they are currently disabled)
>  * Build the conventions or rules for issue label/milestone management
>  * Choose issues that should be moved to GitHub (I think too old or obsolete 
> issues can remain in Jira.)




[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542778#comment-17542778
 ] 

Dawid Weiss commented on LUCENE-10510:
--

This is caused by google formatter accessing JVM internals. The first tidy 
failure tries to actually explain why it's failed - this is the message you 
were getting:
{code}
* What went wrong:
Execution failed for task ':checkJdkInternalsExportedToGradle'.
> Certain gradle tasks and plugins require access to jdk.compiler internals, 
> your gradle.properties might have just been generated or could be out of sync 
> (see help/localSettings.txt)
{code}

I'm not sure what can be improved here but feel free to suggest something to 
your liking!

> Check module access prior to running gjf/spotless/errorprone tasks
> --
>
> Key: LUCENE-10510
> URL: https://issues.apache.org/jira/browse/LUCENE-10510
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 9.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR at: [https://github.com/apache/lucene/pull/802]




[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542668#comment-17542668
 ] 

Dawid Weiss commented on LUCENE-10510:
--

Delete your gradle.properties and allow it to regenerate from scratch. This is 
explained in localSettings.txt:
{code}
The first invocation of any task in Lucene's gradle build will generate
and save a project-local 'gradle.properties' file. This file contains
the defaults you may (but don't have to) tweak for your particular hardware
(or taste). Note there are certain settings in that file that may
be _required_ at runtime for certain plugins (an example is the spotless/
google java format plugin, which requires adding custom exports to JVM modules).
Gradle build only generates this file if it's not already present (it never
overwrites the defaults) -- occasionally you may have to manually delete (or
move) this file and regenerate from scratch.
{code}

> Check module access prior to running gjf/spotless/errorprone tasks
> --
>
> Key: LUCENE-10510
> URL: https://issues.apache.org/jira/browse/LUCENE-10510
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 9.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR at: [https://github.com/apache/lucene/pull/802]




[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541100#comment-17541100
 ] 

Dawid Weiss commented on LUCENE-10590:
--

Love the title, [~sokolov]. Very Douglas-y Adams-y.

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Commented] (LUCENE-10589) Fix corner case in TestKnnVectorQuery.testRandomWithFilter

2022-05-23 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540935#comment-17540935
 ] 

Dawid Weiss commented on LUCENE-10589:
--

I don't know anything about this code area but thank you for following up on 
jenkins failures, [~tomoko]!

> Fix corner case in TestKnnVectorQuery.testRandomWithFilter
> --
>
> Key: LUCENE-10589
> URL: https://issues.apache.org/jira/browse/LUCENE-10589
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{TestKnnVectorQuery.testRandomWithFilter}} can fail with 
> java.lang.UnsupportedOperationException.
> Reproducible command
> {code:java}
> ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1DA39B92702DAC45 -Dtests.multiplier=3
> {code}
> {code:java}
> org.apache.lucene.search.TestKnnVectorQuery > testRandomWithFilter FAILED
>     java.lang.UnsupportedOperationException: exact search is not supported
>         at __randomizedtesting.SeedInfo.seed([1DA39B92702DAC45:6BEAC2197AD96AE0]:0)
>         at org.apache.lucene.search.TestKnnVectorQuery$ThrowingKnnVectorQuery.exactSearch(TestKnnVectorQuery.java:715)
>         at org.apache.lucene.search.KnnVectorQuery.searchLeaf(KnnVectorQuery.java:151)
>         at org.apache.lucene.search.KnnVectorQuery.rewrite(KnnVectorQuery.java:108)
>         at org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:44)
>         at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:789)
>         at org.apache.lucene.tests.search.AssertingIndexSearcher.rewrite(AssertingIndexSearcher.java:69)
>         at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:803)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:685)
>         at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:667)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:584)
>         at org.apache.lucene.search.TestKnnVectorQuery.testRandomWithFilter(TestKnnVectorQuery.java:556)
> {code}
> In some edge cases (depending on the random seed), 
> [KnnVectorQuery.java#147|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java#L147]
>  becomes false, and then `exactSearch()` is called.
> The upper bound of [the test range query 
> (filter)|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L554]
>  could be 200 (the max value of "tag" field + 1) instead of lower + 150 to 
> make it "unrestrictive"?




[jira] [Commented] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?

2022-05-22 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540614#comment-17540614
 ] 

Dawid Weiss commented on LUCENE-10587:
--

I think this message is still present in the ant task in randomized testing, 
actually. This particular word has no negative historical or emotional 
connotation to me but when I get to the code there, I'll modify it - costs me 
nothing and maybe it'll make somebody happier.

> Rename "master seed" to "root seed" or "main seed" or so?
> -
>
> Key: LUCENE-10587
> URL: https://issues.apache.org/jira/browse/LUCENE-10587
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>
> I noticed that Lucene's test infrastructure (or perhaps it's in the 
> {{RandomizedTesting}} dependency?) still says things like this:
> {noformat}
> > [junit4:junit4]  says Привет! Master seed: 3296009A5B3B7A05
> {noformat}
> Let's rename away from the term {{master}}?




[jira] [Resolved] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-20 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10370.
--
Resolution: Fixed

> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.
> PR at:
> https://github.com/apache/lucene/pull/909




[jira] [Updated] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-20 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10370:
-
Fix Version/s: 9.3

> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.
> PR at:
> https://github.com/apache/lucene/pull/909




[jira] [Updated] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-20 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10370:
-
Description: 
TestNRTReplication fails because it assumes classpath can just be copied to a 
sub-process - this is no longer the case.

PR at:
https://github.com/apache/lucene/pull/909

  was:TestNRTReplication fails because it assumes classpath can just be copied 
to a sub-process - this is no longer the case.


> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.
> PR at:
> https://github.com/apache/lucene/pull/909




[jira] [Assigned] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-20 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-10370:


Assignee: Dawid Weiss

> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.




[jira] [Resolved] (LUCENE-9634) Highlighting of degenerate spans on fields with offsets doesn't work properly

2022-05-19 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9634.
-
Resolution: Fixed

> Highlighting of degenerate spans on fields *with offsets* doesn't work 
> properly
> ---
>
> Key: LUCENE-9634
> URL: https://issues.apache.org/jira/browse/LUCENE-9634
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Match highlighter works fine with degenerate interval positions when 
> {{OffsetsFromPositions}} strategy is used to compute offsets but will show 
> incorrect offset ranges if offsets are read directly from the 
> {{MatchIterator}} ({{OffsetsFromMatchIterator}}).




[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this

2022-05-18 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538835#comment-17538835
 ] 

Dawid Weiss commented on LUCENE-10574:
--

I like [~jpountz]'s solution... even if it's not perfect!

Merge strategies would indeed benefit from some algorithmic love - the problem 
in my experience is that no single strategy fits all types of loads. In reality 
the merge strategy, the merge scheduler and the balance between searches and 
indexing all play a key role and finding the best performing solution is a 
combination of all these factors. 

> Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't 
> do this
> ---
>
> Key: LUCENE-10574
> URL: https://issues.apache.org/jira/browse/LUCENE-10574
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Remove {{floorSegmentBytes}} parameter, or change lucene's default to a merge 
> policy that doesn't merge in an O(n^2) way.
> I have the feeling it might have to be the latter, as folks seem really wed 
> to this crazy O(n^2) behavior.




[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?

2022-05-16 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537410#comment-17537410
 ] 

Dawid Weiss commented on LUCENE-10572:
--

> Nevertheless, the main limiting factor of the BytesRefHash is the equals 
> (although vectorized) because it always needs to be verified

Right. This strikes the nostalgic note of the strlen performance difference 
between Pascal and C, doesn't it?... :) This is such a hot code section that 
storing the length alongside the string itself may indeed be worth it. I still 
use that "offset difference" strategy in non-Lucene code, where it performs 
quite well, but it's really a matter of trying, and I bet the results will vary 
depending on the context (terms, caches, etc.).

> we can lookup offset of next entry - offset of entry to be looked up. The 
> only special case is the very last item.

This can be solved elegantly and efficiently - the offsets array stores the 
end+1 (exclusive end offset) of each element, with the entry at index 0 set to 
zero. So the length of entry i is always offsets[i + 1] - offsets[i], with no 
special case for the last item, and this invariant is maintained when new 
elements are added, like so:

bytePool.add(ref.bytes, ref.offset, ref.length);
offsets.add(bytePool.size());

This invariant makes all the remaining functions simpler too. For example, the 
element-comparing method is something like this (code copy-pasted from ours, 
but you'll get the gist):
{code}
  public int compare(int elementA, int elementB) {
    assert elementA >= 0 && elementA < size() && elementB >= 0 && elementB < size();

    int off1 = offsets.get(elementA);
    int len1 = offsets.get(elementA + 1) - off1;

    int off2 = offsets.get(elementB);
    int len2 = offsets.get(elementB + 1) - off2;

    return Bytes.compare(blocks.buffer, off1, len1, blocks.buffer, off2, len2);
  }
{code}

The caveat here is that the offsets array is an int[] so the storage size 
required for the hashes is slightly higher. Overall this was never a problem in 
practice though. 
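
For readers who want to try the idea in isolation, here is a self-contained sketch of the end-offset invariant using plain arrays. Everything in it is an assumption made for illustration: PackedByteSequences is a hypothetical class, the growth policy is arbitrary, and Arrays.compareUnsigned (Java 9+) stands in for the Bytes.compare helper above, whose exact ordering is not shown in this thread.

```java
import java.util.Arrays;

/** Hypothetical append-only store of byte sequences addressed by element index. */
final class PackedByteSequences {
  private byte[] bytes = new byte[16];
  private int bytesUsed = 0;
  // Invariant: offsets[0] == 0 and offsets[i + 1] is the exclusive end of element i.
  private int[] offsets = new int[4];
  private int count = 0;

  /** Appends a copy of src[off, off + len) and returns the new element's index. */
  int add(byte[] src, int off, int len) {
    if (bytesUsed + len > bytes.length) {
      bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, bytesUsed + len));
    }
    System.arraycopy(src, off, bytes, bytesUsed, len);
    bytesUsed += len;
    if (count + 2 > offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    offsets[++count] = bytesUsed; // record the end of the element just added
    return count - 1;
  }

  int size() {
    return count;
  }

  /** Length is a plain difference of neighbouring offsets, including for the last element. */
  int length(int element) {
    return offsets[element + 1] - offsets[element];
  }

  /** Lexicographic comparison of two stored elements, treating bytes as unsigned. */
  int compare(int a, int b) {
    return Arrays.compareUnsigned(
        bytes, offsets[a], offsets[a + 1], bytes, offsets[b], offsets[b + 1]);
  }
}
```

The cost is one int per element for the offsets array, which matches the caveat above; in exchange no length prefix has to be decoded before comparing.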

> Can we optimize BytesRefHash?
> -
>
> Key: LUCENE-10572
> URL: https://issues.apache.org/jira/browse/LUCENE-10572
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I was poking around in our nightly benchmarks 
> ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR 
> profiling that the hottest method is this:
> {noformat}
> PERCENT   CPU SAMPLES   STACK
> 9.28%     53848         org.apache.lucene.util.BytesRefHash#equals()
>                           at org.apache.lucene.util.BytesRefHash#findHash()
>                           at org.apache.lucene.util.BytesRefHash#add()
>                           at org.apache.lucene.index.TermsHashPerField#add()
>                           at org.apache.lucene.index.IndexingChain$PerField#invert()
>                           at org.apache.lucene.index.IndexingChain#processField()
>                           at org.apache.lucene.index.IndexingChain#processDocument()
>                           at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments()
> {noformat}
> This is kinda crazy – comparing if the term to be inserted into the inverted 
> index hash equals the term already added to {{BytesRefHash}} is the hottest 
> method during nightly benchmarks.
> Discussing offline with [~rcmuir] and [~jpountz] they noticed a few 
> questionable things about our current implementation:
>  * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the 
> inserted term into the hash?  Let's just use two bytes always, since IW 
> limits term length to 32 K (< 64K that an unsigned short can cover)
>  * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} 
> (BitUtil.VH_BE_SHORT.get)
>  * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not 
> aggressive enough?  Or the initial sizing of the hash is too small?
>  * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too 
> many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible 
> "upgrades"?
>  * If we stick with {{MurmurHash}}, why are we using the 32 bit version 
> ({{murmurhash3_x86_32}})?
>  * Are we using the JVM's intrinsics to compare multiple bytes in a single 
> SIMD instruction ([~rcmuir] is quite sure we are indeed)?
>  * [~jpountz] suggested maybe the hash insert is simply memory bound
>  * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total 
> CPU cost)
> I pulled these observations from a recent (5/6/22) profiler output: 
> [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]
> Maybe we can improve our performance on this craz

[jira] [Resolved] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-05-16 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10541.
--
Fix Version/s: 9.2
   Resolution: Fixed

> What to do about massive terms in our Wikipedia EN LineFileDocs?
> 
>
> Key: LUCENE-10541
> URL: https://issues.apache.org/jira/browse/LUCENE-10541
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Spinoff from this fun build failure that [~dweiss] root caused: 
> [https://lucene.markmail.org/thread/pculfuazll4oebra]
> Thank you and sorry [~dweiss]!!
> This test failure happened because the test case randomly indexed a chunk of 
> the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's 
> ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the 
> test.
> It's crazy that it took so long for Lucene's randomized tests to discover 
> this too-massive term in Lucene's nightly benchmarks.  It's like searching 
> for Nessie, or 
> [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence].
> We need to prevent such false failures, somehow, and there are multiple 
> options: fix this test to not use {{LineFileDocs}}, remove all "massive" 
> terms from all tests (nightly and git) {{LineFileDocs}}, fix 
> {{MockTokenizer}} to trim such ridiculous terms (I think this is the best 
> option?), ...




[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?

2022-05-15 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537354#comment-17537354
 ] 

Dawid Weiss commented on LUCENE-10572:
--

This is the issue I filed it under, actually - note it's old, old... but the 
ideas may be worth revisiting.
https://issues.apache.org/jira/browse/LUCENE-5854

> Can we optimize BytesRefHash?
> -
>
> Key: LUCENE-10572
> URL: https://issues.apache.org/jira/browse/LUCENE-10572
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I was poking around in our nightly benchmarks 
> ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR 
> profiling that the hottest method is this:
> {noformat}
> PERCENT   CPU SAMPLES   STACK
> 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals()
>   at 
> org.apache.lucene.util.BytesRefHash#findHash()
>   at org.apache.lucene.util.BytesRefHash#add()
>   at 
> org.apache.lucene.index.TermsHashPerField#add()
>   at 
> org.apache.lucene.index.IndexingChain$PerField#invert()
>   at 
> org.apache.lucene.index.IndexingChain#processField()
>   at 
> org.apache.lucene.index.IndexingChain#processDocument()
>   at 
> org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat}
> This is kinda crazy – comparing if the term to be inserted into the inverted 
> index hash equals the term already added to {{BytesRefHash}} is the hottest 
> method during nightly benchmarks.
> Discussing offline with [~rcmuir] and [~jpountz] they noticed a few 
> questionable things about our current implementation:
>  * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the 
> inserted term into the hash?  Let's just use two bytes always, since IW 
> limits term length to 32 K (< 64K that an unsigned short can cover)
>  * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} 
> (BitUtil.VH_BE_SHORT.get)
>  * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not 
> aggressive enough?  Or the initial sizing of the hash is too small?
>  * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too 
> many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible 
> "upgrades"?
>  * If we stick with {{{}MurmurHash{}}}, why are we using the 32 bit version 
> ({{{}murmurhash3_x86_32{}}})?
>  * Are we using the JVM's intrinsics to compare multiple bytes in a single 
> SIMD instruction ([~rcmuir] is quite sure we are indeed)?
>  * [~jpountz] suggested maybe the hash insert is simply memory bound
>  * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total 
> CPU cost)
> I pulled these observations from a recent (5/6/22) profiler output: 
> [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]
> Maybe we can improve our performance on this crazy hotspot?
> Or maybe this is a "healthy" hotspot and we should leave it be!




[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?

2022-05-15 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537353#comment-17537353
 ] 

Dawid Weiss commented on LUCENE-10572:
--

As much as I love BE (long live M68k), I think it's practically dead, so I 
think LE is a fine choice. 

> Ever tried to type a single word that is 128 chars long?

One thing I'd be afraid of is that users index all sorts of non-language tokens 
and these can grow longer than the default of 128 chars. I have implemented a 
similar byte-fragment storage class in the past without using explicit length 
fragments at all - the difference in consecutive element offsets was used to 
compute the length. This does have potential drawbacks but it was fast in 
practice. I can dig it out from the closet if you like.
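
A minimal sketch of the offset-difference idea described above (not the actual class mentioned in the comment, and not Lucene's BytesRefHash): fragments are appended to one byte array and only their start offsets are recorded, so the length of element i is offsets[i + 1] - offsets[i]. The class name OffsetByteStore and its methods are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical append-only store; illustrates length-free fragment storage. */
final class OffsetByteStore {
  private byte[] data = new byte[16];
  // offsets[i] is the start of element i; offsets[count] is the end of the last element.
  private int[] offsets = new int[8];
  private int count = 0;

  /** Appends len bytes of src starting at from and returns the new element's index. */
  int add(byte[] src, int from, int len) {
    int start = offsets[count];
    if (start + len > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, start + len));
    }
    if (count + 2 > offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    System.arraycopy(src, from, data, start, len);
    offsets[++count] = start + len;
    return count - 1;
  }

  /** The length is never stored; it is the difference of consecutive offsets. */
  int length(int index) {
    return offsets[index + 1] - offsets[index];
  }

  /** Returns a copy of the bytes of the element at index. */
  byte[] get(int index) {
    return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
  }
}
```

One drawback of this layout is that elements can only be appended in order: resizing or removing an element in the middle would shift every later offset.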





[jira] [Resolved] (LUCENE-10539) Return a stream of completions from FSTCompletion

2022-04-29 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10539.
--
Resolution: Fixed

> Return a stream of completions from FSTCompletion
> -
>
> Key: LUCENE-10539
> URL: https://issues.apache.org/jira/browse/LUCENE-10539
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> FSTLookup currently has a "num" parameter which limits the number of 
> completions from the underlying automaton. But this has severe disadvantages 
> if you need to collect completions that need to fulfill a secondary condition 
> (for example, collect only verbs or terms that contain a certain infix). Then 
> you can't determine the 'num' parameter easily because the number of filtered 
> completions is unknown.
> I also think implementation-wise it's also much nicer to provide a stream 
> that iterates over completions rather than a fixed-size list. This allows for 
> much more elegant code (stream.filter, stream.limit).
> The provided patch adds a single {{Stream lookup(key)}} method 
> and modifies the existing lookup methods to use it.
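
To illustrate the filtering case from the description, here is a sketch that uses a plain Stream<String> as a stand-in for the stream of completions. StreamCompletions, lookup and lookupWithInfix are hypothetical names, not the FSTCompletion API; the point is only that the caller no longer has to guess 'num' up front.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamCompletions {
  /** Stand-in completion source; a real suggester would walk its automaton lazily. */
  static Stream<String> lookup(List<String> terms, String prefix) {
    return terms.stream().filter(t -> t.startsWith(prefix));
  }

  /** Collects up to num completions that also satisfy a secondary condition (an infix). */
  static List<String> lookupWithInfix(List<String> terms, String prefix, String infix, int num) {
    return lookup(terms, prefix)
        .filter(t -> t.contains(infix)) // secondary condition applied by the caller
        .limit(num)                     // stop once enough filtered results are found
        .collect(Collectors.toList());
  }
}
```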




[jira] [Commented] (LUCENE-10539) Return a stream of completions from FSTCompletion

2022-04-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530211#comment-17530211
 ] 

Dawid Weiss commented on LUCENE-10539:
--

I applied to branch_9x and main.





[jira] [Commented] (LUCENE-10548) Weird errors launching gradlew (Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.clone() is applicable for argument types: () v

2022-04-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530204#comment-17530204
 ] 

Dawid Weiss commented on LUCENE-10548:
--

A PR is here (and on the source branch in my repo).
https://github.com/apache/lucene/pull/857

> Weird errors launching gradlew (Caused by: 
> groovy.lang.MissingMethodException: No signature of method: 
> java.lang.Object.clone() is applicable for argument types: () values: [])
> 
>
> Key: LUCENE-10548
> URL: https://issues.apache.org/jira/browse/LUCENE-10548
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://bugs.openjdk.java.net/browse/JDK-8285835
> I can't reproduce it anywhere, with the same JDK Tobias is using. Seems like 
> clone() is the cause - let's see if we can just get rid of that code and if 
> it helps.




[jira] [Updated] (LUCENE-10549) Upgrade to gradle 7.3.3

2022-04-29 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10549:
-
Fix Version/s: 10.0 (main)

> Upgrade to gradle 7.3.3
> ---
>
> Key: LUCENE-10549
> URL: https://issues.apache.org/jira/browse/LUCENE-10549
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Major
> Fix For: 10.0 (main), 9.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are newer gradle versions but this is a low-hanging fruit that has 
> official support for Java 17.




[jira] [Updated] (LUCENE-10549) Upgrade to gradle 7.3.3

2022-04-29 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10549:
-
Fix Version/s: 9.1.1

> Upgrade to gradle 7.3.3
> ---
>
> Key: LUCENE-10549
> URL: https://issues.apache.org/jira/browse/LUCENE-10549
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Major
> Fix For: 10.0 (main), 9.2, 9.1.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are newer gradle versions but this is a low-hanging fruit that has 
> official support for Java 17.




[jira] [Resolved] (LUCENE-10549) Upgrade to gradle 7.3.3

2022-04-29 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10549.
--
Fix Version/s: 9.2
   Resolution: Fixed

> Upgrade to gradle 7.3.3
> ---
>
> Key: LUCENE-10549
> URL: https://issues.apache.org/jira/browse/LUCENE-10549
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are newer gradle versions but this is a low-hanging fruit that has 
> official support for Java 17.




[jira] [Created] (LUCENE-10549) Upgrade to gradle 7.3.3

2022-04-29 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10549:


 Summary: Upgrade to gradle 7.3.3
 Key: LUCENE-10549
 URL: https://issues.apache.org/jira/browse/LUCENE-10549
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss


There are newer gradle versions but this is a low-hanging fruit that has 
official support for Java 17.




[jira] [Created] (LUCENE-10548) Weird errors launching gradlew (Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.clone() is applicable for argument types: () val

2022-04-29 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10548:


 Summary: Weird errors launching gradlew (Caused by: 
groovy.lang.MissingMethodException: No signature of method: 
java.lang.Object.clone() is applicable for argument types: () values: [])
 Key: LUCENE-10548
 URL: https://issues.apache.org/jira/browse/LUCENE-10548
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss


https://bugs.openjdk.java.net/browse/JDK-8285835

I can't reproduce it anywhere, with the same JDK Tobias is using. Seems like 
clone() is the cause - let's see if we can just get rid of that code and if it 
helps.




[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529845#comment-17529845
 ] 

Dawid Weiss commented on LUCENE-10541:
--

I've applied the PR - we can close this issue (for now)?

> What to do about massive terms in our Wikipedia EN LineFileDocs?
> 
>
> Key: LUCENE-10541
> URL: https://issues.apache.org/jira/browse/LUCENE-10541
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Spinoff from this fun build failure that [~dweiss] root caused: 
> [https://lucene.markmail.org/thread/pculfuazll4oebra]
> Thank you and sorry [~dweiss]!!
> This test failure happened because the test case randomly indexed a chunk of 
> the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's 
> ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the 
> test.
> It's crazy that it took so long for Lucene's randomized tests to discover 
> this too-massive term in Lucene's nightly benchmarks.  It's like searching 
> for Nessie, or 
> [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence].
> We need to prevent such false failures, somehow, and there are multiple 
> options: fix this test to not use {{{}LineFileDocs{}}}, remove all "massive" 
> terms from all tests (nightly and git) {{{}LineFileDocs{}}}, fix 
> {{MockTokenizer}} to trim such ridiculous terms (I think this is the best 
> option?), ...




[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-29 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529831#comment-17529831
 ] 

Dawid Weiss commented on LUCENE-10292:
--

> All I was really trying to do with these tests was demonstrate that data you 
> get out of the Lookup before you call build(), can still be gotten from the 
> Lookup while build() is incrementally consuming an iterator (which may take a 
> long time if you are building up from a long iterator) and that this behavior 
> is consistent across Lookup impls (as opposed to before I filed this issue, 
> when most Lookups worked that way, but AnalyzingInfixSuggester would throw an 
> ugly exception – which was certainly confusing to users who might switch from 
> one impl to another).

I guess I am not comfortable with the fact that this test works only by a lucky 
coincidence and tests the behavior that isn't guaranteed or documented by the 
Lookup class - this got me confused and I guess it'll confuse people looking at 
this code after me. It's not a personal stab at you, it's just something that 
smells fishy around this code in general.

When I was looking at the failure and tried to debug the test, I didn't see the 
reason why this test was necessary (I looked at the Lookup class 
documentation). When I understood what the test did, I looked at the 
implementations and they seemed to be designed with a single-thread model in 
mind (external synchronization between lookups and rebuilds).

For example, even now, if you had a tight loop in one thread calling lookup on 
an FSTCompletionLookup and this loop got compiled, then there's nothing 
preventing the compiler from reading higherWeightsCompletion and 
normalCompletion fields once and never again (they're regular fields in 
FSTCompletionLookup), even if you call build there multiple times in between... 
Is this likely to happen? I don't know. Is this possible? Sure. Maybe I'm 
oversensitive because I grew up on machines with much less strict cache 
coherency protocols but code like this makes me itchy.

> I didn't set out to make any hard & fast guarantee about the thread safety of 
> all lookups – just improve the one that was obviously inconsistent with the 
> others (progress, not perfection)

That's my point. Either we should make the Lookup interface explicitly state 
that it's safe to call the build method from another thread or we shouldn't 
really guarantee (or test) this behavior. I don't want you to revert the 
changes you made but my gut feeling is that lookup implementations should be 
designed to be single-threaded or at least immutable (one publisher-multiple 
readers model) as it makes implementing them much easier - no volatiles, 
synchronization blocks, etc. 

Concurrency concerns should be handled by the code that uses Lookups - this 
code should know whether synchronization or two concurrent instances are 
required (one doing the lookups, potentially via multiple threads, one 
rebuilding). Perhaps a change in the API is needed to separate those two phases 
(build-use) and then the downstream code has to take care of handling/ swapping 
out Lookup reference where they're used - I don't know, I just state what I 
think.
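
As a rough sketch of what "handling concurrency in the code that uses Lookups" could mean: the caller keeps a reference to an immutable instance and swaps it once a rebuild has finished. ImmutableLookup and LookupHolder are hypothetical names for illustration, not Lucene classes, and the prefix matching is a placeholder for real suggester logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable lookup; stands in for a fully built suggester. */
final class ImmutableLookup {
  private final List<String> terms;

  ImmutableLookup(List<String> terms) {
    this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
  }

  List<String> lookup(String prefix) {
    List<String> out = new ArrayList<>();
    for (String t : terms) {
      if (t.startsWith(prefix)) out.add(t);
    }
    return out;
  }
}

/** Caller-side holder: readers see either the old or the new instance, never a half-built one. */
final class LookupHolder {
  private final AtomicReference<ImmutableLookup> current =
      new AtomicReference<>(new ImmutableLookup(Collections.emptyList()));

  List<String> lookup(String prefix) {
    return current.get().lookup(prefix);
  }

  /** Builds a fresh instance off to the side, then publishes it with one atomic write. */
  void rebuild(List<String> newTerms) {
    current.set(new ImmutableLookup(newTerms));
  }
}
```

With this split the lookup class itself needs no volatiles or synchronized blocks; the only synchronization point is the reference swap owned by the caller.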

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: 10.0 (main), 9.2
>
> Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List}} 
> Typically this works because the {{build()}} method uses temporary 
> datastructures until its "build logic" is complete, at which point it 
> atomically replaces the datastructures used by the {{lookup()}} method.   In 
> the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> its index) and then the lookup method starts with...
> {code:java}
> if (searcherMgr

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-28 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529261#comment-17529261
 ] 

Dawid Weiss commented on LUCENE-10541:
--

Filed a PR at https://github.com/apache/lucene/pull/850. Picked the default 
from CharTokenizer.DEFAULT_MAX_WORD_LEN, although can't reference that directly 
(not accessible from the test framework). Had to tweak the defaults in one or 
two failing tests that expected the tokenizer to return longer tokens, so a 
second set of eyes would be good.

The enwiki lines file contains 2 million lines. It'd be nice to calculate the 
probability of any of the k faulty (long-term) lines being drawn in n tries and 
distribute it over time - this would address Mike's question about why it took 
so long to discover them. :)
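
A back-of-the-envelope version of that calculation, under a simplifying assumption that is mine and not from the issue: each of n draws picks one of N lines independently and uniformly, and k of them are faulty. Then P(at least one hit) = 1 - (1 - k/N)^n. The tests actually index chunks of lines, so this is only a rough model; FaultyLineOdds is a made-up helper name.

```java
final class FaultyLineOdds {
  /**
   * Probability that at least one of `faulty` lines is hit in `draws` independent,
   * uniform draws from `total` lines (a simplified model, see the assumption above).
   */
  static double hitProbability(long total, long faulty, long draws) {
    double missOnce = 1.0 - (double) faulty / total;
    return 1.0 - Math.pow(missOnce, draws);
  }
}
```

For example, with N = 2,000,000 and purely hypothetical values k = 5 and n = 1000, the result is roughly 0.25% per test run, which would make long stretches without a failure unsurprising.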

> What to do about massive terms in our Wikipedia EN LineFileDocs?
> 
>
> Key: LUCENE-10541
> URL: https://issues.apache.org/jira/browse/LUCENE-10541
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>
> Spinoff from this fun build failure that [~dweiss] root caused: 
> [https://lucene.markmail.org/thread/pculfuazll4oebra]
> Thank you and sorry [~dweiss]!!
> This test failure happened because the test case randomly indexed a chunk of 
> the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's 
> ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the 
> test.
> It's crazy that it took so long for Lucene's randomized tests to discover 
> this too-massive term in Lucene's nightly benchmarks.  It's like searching 
> for Nessie, or 
> [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence].
> We need to prevent such false failures, somehow, and there are multiple 
> options: fix this test to not use {{{}LineFileDocs{}}}, remove all "massive" 
> terms from all tests (nightly and git) {{{}LineFileDocs{}}}, fix 
> {{MockTokenizer}} to trim such ridiculous terms (I think this is the best 
> option?), ...




[jira] [Commented] (LUCENE-10543) Achieve contribution workflow perfection (with progress)

2022-04-28 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529255#comment-17529255
 ] 

Dawid Weiss commented on LUCENE-10543:
--

("with progress"... yeah, that's why LUCENE-9871 is still open :) )

> Achieve contribution workflow perfection (with progress)
> 
>
> Key: LUCENE-10543
> URL: https://issues.apache.org/jira/browse/LUCENE-10543
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> Inspired by Dawid's build issue which has worked out for us: LUCENE-9871
> He hasn't even linked 10% of the issues/subtasks involved in that work 
> either, but we know.
> I think we need a similar approach for the contribution workflow. There have 
> been some major improvements recently, a couple that come to mind:
> * Tomoko made a CONTRIBUTING.md file which github recognizes and is way 
> better than the wiki stuff
> * Some hazards/error messages/mazes in the build process and so on have 
> gotten fixed.
> But there is more to do in my opinion, here are 3 ideas:
> * Creating a PR still has a massive checklist template. But now this template 
> links to CONTRIBUTING.md, so why include the other stuff/checklist? Isn't it 
> enough to just link to CONTRIBUTING.md and fix that as needed?
> * Creating a PR still requires signing up for Apache JIRA and creating a JIRA 
> issue. There is zero value to this additional process. We often end up with 
> either JIRAs and/or PRs that have zero content, or maybe conflicting/outdated 
> content. This is just an unnecessary dance, can we use github issues instead?
> * Haven't dug into the github actions or configs very deeply. Maybe there's 
> simple stuff we can do such as give useful notifications if checks fail. Try 
> to guide the user to run ./gradlew check and fix it. It sucks to have to 
> review, look at logs, and manually add comments to do this stuff.
> So let's have an issue to improve this area.




[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529223#comment-17529223
 ] 

Dawid Weiss commented on LUCENE-10292:
--

Thanks Chris. I'm still not sure whether these tests make sense without 
explicitly stating that build() can be called on Lookup to dynamically (and 
concurrently) replace its internals... For example, FSTCompletionLookup:
{code}
  // The two FSTCompletions share the same automaton.
  this.higherWeightsCompletion = builder.build();
  this.normalCompletion =
      new FSTCompletion(higherWeightsCompletion.getFST(), false, exactMatchFirst);
  this.count = newCount;
{code}

none of these fields are volatile or under a monitor, so no guaranteed flush 
occurs anywhere. I understand eventually they'll get consistent by piggybacking 
on some other synchronization/ memfence but it's weird to rely on this 
behavior. I think it'd be a much more user-friendly API if Lookup was actually 
detached entirely from its build process (for example by replacing the current 
build method with a builder() that would return a new immutable Lookup 
instance). This would be less confusing and would also allow for a cleaner 
implementation (no synchronization at all required - just regular assignments, 
maybe even with final fields).

I'm not saying this should be implemented here - perhaps it's worth a new issue 
to do this refactoring.
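
For illustration only, a detached build step could look roughly like the sketch below. PrefixLookup and its Builder are hypothetical and much simpler than any real Lookup; the point is that the built instance holds only final fields, which the Java memory model makes visible to any thread that obtains the reference after the constructor returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical immutable lookup; all names here are illustrative, not Lucene API. */
final class PrefixLookup {
  // Final field: safely published once the constructor has returned.
  private final String[] sortedTerms;

  private PrefixLookup(String[] sortedTerms) {
    this.sortedTerms = sortedTerms;
  }

  static Builder builder() {
    return new Builder();
  }

  int count() {
    return sortedTerms.length;
  }

  List<String> lookup(String prefix) {
    List<String> out = new ArrayList<>();
    for (String t : sortedTerms) {
      if (t.startsWith(prefix)) out.add(t);
    }
    return out;
  }

  /** Mutable, single-threaded build phase; each build() yields a new immutable instance. */
  static final class Builder {
    private final List<String> pending = new ArrayList<>();

    Builder add(String term) {
      pending.add(term);
      return this;
    }

    PrefixLookup build() {
      String[] terms = pending.toArray(new String[0]);
      Arrays.sort(terms);
      return new PrefixLookup(terms);
    }
  }
}
```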

Separately from the above, if the test fails, it'll leak threads - this:

+  acquireOnNext.acquireUninterruptibly();

literally blocks forever. It should be replaced with a try/catch that rethrows 
an unchecked exception when the iterator thread is interrupted.
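
A sketch of that replacement, assuming a plain java.util.concurrent.Semaphore; the helper name acquireOrFail and the class Gates are made up here and are not part of the patch:

```java
import java.util.concurrent.Semaphore;

final class Gates {
  /** Waits for a permit, but turns an interrupt into an unchecked exception instead of blocking forever. */
  static void acquireOrFail(Semaphore gate) {
    try {
      gate.acquire(); // interruptible, unlike acquireUninterruptibly()
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      throw new RuntimeException("Interrupted while waiting for a permit", e);
    }
  }
}
```

With this, a test framework that interrupts leftover threads after a failure can actually terminate the iterator thread.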

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: 10.0 (main), 9.2
>
> Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List}} 
> Typically this works because the {{build()}} method uses temporary 
> datastructures until its "build logic" is complete, at which point it 
> atomically replaces the datastructures used by the {{lookup()}} method.   In 
> the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> its index) and then the lookup method starts with...
> {code:java}
> if (searcherMgr == null) {
>   throw new IllegalStateException("suggester was not built");
> }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}




[jira] [Commented] (LUCENE-10531) Mark testLukeCanBeLaunched @Nightly test and make a dedicated Github CI workflow for it

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529218#comment-17529218
 ] 

Dawid Weiss commented on LUCENE-10531:
--

Fine with me.

> Mark testLukeCanBeLaunched @Nightly test and make a dedicated Github CI 
> workflow for it
> ---
>
> Key: LUCENE-10531
> URL: https://issues.apache.org/jira/browse/LUCENE-10531
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/test
>Reporter: Tomoko Uchida
>Priority: Minor
>
> We are going to allow running the test on Xvfb (a virtual display that speaks 
> X protocol) in [LUCENE-10528], this tweak is available only on Linux.
> I'm just guessing but it could also confuse or bother Mac and Windows users 
> (we can't know what window manager developers are using); it may be better to 
> make it opt-in by marking it as slow tests. 
> Instead, I think we can enable a dedicated Github actions workflow for the 
> distribution test that is triggered only when the related files are changed. 
> Besides Linux, we could run it both on Mac and Windows which most users run 
> the app on - it'd be slow, but if we limit the scope of the test I suppose it 
> works functionally just fine (I'm running actions workflows on mac and 
> windows elsewhere).
> To make it a "slow test", we could add the same {{@Slow}} annotation as the 
> {{test-framework}} to the distribution tests, for consistency.




[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529018#comment-17529018
 ] 

Dawid Weiss commented on LUCENE-10292:
--

I don't see any evidence in implementations of Lookup that build() can be 
called in a thread-safe manner.

Those testLookupsDuringReBuild tests only pass by lucky chance (and 
occasionally still fail!). The code typically releases semaphore permits 
quickly here:
{code}
// at every stage of the slow rebuild, we should still be able to get our
// original suggestions
for (int i = 0; i < data.size(); i++) {
  initialChecks.check(suggester);
  rebuildGate.release();
}
{code}
while the build() method is not even invoked yet because this line:
{code}
suggester.build(
    new InputArrayIterator(
        new DelayedIterator<>(suggester, rebuildGate, data.iterator())));
{code}
is semaphore-blocked in the constructor parameters (InputArrayIterator). So the 
result is that suggester.build() is typically entered a long time after the 
check loop has finished. It is enough to modify the code to:
{code}
// at every stage of the slow rebuild, we should still be able to get our
// original suggestions
for (int i = 0; i < data.size(); i++) {
  rebuildGate.release();
  Thread.sleep(100);
  initialChecks.check(suggester);
}
{code}

to cause repeatable failures (this isn't a suggested fix but a demonstration 
that the code is currently broken).
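
For readers following the thread: the "build into temporary structures, then atomically replace" pattern that the issue description attributes to other Lookup implementations can be sketched roughly as below. This is only an illustration under stated assumptions. {{SwapOnBuildLookup}} and its methods are hypothetical names, not Lucene classes, and the sketch says nothing about whether the Lookup contract actually promises this behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; this is not a Lucene class.
final class SwapOnBuildLookup {
  // Readers always see either the old or the new complete snapshot.
  private final AtomicReference<List<String>> snapshot =
      new AtomicReference<>(List.of());

  /** Builds into a temporary list, then publishes it in one atomic step. */
  void build(Iterable<String> input) {
    List<String> temp = new ArrayList<>();
    for (String s : input) {
      temp.add(s);
    }
    // The previous snapshot stays visible to lookup() until this line runs.
    snapshot.set(List.copyOf(temp));
  }

  /** Returns an empty list before the first build instead of throwing. */
  List<String> lookup(String prefix) {
    List<String> result = new ArrayList<>();
    for (String s : snapshot.get()) {
      if (s.startsWith(prefix)) {
        result.add(s);
      }
    }
    return result;
  }
}
```

With this shape a lookup() that races with build() sees the old data or the new data, never a half-built or null state. Closing and null'ing the shared field at the start of build(), as described for AnalyzingInfixSuggester, gives up exactly that property.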

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: 10.0 (main), 9.2
>
> Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List}} 
> Typically this works because the {{build()}} method uses temporary 
> data structures until its "build logic" is complete, at which point it 
> atomically replaces the data structures used by the {{lookup()}} method. In 
> the case of {{AnalyzingInfixSuggester}}, however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> its index) and then the lookup method starts with...
> {code:java}
> if (searcherMgr == null) {
>   throw new IllegalStateException("suggester was not built");
> }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}




[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528998#comment-17528998
 ] 

Dawid Weiss commented on LUCENE-10292:
--

[~hossman] - don't know if you saw the recent discussion on the mailing list - 
how did you arrive at the conclusion that Lookup.build can be called 
concurrently? I don't think this is mentioned anywhere in Lookup documentation 
and I don't think the implementation is thread-safe (at least not the 
TestFreeTextSuggester)?

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: 10.0 (main), 9.2
>
> Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List}} 
> Typically this works because the {{build()}} method uses temporary 
> data structures until its "build logic" is complete, at which point it 
> atomically replaces the data structures used by the {{lookup()}} method. In 
> the case of {{AnalyzingInfixSuggester}}, however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> its index) and then the lookup method starts with...
> {code:java}
> if (searcherMgr == null) {
>   throw new IllegalStateException("suggester was not built");
> }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}




[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528991#comment-17528991
 ] 

Dawid Weiss commented on LUCENE-10541:
--

I agree - we should fix mock analyzer to not return such long terms. 

> What to do about massive terms in our Wikipedia EN LineFileDocs?
> 
>
> Key: LUCENE-10541
> URL: https://issues.apache.org/jira/browse/LUCENE-10541
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>
> Spinoff from this fun build failure that [~dweiss] root caused: 
> [https://lucene.markmail.org/thread/pculfuazll4oebra]
> Thank you and sorry [~dweiss]!!
> This test failure happened because the test case randomly indexed a chunk of 
> the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's 
> ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the 
> test.
> It's crazy that it took so long for Lucene's randomized tests to discover 
> this too-massive term in Lucene's nightly benchmarks.  It's like searching 
> for Nessie, or 
> [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence].
> We need to prevent such false failures, somehow, and there are multiple 
> options: fix this test to not use {{{}LineFileDocs{}}}, remove all "massive" 
> terms from all tests (nightly and git) {{{}LineFileDocs{}}}, fix 
> {{MockTokenizer}} to trim such ridiculous terms (I think this is the best 
> option?), ...




[jira] [Created] (LUCENE-10540) Remove alphabetically ordered completions from FSTCompletion

2022-04-27 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10540:


 Summary: Remove alphabetically ordered completions from 
FSTCompletion
 Key: LUCENE-10540
 URL: https://issues.apache.org/jira/browse/LUCENE-10540
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss


The code cheats internally by sorting completions that are always 
weight-ordered. If this is needed, it should be done up the call stack, not in 
FSTCompletion - this provides an illusion of something that doesn't exist and 
is potentially quite expensive to compute.

{code}
if (!higherWeightsFirst && rootArcs.length > 1) {
  // We could emit a warning here (?). An optimal strategy for alphabetically
  // sorted suggestions would be to add them with a constant weight -- this
  // saves unnecessary traversals and sorting.
  return lookup(key).sorted().limit(num).collect(Collectors.toList());
} else {
  return lookup(key).limit(num).collect(Collectors.toList());
}
{code}
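
To illustrate what "done up the call stack" could look like, here is a rough sketch in which the caller, not the completion source, decides whether to sort. Everything in it is an assumption made for illustration: {{CallerSideSorting}}, its sample data and its method names are hypothetical and are not the FSTCompletion API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a weight-ordered completion source; not the Lucene API.
final class CallerSideSorting {
  // Sample data, assumed to be already ordered by descending weight.
  private static final List<String> BY_WEIGHT =
      List.of("search", "sea", "seal", "second", "tree");

  /** Streams the completions for a prefix in weight order. */
  static Stream<String> lookup(String key) {
    return BY_WEIGHT.stream().filter(s -> s.startsWith(key));
  }

  /** Cheap: stops after the first num weight-ordered matches. */
  static List<String> topByWeight(String key, int num) {
    return lookup(key).limit(num).collect(Collectors.toList());
  }

  /** Explicit and costly: sorted() must consume every match before limit applies. */
  static List<String> topAlphabetically(String key, int num) {
    return lookup(key).sorted().limit(num).collect(Collectors.toList());
  }
}
```

The point of keeping the two paths separate is that the cost of the full traversal and sort is visible at the call site instead of being hidden behind a flag.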




[jira] [Commented] (LUCENE-10539) Return a stream of completions from FSTCompletion

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528635#comment-17528635
 ] 

Dawid Weiss commented on LUCENE-10539:
--

PR is at: https://github.com/apache/lucene/pull/844

> Return a stream of completions from FSTCompletion
> -
>
> Key: LUCENE-10539
> URL: https://issues.apache.org/jira/browse/LUCENE-10539
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> FSTLookup currently has a "num" parameter which limits the number of 
> completions from the underlying automaton. But this has severe disadvantages 
> if you need to collect completions that need to fulfill a secondary condition 
> (for example, collect only verbs or terms that contain a certain infix). Then 
> you can't determine the 'num' parameter easily because the number of filtered 
> completions is unknown.
> I also think that, implementation-wise, it's much nicer to provide a stream 
> that iterates over completions rather than a fixed-size list. This allows for 
> much more elegant code (stream.filter, stream.limit).
> The provided patch adds a single {{Stream lookup(key)}} method 
> and modifies the existing lookup methods to use it.




[jira] [Updated] (LUCENE-10539) Return a stream of completions from FSTCompletion

2022-04-27 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10539:
-
Fix Version/s: 9.2

> Return a stream of completions from FSTCompletion
> -
>
> Key: LUCENE-10539
> URL: https://issues.apache.org/jira/browse/LUCENE-10539
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.2
>
>
> FSTLookup currently has a "num" parameter which limits the number of 
> completions from the underlying automaton. But this has severe disadvantages 
> if you need to collect completions that need to fulfill a secondary condition 
> (for example, collect only verbs or terms that contain a certain infix). Then 
> you can't determine the 'num' parameter easily because the number of filtered 
> completions is unknown.
> I also think that, implementation-wise, it's much nicer to provide a stream 
> that iterates over completions rather than a fixed-size list. This allows for 
> much more elegant code (stream.filter, stream.limit).
> The provided patch adds a single {{Stream lookup(key)}} method 
> and modifies the existing lookup methods to use it.




[jira] [Updated] (LUCENE-10539) Return a stream of completions from FSTCompletion

2022-04-27 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10539:
-
Summary: Return a stream of completions from FSTCompletion  (was: return a 
stream of completions from FSTCompletion)

> Return a stream of completions from FSTCompletion
> -
>
> Key: LUCENE-10539
> URL: https://issues.apache.org/jira/browse/LUCENE-10539
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> FSTLookup currently has a "num" parameter which limits the number of 
> completions from the underlying automaton. But this has severe disadvantages 
> if you need to collect completions that need to fulfill a secondary condition 
> (for example, collect only verbs or terms that contain a certain infix). Then 
> you can't determine the 'num' parameter easily because the number of filtered 
> completions is unknown.
> I also think that, implementation-wise, it's much nicer to provide a stream 
> that iterates over completions rather than a fixed-size list. This allows for 
> much more elegant code (stream.filter, stream.limit).
> The provided patch adds a single {{Stream lookup(key)}} method 
> and modifies the existing lookup methods to use it.




[jira] [Created] (LUCENE-10539) return a stream of completions from FSTCompletion

2022-04-27 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10539:


 Summary: return a stream of completions from FSTCompletion
 Key: LUCENE-10539
 URL: https://issues.apache.org/jira/browse/LUCENE-10539
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Dawid Weiss
Assignee: Dawid Weiss


FSTLookup currently has a "num" parameter which limits the number of 
completions from the underlying automaton. But this has severe disadvantages if 
you need to collect completions that need to fulfill a secondary condition (for 
example, collect only verbs or terms that contain a certain infix). Then you 
can't determine the 'num' parameter easily because the number of filtered 
completions is unknown.

I also think that, implementation-wise, it's much nicer to provide a stream that 
iterates over completions rather than a fixed-size list. This allows for much 
more elegant code (stream.filter, stream.limit).

The provided patch adds a single {{Stream lookup(key)}} method and 
modifies the existing lookup methods to use it.
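
As a rough sketch of the motivation (not the actual patch), the example below collects a fixed number of completions that satisfy a secondary condition without having to guess a 'num' for the underlying lookup. {{FilteredCompletions}}, its sample data and the {{withInfix}} helper are hypothetical names introduced only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a stream-returning completion lookup; not the Lucene API.
final class FilteredCompletions {
  // Sample data, assumed to be already ordered by descending weight.
  private static final List<String> BY_WEIGHT =
      List.of("carbon", "cat", "card", "cab", "care", "carpet");

  /** Streams all completions for a prefix, with no upper bound imposed here. */
  static Stream<String> lookup(String key) {
    return BY_WEIGHT.stream().filter(s -> s.startsWith(key));
  }

  /** Keeps only completions containing the infix, then takes the first num of those. */
  static List<String> withInfix(String key, String infix, int num) {
    return lookup(key)
        .filter(s -> s.contains(infix))
        .limit(num)
        .collect(Collectors.toList());
  }
}
```

Because the limit is applied after the filter, the caller asks for "the first two matching completions" directly, which a fixed-size list from the lookup cannot express.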




[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-04-27 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528596#comment-17528596
 ] 

Dawid Weiss commented on LUCENE-10386:
--

Hi Petr. 

So, I did have a look. The TL;DR version is that I honestly think this kind of 
thing should be left to downstream projects to handle in whatever way they 
fancy. It introduces an additional level of overhead to Lucene maintenance and 
a potential for problems that I don't think the gains justify (see below for 
specific examples). BOMs are not the only way to keep version numbers 
consistent across projects (Lucene uses Palantir's version consistency plugin, 
for example) and the diversity here means it'll be hard to please everyone. If 
you need a BOM - you can create a subproject in your own project (with all the 
dependencies needed) and treat it as a platform... So it's not that difficult.

Here is what I noticed when I applied your patch (and it motivates my above 
opinion):

1) the diff of poms in the release (gradlew -p lucene/distribution 
assembleRelease) shows the description and name have changed:
{code}
  Apache Lucene (module: lucene-root)
  Grandparent project for Apache Lucene Core
{code}

The refactoring you made to extract configurePublicationMetadata has a side 
effect: the lazy provider resolves the project reference to the root instead 
of the proper context. 

2) the code for constraints in the BOM submodule includes all the exported 
Lucene subprojects. But in reality many people will be using just a subset of 
those - the constraints imposed by the BOM (including transitive dependencies?) 
will have to be downloaded and will be effective for those dependencies the 
bom-importing project is not touching at all. I see this as more of a problem 
than a benefit, actually.

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: general/build
>Affects Versions: 9.0, 8.4, 8.11.1
>Reporter: Petr Portnov
>Priority: Trivial
>  Labels: BOM, Dependencies
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to 
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects are 
> providing BOMs in order to simplify dependency management. This allows 
> dependent projects to only specify the version of the BOM module while 
> declaring the dependencies without them (as they will be provided by BOM).
> For example:
> {code:groovy}
> dependencies {
> // Only specify the version of the BOM
> implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
> // Don't specify dependency versions as they are provided by the BOM
> implementation "com.fasterxml.jackson.core:jackson-annotations"
> implementation "com.fasterxml.jackson.core:jackson-core"
> implementation "com.fasterxml.jackson.core:jackson-databind"
> implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
> implementation 
> "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }{code}
>  
> Not only is this approach "popular" but it also has the following pros:
>  * there is no need to declare a variable (via Maven properties or Gradle 
> ext) to hold the version
>  * this is more automation-friendly because tools like Dependabot only have 
> to update the single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that the 
> projects which currently rely on older Lucene versions (such as 8.4) can 
> migrate to the BOM approach without migrating to Lucene 9.0.




[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-04-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528377#comment-17528377
 ] 

Dawid Weiss commented on LUCENE-10386:
--

Hi Petr. Sorry for the delay. I'll try to go through this tomorrow morning and 
see if I have any doubts. 

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: general/build
>Affects Versions: 9.0, 8.4, 8.11.1
>Reporter: Petr Portnov
>Priority: Trivial
>  Labels: BOM, Dependencies
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to 
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects are 
> providing BOMs in order to simplify dependency management. This allows 
> dependent projects to only specify the version of the BOM module while 
> declaring the dependencies without them (as they will be provided by BOM).
> For example:
> {code:groovy}
> dependencies {
> // Only specify the version of the BOM
> implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
> // Don't specify dependency versions as they are provided by the BOM
> implementation "com.fasterxml.jackson.core:jackson-annotations"
> implementation "com.fasterxml.jackson.core:jackson-core"
> implementation "com.fasterxml.jackson.core:jackson-databind"
> implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
> implementation 
> "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }{code}
>  
> Not only is this approach "popular" but it also has the following pros:
>  * there is no need to declare a variable (via Maven properties or Gradle 
> ext) to hold the version
>  * this is more automation-friendly because tools like Dependabot only have 
> to update the single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that the 
> projects which currently rely on older Lucene versions (such as 8.4) can 
> migrate to the BOM approach without migrating to Lucene 9.0.




[jira] [Resolved] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)

2022-04-25 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10535.
--
Fix Version/s: 10.0 (main)
   Resolution: Fixed

> The build fails in :checkUnusedConstraints (ConcurrentModificationException)
> 
>
> Key: LUCENE-10535
> URL: https://issues.apache.org/jira/browse/LUCENE-10535
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: 10.0 (main)
>
>
> {code}
> * What went wrong:
> Execution failed for task ':checkUnusedConstraints'.
> > Error while evaluating property 'classpath' of task 
> > ':checkUnusedConstraints'
>> Failed to calculate the value of task ':checkUnusedConstraints' property 
> 'classpath'.
>   > java.util.ConcurrentModificationException (no error message)
> {code}
> Seems to be related to this:
> https://github.com/palantir/gradle-consistent-versions/issues/450




[jira] [Assigned] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)

2022-04-25 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-10535:


Assignee: Dawid Weiss

> The build fails in :checkUnusedConstraints (ConcurrentModificationException)
> 
>
> Key: LUCENE-10535
> URL: https://issues.apache.org/jira/browse/LUCENE-10535
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
>
> {code}
> * What went wrong:
> Execution failed for task ':checkUnusedConstraints'.
> > Error while evaluating property 'classpath' of task 
> > ':checkUnusedConstraints'
>> Failed to calculate the value of task ':checkUnusedConstraints' property 
> 'classpath'.
>   > java.util.ConcurrentModificationException (no error message)
> {code}
> Seems to be related to this:
> https://github.com/palantir/gradle-consistent-versions/issues/450




[jira] [Commented] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)

2022-04-25 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527924#comment-17527924
 ] 

Dawid Weiss commented on LUCENE-10535:
--

Upgraded the plugin to 2.10.0 - the build passes for me locally, let's see if 
this helps.

> The build fails in :checkUnusedConstraints (ConcurrentModificationException)
> 
>
> Key: LUCENE-10535
> URL: https://issues.apache.org/jira/browse/LUCENE-10535
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Major
>
> {code}
> * What went wrong:
> Execution failed for task ':checkUnusedConstraints'.
> > Error while evaluating property 'classpath' of task 
> > ':checkUnusedConstraints'
>> Failed to calculate the value of task ':checkUnusedConstraints' property 
> 'classpath'.
>   > java.util.ConcurrentModificationException (no error message)
> {code}
> Seems to be related to this:
> https://github.com/palantir/gradle-consistent-versions/issues/450




[jira] [Created] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)

2022-04-25 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-10535:


 Summary: The build fails in :checkUnusedConstraints 
(ConcurrentModificationException)
 Key: LUCENE-10535
 URL: https://issues.apache.org/jira/browse/LUCENE-10535
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss


{code}
* What went wrong:
Execution failed for task ':checkUnusedConstraints'.
> Error while evaluating property 'classpath' of task ':checkUnusedConstraints'
   > Failed to calculate the value of task ':checkUnusedConstraints' property 
'classpath'.
  > java.util.ConcurrentModificationException (no error message)
{code}

Seems to be related to this:
https://github.com/palantir/gradle-consistent-versions/issues/450




[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects

2022-04-25 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527459#comment-17527459
 ] 

Dawid Weiss commented on LUCENE-10386:
--

Hi Petr. I saw the PR but I'm not following all the changes happening there. I 
honestly just prefer dead simple verbosity... Will take another look in a spare 
minute though, unless somebody beats me to it.

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: general/build
>Affects Versions: 9.0, 8.4, 8.11.1
>Reporter: Petr Portnov
>Priority: Trivial
>  Labels: BOM, Dependencies
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to 
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects are 
> providing BOMs in order to simplify dependency management. This allows 
> dependent projects to only specify the version of the BOM module while 
> declaring the dependencies without them (as they will be provided by BOM).
> For example:
> {code:groovy}
> dependencies {
> // Only specify the version of the BOM
> implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
> // Don't specify dependency versions as they are provided by the BOM
> implementation "com.fasterxml.jackson.core:jackson-annotations"
> implementation "com.fasterxml.jackson.core:jackson-core"
> implementation "com.fasterxml.jackson.core:jackson-databind"
> implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
> implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
> implementation 
> "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }{code}
>  
> Not only is this approach "popular" but it also has the following pros:
>  * there is no need to declare a variable (via Maven properties or Gradle 
> ext) to hold the version
>  * this is more automation-friendly because tools like Dependabot only have 
> to update the single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that the 
> projects which currently rely on older Lucene versions (such as 8.4) can 
> migrate to the BOM approach without migrating to Lucene 9.0.




[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests

2022-04-22 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526769#comment-17526769
 ] 

Dawid Weiss commented on LUCENE-10528:
--

I agree - let's mark it slow. I run slow tests occasionally too and I'm on 
Windows.

> TestScripts.testLukeCanBeLaunched creates X Window when running the tests
> -
>
> Key: LUCENE-10528
> URL: https://issues.apache.org/jira/browse/LUCENE-10528
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When running the tests, this one causes my entire desktop to "flicker" when 
> it creates some kind of X-Window very quickly and then destroys it. I use 
> tiling window manager, so whole desktop gets rearranged for a split second, 
> and I'd rather it not happen :)
> I first tried adding -Djava.awt.headless=true to both org.gradle.jvmargs and 
> tests.jvmargs in my .gradle/gradle.properties. That doesn't work, as the test 
> doesn't use these when launching Luke.
> I next tried hacking the test by adding this to the ProcessBuilderThingy, but 
> it didn't help either:
> {noformat}
> .envvar("LAUNCH_OPTS", "-Djava.awt.headless=true")
> {noformat}
> One way I can work around it, is to unset {{DISPLAY}} env var so that it 
> won't create this window. test still passes:
> {noformat}
> $ unset DISPLAY
> $ ./gradlew :lucene:distribution.tests:test
> ... (no window gets created)
> {noformat}
> So maybe as a workaround, we can just not pass DISPLAY environment variable 
> through to this test?




[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-22 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526611#comment-17526611
 ] 

Dawid Weiss commented on LUCENE-10521:
--

You can use a custom IndexDeletionPolicy - one that never deletes any previous 
commit, for example. Then create two (or more) commits; each will have a 
different set of files. You can open a reader over any arbitrary commit, so this 
should be simple and consistent?

> Tests in windows are failing for the new 
> testAlwaysRefreshDirectoryTaxonomyReader test
> --
>
> Key: LUCENE-10521
> URL: https://issues.apache.org/jira/browse/LUCENE-10521
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/facet
> Environment: Windows 10
> Reporter: Gautam Worah
> Priority: Minor
>
> Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is 
> failing.
>  
> Specifically, the loop which checks if any files still remain to be deleted 
> is not ending.
> We have added an exception to the main test class to not run the test on 
> WindowsFS (not sure if this is related).
>  
> ```
> SEVERE: 1 thread leaked from SUITE scope at org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader:
>    1) Thread[id=19, name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader]
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native Method)
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390)
>         at java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307)
>         at java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251)
>         at java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at java.base@18/java.nio.file.Files.delete(Files.java:1152)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121)
>         at app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97)
> ```




[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests

2022-04-21 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526207#comment-17526207
 ] 

Dawid Weiss commented on LUCENE-10528:
--

Should we do it for the entire Gradle process, though? Why not just for the 
forked JVM in that test?
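
A minimal sketch of that narrower scope, assuming the test launches Luke through a plain java.lang.ProcessBuilder (the real test uses its own process helper, so the class and method names below are hypothetical). Only the child's environment is changed; the Gradle daemon and the test JVM keep DISPLAY.

```java
import java.util.List;
import java.util.Map;

public class ForkWithoutDisplay {
  /**
   * Builds a launcher for a child process whose environment lacks DISPLAY.
   * The parent JVM's own environment is left untouched.
   */
  static ProcessBuilder withoutDisplay(List<String> command) {
    ProcessBuilder pb = new ProcessBuilder(command);
    // environment() returns a mutable copy of the parent's environment
    // that applies only to processes started from this builder.
    Map<String, String> env = pb.environment();
    env.remove("DISPLAY");
    return pb;
  }

  public static void main(String[] args) {
    // Illustrative command only; nothing is started here.
    ProcessBuilder pb = withoutDisplay(List.of("java", "-version"));
    System.out.println("DISPLAY passed to child: " + pb.environment().containsKey("DISPLAY"));
  }
}
```

Keep in mind that, as discussed elsewhere in this thread, Luke exits early when it detects a headless environment, so a test run this way may pass without verifying that Luke can actually be launched.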





[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests

2022-04-21 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526064#comment-17526064
 ] 

Dawid Weiss commented on LUCENE-10528:
--

There are so many layers to AWT/Swing support that the test should really run on 
the defaults Java provides. I've seen weird things with virtualized graphics 
environments (arguably, it's been a while, so things might have improved). 
Running with xvfb on GitHub jobs is a good idea and is better than nothing (I 
don't know much about setting up xvfb, but I can take a look). 

We can make it opt-in, but I'm afraid that would just bury the test forever and 
nobody would ever run it. An alternative is to make it opt-out (via 
gradle.properties), or we can mark it slow, which would disable it for many 
folks who don't explicitly run slow tests. 






[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests

2022-04-21 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526050#comment-17526050
 ] 

Dawid Weiss commented on LUCENE-10528:
--

Hmm... But we want to run this test occasionally, don't we? If we disable it 
completely, nothing will check whether Luke can actually be launched (and 
nothing will fail if it can't). The test passes in headless mode because 
LukeMain exits with status 0 when it detects a headless environment:
{code}
// Exit cleanly (status 0) instead of failing when no display is available.
if (sanityCheck && GraphicsEnvironment.isHeadless()) {
  Logger.getGlobal().log(Level.SEVERE, "[Vader] Hello, Luke. Can't do much in headless mode.");
  Runtime.getRuntime().exit(0);
}
{code}

We can provide a test annotation group that would be enabled by default but 
could be explicitly turned off via gradle.properties. Something like 
RequiresGraphicsEnvironment?
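
A rough sketch of such an opt-out group, written with plain JDK reflection rather than the randomizedtesting test-group machinery that Lucene's test framework actually uses. The annotation name comes from the comment above; the tests.graphics property name, the GraphicsGroupSketch class and its helper methods are made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class GraphicsGroupSketch {
  /** Marks tests that need a real display (hypothetical annotation). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.METHOD, ElementType.TYPE})
  public @interface RequiresGraphicsEnvironment {}

  /** Hypothetical property name; it could be set from gradle.properties. */
  static final String GROUP_PROPERTY = "tests.graphics";

  /** Opt-out semantics: the group is on unless the property is not "true". */
  static boolean groupEnabled() {
    return Boolean.parseBoolean(System.getProperty(GROUP_PROPERTY, "true"));
  }

  /** Ungated tests always run; gated tests run only while the group is enabled. */
  static boolean shouldRun(Method test) {
    boolean gated =
        test.isAnnotationPresent(RequiresGraphicsEnvironment.class)
            || test.getDeclaringClass().isAnnotationPresent(RequiresGraphicsEnvironment.class);
    return !gated || groupEnabled();
  }

  /** Convenience overload that looks up a public no-arg method by name. */
  static boolean shouldRun(Class<?> testClass, String methodName) {
    try {
      return shouldRun(testClass.getMethod(methodName));
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No public no-arg method: " + methodName, e);
    }
  }

  // Two placeholder "tests" used only to demonstrate the gating.
  @RequiresGraphicsEnvironment
  public void testLukeCanBeLaunched() {}

  public void testSomethingElse() {}

  public static void main(String[] args) {
    for (String name : new String[] {"testLukeCanBeLaunched", "testSomethingElse"}) {
      System.out.println(name + " -> run=" + shouldRun(GraphicsGroupSketch.class, name));
    }
  }
}
```

Because the default is "true", the gated test keeps running for everyone who does nothing, and only people who set the property to false skip it.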






[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-21 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526044#comment-17526044
 ] 

Dawid Weiss commented on LUCENE-10521:
--

I'm not familiar with the code (or the test), but something seems off 
here. Deleting a file that is still open elsewhere seems awkward, even in a 
test, and relying on this behavior seems strange. Why is the list of files in a 
directory treated as a state (a "commit" in the test)? Does it have to be? 
Wouldn't Lucene's own IndexCommit.getFileNames be more appropriate? Sorry if 
this doesn't make sense in context, but it just feels fishy somehow.





[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-04-13 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521850#comment-17521850
 ] 

Dawid Weiss commented on LUCENE-10510:
--

I suspected it might have been the nightly runs. We could try to detect whether 
the JVM would run with an unexported JDK package (anything up until JDK 16?), but 
I think that buries the problem rather than solving it. I think it's easy to run a 
first pass that generates the required JVM settings. If for some reason you 
can't do that, pass them via the command line (or environment variables) 
directly to Gradle: 
https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_environment_variables

This will also work, even in the absence of gradle.properties, as the task 
verifies whether the required modules are open (not how or where they were 
opened). Sorry for the complications - not my fault. :) 
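
To illustrate what "verifying whether the required modules are open" can look like, here is a minimal sketch using the JDK's java.lang.Module API. It is not the code from the PR; the class and method names are hypothetical, and jdk.compiler/com.sun.tools.javac.api is used only as an example of a package that tools like google-java-format typically need exported.

```java
import java.util.Optional;

public class ModuleAccessCheck {
  /**
   * Returns true if the named module exports (or opens) the package to
   * classpath code, i.e. to the system class loader's unnamed module.
   * It does not matter how the export was granted (command line, env var, ...).
   */
  static boolean isExportedToClasspath(String moduleName, String packageName) {
    Optional<Module> module = ModuleLayer.boot().findModule(moduleName);
    if (!module.isPresent()) {
      return false; // module is not part of this runtime image
    }
    Module unnamed = ClassLoader.getSystemClassLoader().getUnnamedModule();
    return module.get().isExported(packageName, unnamed);
  }

  public static void main(String[] args) {
    // Example pair only; a real check would iterate over every required package.
    String module = "jdk.compiler";
    String pkg = "com.sun.tools.javac.api";
    if (isExportedToClasspath(module, pkg)) {
      System.out.println(module + "/" + pkg + " is accessible");
    } else {
      System.out.println("Missing: --add-exports " + module + "/" + pkg + "=ALL-UNNAMED");
    }
  }
}
```

Run without extra flags, a recent JDK would be expected to report the package as missing; passing the printed --add-exports option on the java command line should flip the result, which matches the point that the check only cares about the final state.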

> Check module access prior to running gjf/spotless/errorprone tasks
> --
>
> Key: LUCENE-10510
> URL: https://issues.apache.org/jira/browse/LUCENE-10510
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
> Fix For: 9.2
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> PR at: [https://github.com/apache/lucene/pull/802]




[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-04-12 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521323#comment-17521323
 ] 

Dawid Weiss commented on LUCENE-10510:
--

Hi Alan. The task graph is fine. When you run 'gradlew clean test', the new task 
is not included. If you take a look at the dependencies, it is only 
included if either spotless is actually part of the execution graph or you run 
Java compilation with -Ptests.slow=true (in which case it is needed because 
error-prone does require those JVM module-opening settings). I think everything 
is set up correctly. I believe your CI jobs were passing on 9x with JDKs older 
than 17 because those JDKs only emitted a warning about package accesses. The 
right way to fix the problem is to add the right exports or, even better, to run 
gradlew help or an explicit gradlew localSettings to make sure everything is set 
up correctly in gradle.properties.





[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520643#comment-17520643
 ] 

Dawid Weiss commented on LUCENE-10513:
--

Perhaps you could add a line to:

[https://github.com/apache/lucene/blob/main/help/workflow.txt]

and mention the tidy task that reformats the code prior to check.

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Rich Bowen
> Priority: Minor
>
> I just made my first PR to Lucene (yay me!) and in the process stumbled on 
> various things that were non-obvious.
> I request, for The Next Person, that the error messaging in `gradlew` make it 
> more obvious that one should run `./gradlew tidy` the first time around, so 
> as to avoid the low-hanging formatting problems that cause everything else to 
> fail.
> During the course of my fumbling around, I was encouraged to run:
> ./gradlew :lucene:suggest:spotlessJavaCheck
> ./gradlew :lucene:suggest:spotlessApply
> ./gradlew :lucene:test-framework:spotlessApply
> and
> ./gradlew check -Ptests.nightly=true
> various times, by the error messages in `./gradlew check`, and while I got 
> there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew 
> tidy` first may have saved some frustration.
> That said, I cannot overstate how impressed I am with the thoroughness of the 
> testing/verification tools, and wish more projects had this kind of tooling. 
> Thank you.
>  




[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-11 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520641#comment-17520641
 ] 

Dawid Weiss commented on LUCENE-10513:
--

You should familiarize yourself with the various help files under help/. Here is 
one of them that explicitly covers formatting:

[https://github.com/apache/lucene/blob/main/help/formatting.txt]

I don't think more can be done about it, to be honest.





[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10229:
-
Priority: Minor  (was: Major)

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 9.2
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10223 in which it was discovered that fields 
> with offsets don't highlight some more complex interval queries properly. Alan 
> says:
> {quote}
> It's because it returns the position of the inner match, but the offsets of 
> the outer.  And so if you're re-analyzing and retrieving offsets by looking 
> at the positions, you get the 'right' thing.  It's not obvious to me what the 
> correct response is here, but thinking about it the current behaviour is kind 
> of the worst of both worlds, and perhaps we should change it so that you get 
> offsets of the inner match as standard, and then the outer match is returned 
> as part of the sub matches.
> {quote}
> Intervals are nicely separated into "basic intervals" and "filters" which 
> restrict some other source of intervals. Here is the original documentation:
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50
> My experience from an extended period of using interval queries in a frontend 
> where they're highlighted is that filters are restrictions that should not be 
> highlighted - it's the source intervals that people care about. Filters are 
> what you remove or where you give proper context to source intervals.
> The test code contributed in LUCENE-10223 contains numerous query-highlight 
> examples (on fields with positions) where this intuition is demonstrated on 
> all kinds of interval functions:
> https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542
> This issue is about making the internals work consistently for fields with 
> positions and fields with offsets.




[jira] [Assigned] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-04-11 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-10229:


Assignee: Dawid Weiss




