[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581246#comment-17581246 ]

Dawid Weiss commented on LUCENE-10662:
--

Yep, closed it just now, thanks.

> Make LuceneTestCase to not extend from org.junit.Assert
> ---
>
> Key: LUCENE-10662
> URL: https://issues.apache.org/jira/browse/LUCENE-10662
> Project: Lucene - Core
> Issue Type: Test
> Components: general/test
> Reporter: Marios Trivyzas
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Since *LuceneTestCase* is a very useful abstract class that can be extended
> and used by many projects, having it extend *org.junit.Assert* limits all
> users to the static methods of {*}org.junit.Assert{*}. In our
> project we want to use [https://joel-costigliola.github.io/assertj], where the
> main method to call is *org.assertj.core.api.Assertions.assertThat*, which
> conflicts with the deprecated {*}org.junit.Assert.assertThat{*} that the compiler
> resolves by default. As a result, one can only use assertj by calling the
> *assertThat* method by its fully qualified name every time, i.e.
>
> {code:java}
> org.assertj.core.api.Assertions.assertThat(myObj.name()).isEqualTo(expectedName);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10662.
--
Resolution: Won't Do
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579703#comment-17579703 ]

Dawid Weiss commented on LUCENE-10662:
--

Hi [~matriv]. I don't think we'll integrate this change. You may have to prefix your assertj static methods in your code or derive your own base class based on LuceneTestCase. Thanks for bringing the problem to our attention though. I agree assertj outputs are much nicer to read (especially for collections).
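The "derive your own base class" route can be sketched with plain-Java stand-ins, so it runs without JUnit, AssertJ or Lucene on the classpath. All names here are hypothetical: `JUnitStyleAssert` plays the role of org.junit.Assert (a two-argument, matcher-style assertThat), `FluentAssertions`/`ObjectAssert` play the role of AssertJ, and `ProjectTestCase` is the project's own base class. The idea, under these assumptions, is that the base class adds a one-argument assertThat that delegates to the fluent API, so overload resolution in subclasses picks it for one-argument calls.

```java
import java.util.Objects;
import java.util.function.Predicate;

public class DelegatingBaseDemo {

  /** Stand-in for org.junit.Assert: a matcher-style, two-argument assertThat. */
  static class JUnitStyleAssert {
    static <T> void assertThat(T actual, Predicate<? super T> matcher) {
      if (!matcher.test(actual)) {
        throw new AssertionError("Predicate failed for: " + actual);
      }
    }
  }

  /** Stand-in for AssertJ's Assertions: a one-argument entry point to a fluent API. */
  static final class FluentAssertions {
    static ObjectAssert assertThat(Object actual) {
      return new ObjectAssert(actual);
    }
  }

  /** Minimal fluent assertion object (hypothetical). */
  static final class ObjectAssert {
    private final Object actual;

    ObjectAssert(Object actual) {
      this.actual = actual;
    }

    ObjectAssert isEqualTo(Object expected) {
      if (!Objects.equals(actual, expected)) {
        throw new AssertionError("Expected <" + expected + "> but was <" + actual + ">");
      }
      return this;
    }
  }

  /**
   * Hypothetical project base class (it would extend LuceneTestCase in real code).
   * The one-argument overload delegates to the fluent API.
   */
  static class ProjectTestCase extends JUnitStyleAssert {
    static ObjectAssert assertThat(Object actual) {
      return FluentAssertions.assertThat(actual);
    }
  }

  /** A test class written against the project base class. */
  static class NameTest extends ProjectTestCase {
    static void run() {
      // Resolves to ProjectTestCase.assertThat(Object), the only applicable one-arg overload.
      assertThat("lucene").isEqualTo("lucene");
      // The inherited two-argument, matcher-style variant is still available.
      assertThat("lucene", s -> s.startsWith("luc"));
    }
  }

  public static void main(String[] args) {
    NameTest.run();
    System.out.println("ok");
  }
}
```

The catch is that AssertJ's real `Assertions.assertThat` is heavily overloaded (strings, collections, maps, ...), so a delegating base class would have to mirror every overload a project relies on.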
[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
[ https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577859#comment-17577859 ]

Dawid Weiss commented on LUCENE-10677:
--

> It looks like we might be able to intercept the relevant calls to
> `DataInput#readString` ourselves, although adding support for compound
> segments introduces an enormous amount of extra complexity to that approach.

With the right tools it shouldn't be a problem. A hot-mode aspectj aspect that would deduplicate those strings selectively, where it matters, comes to mind. That said, perhaps there are cleaner ways to solve this elegantly. Feel free to propose a patch (but no String.intern, please...).

> Duplicate strings in FieldInfo#attributes contribute significantly to heap
> usage at scale
> -
>
> Key: LUCENE-10677
> URL: https://issues.apache.org/jira/browse/LUCENE-10677
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Affects Versions: 9.3
> Reporter: Armin Braun
> Priority: Minor
> Labels: heap, scalability
> Attachments: lucene_duplicate_fields.png
>
>
> This has the same origin as issue LUCENE-10676. Running a single process
> with thousands of fields across many indexes will lead to a lot of duplicate
> strings retained as keys and values in the `attributes` map. This can amount
> to GBs of heap for thousands of fields across a few thousand segments. The
> strings in the heap dump analysis below account for more than half (roughly
> 2/3; the field names are somewhat unusually long in this example) of the
> duplicate strings from `FieldInfo` instances.
> If we could deduplicate these obvious known strings when reading `FieldInfo`,
> we could save GBs of heap for use cases like this.
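One "cleaner" shape for selective deduplication without String.intern is a caller-owned canonicalizing map. The sketch below is hypothetical (it is not a Lucene API, and where it would hook into FieldInfo reading is left open); it only shows the technique: `putIfAbsent` returns the already-stored equal instance, so every later equal string collapses onto the first one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper: canonicalizes equal strings to one shared instance
 * without String.intern(). The map holds strong references, so the caller
 * decides its lifetime (e.g. per reader or per node) and thus its memory cost.
 */
public final class StringDeduplicator {
  private final ConcurrentHashMap<String, String> canonical = new ConcurrentHashMap<>();

  /** Returns a shared instance equal to {@code s}; null passes through. */
  public String dedup(String s) {
    if (s == null) {
      return null;
    }
    String existing = canonical.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }

  /** Deduplicates both keys and values of an attributes-style map. */
  public Map<String, String> dedupAll(Map<String, String> attributes) {
    Map<String, String> result = new HashMap<>(attributes.size() * 2);
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      result.put(dedup(e.getKey()), dedup(e.getValue()));
    }
    return result;
  }

  /** Number of distinct strings currently held. */
  public int size() {
    return canonical.size();
  }

  public static void main(String[] args) {
    StringDeduplicator d = new StringDeduplicator();
    // new String(...) forces two distinct instances with equal contents,
    // as if the same attribute key had been read from two segments.
    String a = d.dedup(new String("someAttributeKey"));
    String b = d.dedup(new String("someAttributeKey"));
    System.out.println(a == b); // true: both references point at one instance
  }
}
```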
[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
[ https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577468#comment-17577468 ]

Dawid Weiss commented on LUCENE-10677:
--

String.intern is evil for many reasons, and your use case is indeed, ahem, atypical. I don't think adding "a few known strings" is an elegant solution, since hacks like this one tend to become stale quickly... You could try the JVM's UseStringDeduplication option (an ugly but easy workaround), but I think you'll run into other problems soon enough with this number of concurrent indices/segments/fields. If you have to live with this, then it's likely that you'll have to follow Rob's advice sooner or later.
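UseStringDeduplication is a HotSpot `-XX` flag (originally introduced for G1 by JEP 192), enabled with `-XX:+UseStringDeduplication` on the command line. The sketch below is one way to confirm from inside the process that the flag actually took effect. It uses the JDK's `com.sun.management.HotSpotDiagnosticMXBean`, and it assumes a HotSpot-based JVM, returning null elsewhere.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public final class StringDedupCheck {

  private StringDedupCheck() {}

  /**
   * Returns the current value of UseStringDeduplication ("true" or "false"),
   * or null if this JVM has no HotSpot diagnostic bean or does not know the flag.
   */
  public static String useStringDeduplication() {
    try {
      HotSpotDiagnosticMXBean hotspot =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
      if (hotspot == null) {
        return null;
      }
      VMOption option = hotspot.getVMOption("UseStringDeduplication");
      return option.getValue();
    } catch (IllegalArgumentException e) {
      // Thrown if the MXBean interface or the named flag is not supported here.
      return null;
    }
  }

  public static void main(String[] args) {
    // For example: java -XX:+UseStringDeduplication StringDedupCheck
    System.out.println("UseStringDeduplication = " + useStringDeduplication());
  }
}
```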
[jira] [Commented] (LUCENE-10677) Duplicate strings in FieldInfo#attributes contribute significantly to heap usage at scale
[ https://issues.apache.org/jira/browse/LUCENE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577408#comment-17577408 ]

Dawid Weiss commented on LUCENE-10677:
--

It would help a lot if you could provide an example of how you ended up with 25 million FieldInfo objects that cannot be garbage collected. This is weird and unexpected, I'd say.
[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects
[ https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576342#comment-17576342 ]

Dawid Weiss commented on LUCENE-10386:
--

I'm for closing this.

> Add BOM module for ease of dependency management in dependent projects
> --
>
> Key: LUCENE-10386
> URL: https://issues.apache.org/jira/browse/LUCENE-10386
> Project: Lucene - Core
> Issue Type: Wish
> Components: general/build
> Affects Versions: 9.0, 8.4, 8.11.1
> Reporter: Petr Portnov
> Priority: Trivial
> Labels: BOM, Dependencies
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h1. Short description
> Add a Bill-of-Materials (BOM) module to Lucene to allow other projects to
> use it for dependency management.
> h1. Reasoning
> [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects
> provide BOMs in order to simplify dependency management. This allows
> dependent projects to specify only the version of the BOM module and to
> declare the dependencies without versions (as they will be provided by the BOM).
> For example:
> {code:groovy}
> dependencies {
>     // Only specify the version of the BOM
>     implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1')
>
>     // Don't specify dependency versions as they are provided by the BOM
>     implementation "com.fasterxml.jackson.core:jackson-annotations"
>     implementation "com.fasterxml.jackson.core:jackson-core"
>     implementation "com.fasterxml.jackson.core:jackson-databind"
>     implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8"
>     implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310"
>     implementation "com.fasterxml.jackson.module:jackson-module-parameter-names"
> }
> {code}
>
> Not only is this approach "popular", it also has the following pros:
> * there is no need to declare a variable (via Maven properties or Gradle
> ext) to hold the version
> * this is more automation-friendly, because tools like Dependabot only have
> to update a single version per dependency group
> h1. Other suggestions
> It may be reasonable to also publish BOMs for old versions so that
> projects which currently rely on older Lucene versions (such as 8.4) can
> migrate to the BOM approach without migrating to Lucene 9.0.
[jira] (LUCENE-10671) Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10671 ]

Dawid Weiss deleted comment on LUCENE-10671:
--

was (Author: JIRAUSER293699):
https://allnewcracksoftwares.com/typing-master-pro-11-crack-with-serial-keys-download/

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/hnsw
> Affects Versions: 8.11.2
> Reporter: allnewcracksoftwares
> Priority: Minor
>
> [link title|http://example.com]
[jira] [Commented] (LUCENE-10671) Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573638#comment-17573638 ]

Dawid Weiss commented on LUCENE-10671:
--

Spammer.
[jira] [Updated] (LUCENE-10671) Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-10671:
-
Environment: (was: https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/)
[jira] [Closed] (LUCENE-10671) Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss closed LUCENE-10671.
[jira] [Resolved] (LUCENE-10671) Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10671.
--
Resolution: Invalid
[jira] [Resolved] (LUCENE-10669) The build should be more helpful when generated resources are touched
[ https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10669.
--
Fix Version/s: 9.4
Resolution: Fixed

> The build should be more helpful when generated resources are touched
> -
>
> Key: LUCENE-10669
> URL: https://issues.apache.org/jira/browse/LUCENE-10669
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 10.0 (main)
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 9.4
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As per discussion at [https://github.com/apache/lucene/pull/1016], it'd be
> good if a build failure could point at the sources and generated files of the
> task for which checksums are mismatched (signaling either modified templates
> or accidentally modified generated files).
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573181#comment-17573181 ]

Dawid Weiss commented on LUCENE-10662:
--

> Why is it necessary to break the inheritance in order to achieve what is
> wanted here?

I think it's because static imports won't be resolved properly in a subclass if there's an "assertThat" method in a superclass, which would require the kind of delegation trickery you mentioned, Mike. I'm a bit torn on this one, actually. I like assertj, but it does seem like changing LuceneTestCase's inheritance may be too invasive for both Lucene and existing downstream projects that rely on it.
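The resolution rule behind this is JLS §15.12.1: for an unqualified call, the compiler first picks the innermost enclosing class that has (or inherits) a method with that name and searches only that class. A static import is consulted only when no such class exists. The sketch below shows the effect with hypothetical stand-ins. A single-file example in the unnamed package cannot `import static` from itself, so an outer-class method stands in for the imported assertThat; the lookup rule treats both the same way.

```java
/**
 * Demonstrates why an inherited assertThat hides a same-named method from an
 * outer scope (or a static import) for unqualified calls.
 */
public class ShadowingDemo {

  /** Outer-scope method; plays the role of a statically imported assertThat. */
  static String assertThat(Object actual) {
    return "outer";
  }

  /** Stand-in for org.junit.Assert: an inherited static assertThat. */
  static class InheritedAssert {
    static String assertThat(Object actual) {
      return "inherited";
    }
  }

  /** Stand-in for a LuceneTestCase subclass. */
  static class SomeTest extends InheritedAssert {
    static String unqualified() {
      // SomeTest is the innermost class with a member named assertThat, so only
      // its (inherited) overloads are searched. If none of them accepted these
      // arguments, this line would fail to compile instead of falling back.
      return assertThat("x");
    }

    static String qualified() {
      // Qualifying the name is the only way to reach the outer method here.
      return ShadowingDemo.assertThat("x");
    }
  }

  public static void main(String[] args) {
    System.out.println(SomeTest.unqualified()); // inherited
    System.out.println(SomeTest.qualified()); // outer
  }
}
```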
[jira] [Commented] (LUCENE-10669) The build should be more helpful when generated resources are touched
[ https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573059#comment-17573059 ]

Dawid Weiss commented on LUCENE-10669:
--

PR is at: [https://github.com/apache/lucene/pull/1053]

Essentially, it prints the inputs and outputs of the regeneration task (from which checksums are computed). It won't help if the sources for generation are not files (only properties), but it's better than before?

{code}
> Task :lucene:core:utilGenPackedChecksumCheck FAILED

FAILURE: Build failed with an exception.

* Where:
Script 'C:\Work\apache\lucene\main\gradle\generation\regenerate.gradle' line: 186

* What went wrong:
Execution failed for task ':lucene:core:utilGenPackedChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: utilGenPacked):

Current:
lucene/core/src/java/org/apache/lucene/util/packed/Packed64SingleBlock.java=14326081c8c6a281051f9ffe94695d2a467f3db8

Expected:
lucene/core/src/java/org/apache/lucene/util/packed/Packed64SingleBlock.java=2680e0a7c7207ddf615f50fd22465c809904ac42

Input files for this task are:
C:\Work\apache\lucene\main\gradle\generation\moman\gen_BulkOperation.py
C:\Work\apache\lucene\main\gradle\generation\moman\gen_Packed64SingleBlock.py

Files generated by this task are:
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperation.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked1.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked10.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked11.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked12.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked13.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked14.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked15.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked16.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked17.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked18.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked19.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked2.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked20.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked21.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked22.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked23.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked24.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked3.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked4.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked5.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked6.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked7.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked8.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPacked9.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\BulkOperationPackedSingleBlock.java
C:\Work\apache\lucene\main\lucene\core\src\java\org\apache\lucene\util\packed\Packed64SingleBlock.java
{code}
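The check itself lives in Gradle (regenerate.gradle), but its logic can be sketched in plain Java: hash each generated file, compare against the stored checksums and, on mismatch, list the task's inputs and outputs so the user knows where to look. Everything below (class and method names, file names, the message layout) is a hypothetical illustration that mirrors the log above. It assumes SHA-1, which is consistent with the 40-hex-digit values shown there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class ChecksumReport {

  private ChecksumReport() {}

  /** Hex-encoded SHA-1 of a file's contents. */
  static String sha1(Path file) throws IOException {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(file));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(Character.forDigit((b >> 4) & 0xF, 16));
        hex.append(Character.forDigit(b & 0xF, 16));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
    }
  }

  /**
   * Returns null if every file matches its expected checksum; otherwise an error
   * message listing the mismatches plus the task's input and output files.
   */
  static String check(String task, Map<Path, String> expected, List<Path> inputs, List<Path> outputs)
      throws IOException {
    StringBuilder current = new StringBuilder();
    StringBuilder wanted = new StringBuilder();
    for (Map.Entry<Path, String> e : new TreeMap<>(expected).entrySet()) {
      Path file = e.getKey();
      String actual = Files.exists(file) ? sha1(file) : "<missing>";
      if (!actual.equals(e.getValue())) {
        current.append("  ").append(file).append('=').append(actual).append('\n');
        wanted.append("  ").append(file).append('=').append(e.getValue()).append('\n');
      }
    }
    if (current.length() == 0) {
      return null;
    }
    StringBuilder msg = new StringBuilder();
    msg.append("Checksums mismatch for derived resources; you might have modified a generated")
        .append(" resource (regenerate task: ").append(task).append("):\n");
    msg.append("Current:\n").append(current);
    msg.append("Expected:\n").append(wanted);
    msg.append("Input files for this task are:\n");
    for (Path p : inputs) {
      msg.append("  ").append(p).append('\n');
    }
    msg.append("Files generated by this task are:\n");
    for (Path p : outputs) {
      msg.append("  ").append(p).append('\n');
    }
    return msg.toString();
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("regen-demo");
    Path template = Files.writeString(dir.resolve("gen_Example.py"), "# template\n");
    Path generated = Files.writeString(dir.resolve("Example.java"), "class Example {}\n");
    Map<Path, String> expected = Map.of(generated, sha1(generated));

    System.out.println(check("genExample", expected, List.of(template), List.of(generated))); // null

    // Simulate an accidental hand edit of the generated file.
    Files.writeString(generated, "class Example { int x; }\n");
    System.out.println(check("genExample", expected, List.of(template), List.of(generated)));
  }
}
```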
[jira] [Created] (LUCENE-10669) The build should be more helpful when generated resources are touched
Dawid Weiss created LUCENE-10669:


Summary: The build should be more helpful when generated resources are touched
Key: LUCENE-10669
URL: https://issues.apache.org/jira/browse/LUCENE-10669
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 10.0 (main)
Reporter: Dawid Weiss
Assignee: Dawid Weiss


As per discussion at [https://github.com/apache/lucene/pull/1016], it'd be good if a build failure could point at the sources and generated files of the task for which checksums are mismatched (signaling either modified templates or accidentally modified generated files).
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572733#comment-17572733 ]

Dawid Weiss commented on LUCENE-10662:
--

I guess so. But you'd still have to go through the code and change the inheritance hierarchy.
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572636#comment-17572636 ]

Dawid Weiss commented on LUCENE-10662:
--

I'll send a heads-up email to the mailing list so that this issue gets some attention.
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase to not extend from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571469#comment-17571469 ]

Dawid Weiss commented on LUCENE-10662:
--

I think the compiler should be able to pick the most specific variant based on argument types, unless there really is ambiguity. I admit I haven't checked whether this is the case, for example here: https://github.com/apache/lucene/pull/1049/files#diff-334836e7b61b74a76eec5aa18eacec6b14c1496f5595b684842ce05583a6df22L209-R213
[jira] [Commented] (LUCENE-10662) Make LuceneTestCase not extending from org.junit.Assert
[ https://issues.apache.org/jira/browse/LUCENE-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571418#comment-17571418 ] Dawid Weiss commented on LUCENE-10662: -- Changing these methods will require a huge follow-up and cleanup in any other project that uses LuceneTestCase (and there are many). I don't think people will be happy with it (even though my heart is with you on assertj - I also prefer it to what's in hamcrest/junit). Even if people agree to change it, looking at the patch, I wouldn't rename any methods (assertEquals becomes assertEquality) - this will be even more confusing for downstream users. I'd remove the extend and assertEquals* methods from LuceneTestCase and move those methods into a separate class (like LuceneAssertions or something) - then the upgrade would be about importing them statically from junit's Assert or LuceneAssertions. Again, I'm not convinced this is a necessary improvement. I've lived with an explicit Assertions.* call from assertj - this is fine and explicit. And even used within Lucene code itself: [https://github.com/apache/lucene/blob/main/lucene/distribution.tests/src/test/org/apache/lucene/distribution/TestModularLayer.java#L117] > Make LuceneTestCase not extending from org.junit.Assert > --- > > Key: LUCENE-10662 > URL: https://issues.apache.org/jira/browse/LUCENE-10662 > Project: Lucene - Core > Issue Type: Test > Components: general/test >Reporter: Marios Trivyzas >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Since *LuceneTestCase* is a very useful abstract class that can be extended > and used by many projects, having it extending *org.junit.Assert* limits all > users to exclusively use the static methods of {*}org.junit.Assert{*}. 
In our > project we want to use [https://joel-costigliola.github.io/assertj] where the > main method to call is *org.assertj.core.api.Assertions.assertThat* which > conflicts with the deprecated {*}org.junit.Assert.assertThat{*}, recognized > by default by the compiler. So one can only use assertj if on every call uses > fully qualified name for the *assertThat* method, i.e. > > {code:java} > org.assertj.core.api.Assertions.assertThat(myObj.name()).isEqualTo(expectedName) > {code}
[jira] [Commented] (LUCENE-10643) Lucene Jenkins CI - s390x support
[ https://issues.apache.org/jira/browse/LUCENE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563795#comment-17563795 ] Dawid Weiss commented on LUCENE-10643: -- The timeout is caused by a hard limit in Jenkins that should be configurable via a system property ([https://www.jenkins.io/doc/book/managing/system-properties/#hudson-filepath-validate_ant_file_mask_bound]); we never got around to figuring out how to set it, though. > Lucene Jenkins CI - s390x support > -- > > Key: LUCENE-10643 > URL: https://issues.apache.org/jira/browse/LUCENE-10643 > Project: Lucene - Core > Issue Type: Wish >Reporter: Nayana Thorat >Assignee: Uwe Schindler >Priority: Major > Labels: jenkins > > This issue adds Lucene builds on ASF Jenkins with S390x architecture (big > endian). >
[jira] [Created] (LUCENE-10631) Consolidate java version numbers in one place and reuse them across build parts
Dawid Weiss created an issue Lucene - Core / LUCENE-10631 Consolidate java version numbers in one place and reuse them across build parts Issue Type: Sub-task Assignee: Unassigned Created: 29/Jun/22 12:43 Priority: Minor Reporter: Dawid Weiss [R. Muir / mailing list discussions] Ideally, we could consolidate most of these version numbers in a simple .properties file that contains the min/max major Java versions. That file could then be read by:
* Gradle logic,
* Java logic, such as the checks done in WrapperDownloader,
* bash logic, such as the error messages in ./gradlew.sh,
* the Python smoketester logic?
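A minimal sketch of what the Java side of this could look like, assuming a hypothetical properties file with the keys java.min.version and java.max.version (the issue does not fix a file name or key names, and 17/21 below are example values only). It reads the range and checks the running JVM's feature release via Runtime.version().feature() (Java 10+).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class JavaVersionCheck {

  /** Min/max major Java versions read from a shared .properties source. */
  record VersionRange(int min, int max) {
    boolean accepts(int feature) {
      return feature >= min && feature <= max;
    }
  }

  static VersionRange load(Reader source) {
    Properties props = new Properties();
    try {
      props.load(source);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Hypothetical key names; a real file would define its own.
    return new VersionRange(
        requireInt(props, "java.min.version"), requireInt(props, "java.max.version"));
  }

  private static int requireInt(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing property: " + key);
    }
    return Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    // In a build this would come from a file checked into the repository;
    // an inline string keeps the sketch self-contained.
    VersionRange range = load(new StringReader("java.min.version=17\njava.max.version=21\n"));
    int running = Runtime.version().feature();
    if (range.accepts(running)) {
      System.out.println("Java " + running + " OK");
    } else {
      System.err.println("Java " + running + " is not supported; expected "
          + range.min() + ".." + range.max());
    }
  }
}
```

Because the format is plain key=value text, the same file stays easy to consume from Gradle, shell scripts and Python as well.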
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira https://gitbox.apache.org/schemes.cgi?lucene-jira-archive Something seems wrong. According to https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features, the update should be approved via an e-mail sent to the private mailing list, but I don't see any such e-mail yet.
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss updated an issue Lucene - Core / LUCENE-10557 Migrate to GitHub issue from Jira Change By: Dawid Weiss Attachment: image-2022-06-29-13-36-57-365.png
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Done. Your repository has been created and will be available for use within a few minutes. Your project is available on gitbox at: https://gitbox.apache.org/repos/asf/lucene-jira-archive.git Your project is available on GitHub at: https://github.com/apache/lucene-jira-archive.git User permissions should be set up within the next five minutes. If not, please let us know at: us...@infra.apache.org
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I would only print the "(versions: ...)" suffix when the count is greater than 1, not for "(versions: 1)".
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I would leave those issue numbers as they are. As I said, these issue numbers are referenced everywhere (mailing list archives, etc.), and I don't think they should be replaced. Spring redirects Jira URLs to the corresponding ported GitHub issues; I think this is a much better solution.
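A minimal sketch of such a redirect lookup that keeps the original LUCENE-xyz keys untouched and only resolves them when a Jira URL is followed. It assumes a hypothetical mapping from Jira keys to GitHub issue numbers (for example, loaded from an old-issue/new-issue mapping file); the class name and the issue number 1001 are made up for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraRedirect {
  // Extracts keys like LUCENE-10557 from a Jira browse URL.
  private static final Pattern JIRA_KEY = Pattern.compile("/browse/(LUCENE-\\d+)");

  private final Map<String, Integer> jiraToGitHub;
  private final String githubIssuesBase;

  JiraRedirect(Map<String, Integer> jiraToGitHub, String githubIssuesBase) {
    this.jiraToGitHub = jiraToGitHub;
    this.githubIssuesBase = githubIssuesBase;
  }

  /** Returns the GitHub issue URL for a Jira browse URL, if that key was migrated. */
  Optional<String> redirect(String jiraUrl) {
    Matcher m = JIRA_KEY.matcher(jiraUrl);
    if (!m.find()) {
      return Optional.empty();
    }
    Integer number = jiraToGitHub.get(m.group(1));
    return number == null ? Optional.empty() : Optional.of(githubIssuesBase + number);
  }

  public static void main(String[] args) {
    // Hypothetical mapping entry; real numbers would come from the migration output.
    JiraRedirect r =
        new JiraRedirect(Map.of("LUCENE-10557", 1001), "https://github.com/apache/lucene/issues/");
    System.out.println(r.redirect("https://issues.apache.org/jira/browse/LUCENE-10557"));
    // Optional[https://github.com/apache/lucene/issues/1001]
  }
}
```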
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira > Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one? My hope was that these attachments could be stored in the primary git repository for convenience: keeping the historical artifacts together and having them served for free via GitHub's infrastructure. It's also convenient because it can be modified/updated by multiple people (and those same people can freeze the repository against updates once the migration is complete). Having those artifacts elsewhere (on home.apache.org) lacks some of these conveniences, but that's fine too, of course. Also, I don't think infra will have any problem adding a repository called "lucene-archives" or something similar. I can ask if we decide to go in this direction.
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira I toyed with attachments a bit.
* I've modified Tomoko's code a bit so that it fetches the attachments for each issue and places them under {{attachments/LUCENE-xyz/blob.ext}}.
* I fetched about half of the attachments from Jira and they total ~350MB. So they're quite large, but not unbearably so.
* I created a separate test repository (https://github.com/dweiss/lucene-jira-migration) with a subset of the attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. This seems to work (MIME types, etc.).
* The test repository has an orphaned (separate root) branch for just the attachment blobs, but they're still downloaded when you clone the master branch (which I had hoped could be avoided). This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).
* I didn't check for multiple attachments with the same name (perhaps uncommon, but definitely possible); these would have to be saved under a subfolder or something similar, so that they can be distinguished.
* A mapping of original attachment URLs to new attachment URLs could also be written out and preserved.
* Since the attachments live in a git repository, they should be searchable, but for some reason search didn't work for me (maybe the index needs time to update).
This is just an experiment; I don't mean to imply it has to (or should) be done. I was just curious about what's possible.
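The attachment layout described in this comment can be sketched as follows. This is a hypothetical illustration, not the code used for the experiment: each attachment goes to attachments/LUCENE-xyz/<file name>, a repeated file name within the same issue is placed in a numbered subfolder so the copies can be distinguished, and an original-URL to new-path mapping is recorded along the way. The Attachment record and the example URLs are assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttachmentLayout {

  /** A Jira attachment: its original download URL and file name (hypothetical model). */
  record Attachment(String originalUrl, String fileName) {}

  /**
   * Assigns a repository path to every attachment of one issue and returns an
   * original-URL to new-path mapping, in input order.
   */
  static Map<String, String> layout(String issueKey, List<Attachment> attachments) {
    Map<String, Integer> seen = new HashMap<>();
    Map<String, String> mapping = new LinkedHashMap<>();
    String dir = "attachments/" + issueKey + "/";
    for (Attachment a : attachments) {
      int count = seen.merge(a.fileName(), 1, Integer::sum);
      // The first occurrence keeps the plain name; later duplicates get a numbered subfolder.
      String path = count == 1 ? dir + a.fileName() : dir + count + "/" + a.fileName();
      mapping.put(a.originalUrl(), path);
    }
    return mapping;
  }

  public static void main(String[] args) {
    List<Attachment> files = List.of(
        new Attachment("https://jira.example.org/att/1/patch.txt", "patch.txt"),
        new Attachment("https://jira.example.org/att/2/patch.txt", "patch.txt"),
        new Attachment("https://jira.example.org/att/3/graph.png", "graph.png"));
    layout("LUCENE-123", files).forEach((from, to) -> System.out.println(from + " -> " + to));
  }
}
```

The returned map doubles as the "original URL to new URL" mapping file mentioned above once the new paths are prefixed with the hosting base URL.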
[jira] [Resolved] (LUCENE-10607) NRTSuggesterBuilder overflow when expanding input
[ https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10607. -- Fix Version/s: 9.3 Resolution: Fixed > NRTSuggesterBuilder overflow when expanding input > - > > Key: LUCENE-10607 > URL: https://issues.apache.org/jira/browse/LUCENE-10607 > Project: Lucene - Core > Issue Type: Bug > Components: core/FSTs >Affects Versions: 9.2 >Reporter: chaseny >Priority: Major > Fix For: 9.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When the suggest module builds its index, it calls NRTSuggestBuilder's finishTerm to write the suggest index. > This calls maxNumArcsForDedupByte to extend analyzed, growing it through 3, 5, 7, 255. > When the entries are too long (900) and maxNumArcsForDedupByte is called for the expansion: > > private static int maxNumArcsForDedupByte(int currentNumDedupBytes) { > int maxArcs = 1 + (2 * currentNumDedupBytes); > if (currentNumDedupBytes > 5) > { maxArcs *= currentNumDedupBytes; > // when currentNumDedupBytes >= 32768, the int multiplication exceeds the maximum int value > } > return Math.min(maxArcs, 255); > } > > Also, when expanding, could we use a fixed 4 bytes for an ordered expansion instead of the 3, 5, 7 ... 255 scheme? >
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556306#comment-17556306 ] Dawid Weiss commented on LUCENE-10557: -- I've verified that searching for old issue numbers seems to work: https://github.com/mocobeta/sandbox-lucene-10557/search?q=%22LUCENE-1%22+in%3Atitle&type=issues I'm more familiar with "hierarchical" tags like "affects/xyz" or "type/bug", but I can live with the comma version. It's good to have some of the metadata transferred as well, even as plain text in the issue description. > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? 
> *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. > * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?) 
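The sandbox search link in the comment above encodes the GitHub query "LUCENE-1" in:title. A small sketch for building the same kind of link for any original key, using only java.net.URLEncoder (Java 10+ overload taking a Charset); the helper name searchUrl is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IssueSearchLink {
  /** Builds a GitHub issue-search URL that looks for a quoted Jira key in issue titles. */
  static String searchUrl(String ownerRepo, String jiraKey) {
    String query = "\"" + jiraKey + "\" in:title";
    // URLEncoder turns '"' into %22, ' ' into '+' and ':' into %3A.
    return "https://github.com/" + ownerRepo + "/search?q="
        + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&type=issues";
  }

  public static void main(String[] args) {
    System.out.println(searchUrl("mocobeta/sandbox-lucene-10557", "LUCENE-1"));
    // https://github.com/mocobeta/sandbox-lucene-10557/search?q=%22LUCENE-1%22+in%3Atitle&type=issues
  }
}
```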
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10557: - Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** -Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub.- ** Write a prototype migration script - the decision could be made on that. Things to consider: *** version numbers - labels or milestones? *** add a comment/ prepend a link to the source Jira issue on github side, *** add a comment/ prepend a link on the jira side to the new issue on github side (for people who access jira from blogs, mailing list archives and other sources that will have stale links), *** convert cross-issue automatic links in comments/ descriptions (as suggested by Robert), *** strategy to deal with sub-issues (hierarchies), *** maybe prefix (or postfix) the issue title on github side with the original LUCENE-XYZ key so that it is easier to search for a particular issue there? *** how to deal with user IDs (author, reporter, commenters)? Do they have to be github users? Will information about people not registered on github be lost? *** create an extra mapping file of old-issue-new-issue URLs for any potential future uses. *** what to do with issue numbers in git/svn commits? 
These could be rewritten but it'd change the entire git history tree - I don't think this is practical, while doable. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the general mail group name) * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** -Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub.- ** Write a prototype migration script - the decision could be made on that. Things to consider: *** version numbers - labels or milestones? 
*** add a comment/ prepend a link to the source Jira issue on github side, *** add a comment/ prepend a link on the jira side to the new issue on github side (for people who access jira from blogs, mailing list archives and other sources that will have stale links), *** convert cross-issue automatic links in comments/ descriptions (as suggested by Robert), *** maybe prefix (or postfix) the issue title on github side with the original LUCENE-XYZ key so that it is easier to search for a particular issue there? *** how to deal with user IDs (author, reporter, commenters)? Do they have to be github users? Will information about people not registered on github be lost? *** create an extra mapping file of old-issue-new-issue URLs for any potential future uses. *** what to do with issue numbers in git/svn commits? These could be rewritten but it'd change the entire git history tree - I don't think this is practical, while doable. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milesto
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10557: - Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** -Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub.- ** Write a prototype migration script - the decision could be made on that. Things to consider: *** version numbers - labels or milestones? *** add a comment/ prepend a link to the source Jira issue on github side, *** add a comment/ prepend a link on the jira side to the new issue on github side (for people who access jira from blogs, mailing list archives and other sources that will have stale links), *** convert cross-issue automatic links in comments/ descriptions (as suggested by Robert), *** maybe prefix (or postfix) the issue title on github side with the original LUCENE-XYZ key so that it is easier to search for a particular issue there? *** how to deal with user IDs (author, reporter, commenters)? Do they have to be github users? Will information about people not registered on github be lost? *** create an extra mapping file of old-issue-new-issue URLs for any potential future uses. *** what to do with issue numbers in git/svn commits? These could be rewritten but it'd change the entire git history tree - I don't think this is practical, while doable. 
* Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the general mail group name) * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** -Conclusion for now: We don't migrate any issues. 
Only new issues should be opened on GitHub.- ** Write a prototype migration script - the decision could be made on that * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the general mail group name) * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.co
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554039#comment-17554039 ] Dawid Weiss commented on LUCENE-10615: -- I think the reference you're looking for is here: https://github.com/apache/lucene/blob/main/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.java#L44-L45 although these web sites and their associated resources vanish over time. > Add license information for SmartChineseAnalyzer to NOTICE.txt > -- > > Key: LUCENE-10615 > URL: https://issues.apache.org/jira/browse/LUCENE-10615 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Jan Dornseifer >Priority: Trivial > > The Lucene NOTICE file contains the statement > The SmartChineseAnalyzer source code (smartcn) was > provided by Xiaoping Gao and copyright 2009 by > [www.imdict.net.|http://www.imdict.net./] > without providing license information. Can this information be supplemented > or is it even outdated? > We are using Apache Lucene v8.4.1. We are currently subject to a license > audit of our software, where also 3rd party FOSS components are checked for > usage. Among other things, this part came to our attention. I would be very > grateful for information.
[jira] [Resolved] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik
[ https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10613. -- Assignee: Dawid Weiss Resolution: Fixed > Clean up outdated NOTICE.txt information concerning morfologik > -- > > Key: LUCENE-10613 > URL: https://issues.apache.org/jira/browse/LUCENE-10613 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.3 > > > It's been pointed out to me that NOTICE.txt contains information about > licensing terms that are outdated with regard to what Lucene uses nowadays. > It's a trivial update.
[jira] [Updated] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik
[ https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10613: - Fix Version/s: 9.3 > Clean up outdated NOTICE.txt information concerning morfologik > -- > > Key: LUCENE-10613 > URL: https://issues.apache.org/jira/browse/LUCENE-10613 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Trivial > Fix For: 9.3 > > > It's been pointed out to me that NOTICE.txt contains information about > licensing terms that are outdated with regard to what Lucene uses nowadays. > It's a trivial update.
[jira] [Updated] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik
[ https://issues.apache.org/jira/browse/LUCENE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10613: - Description: It's been pointed out to me that NOTICE.txt contains information about licensing terms that are outdated with regard to what Lucene uses nowadays. It's a trivial update. > Clean up outdated NOTICE.txt information concerning morfologik > -- > > Key: LUCENE-10613 > URL: https://issues.apache.org/jira/browse/LUCENE-10613 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Trivial > > It's been pointed out to me that NOTICE.txt contains information about > licensing terms that are outdated with regard to what Lucene uses nowadays. > It's a trivial update.
[jira] [Created] (LUCENE-10613) Clean up outdated NOTICE.txt information concerning morfologik
Dawid Weiss created LUCENE-10613: Summary: Clean up outdated NOTICE.txt information concerning morfologik Key: LUCENE-10613 URL: https://issues.apache.org/jira/browse/LUCENE-10613 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss
[jira] [Commented] (LUCENE-10610) RunAutomaton#hashCode() can easily cause hash collision for different Automatons
[ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553133#comment-17553133 ] Dawid Weiss commented on LUCENE-10610: -- > I do not think we need to discuss if equals/hashCode ensures that two > automatons are semantically equal (describe state machine with same behaviour) This is, in general, a hard problem. > RunAutomaton#hashCode() can easily cause hash collision for different > Automatons > > > Key: LUCENE-10610 > URL: https://issues.apache.org/jira/browse/LUCENE-10610 > Project: Lucene - Core > Issue Type: Bug >Reporter: Tomoko Uchida >Priority: Minor > > Current RunAutomaton#hashCode() is: > {code:java} > @Override > public int hashCode() { > final int prime = 31; > int result = 1; > result = prime * result + alphabetSize; > result = prime * result + points.length; > result = prime * result + size; > return result; > } > {code} > Since it does not take account of the contents of the {{points}} array, this > returns the same value for different automatons when their alphabet size and > state size are the same. > For example, this test code passes. > {code:java} > public void testHashCode() throws IOException { > PrefixQuery q1 = new PrefixQuery(new Term("field", "aba")); > PrefixQuery q2 = new PrefixQuery(new Term("field", "fee")); > assert q1.compiled.runAutomaton.hashCode() == > q2.compiled.runAutomaton.hashCode(); > } > {code} > I suspect this is a bug? > Note that I think it's not a serious one; all callers of this {{hashCode()}} > take account of additional information when calculating their own hash value, > it seems there is no substantial impact on higher-level APIs.
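A minimal sketch of a hashCode/equals pair that does take the contents of the points array into account. It uses a simplified, hypothetical stand-in class holding only the fields from the quoted hashCode(); the real RunAutomaton holds more state, and this is not the change made in Lucene. Note that it checks structural equality only; deciding whether two automata behave the same is the hard problem mentioned above.

```java
import java.util.Arrays;

public class PointsHashDemo {

  /** Simplified, hypothetical stand-in for the fields used by RunAutomaton#hashCode(). */
  static final class SimpleRunAutomaton {
    final int alphabetSize;
    final int size; // number of states
    final int[] points;

    SimpleRunAutomaton(int alphabetSize, int size, int[] points) {
      this.alphabetSize = alphabetSize;
      this.size = size;
      this.points = points.clone();
    }

    @Override
    public int hashCode() {
      final int prime = 31;
      int result = 1;
      result = prime * result + alphabetSize;
      result = prime * result + size;
      // Unlike the quoted version, hash the array contents, not just its length.
      result = prime * result + Arrays.hashCode(points);
      return result;
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (!(obj instanceof SimpleRunAutomaton)) return false;
      SimpleRunAutomaton other = (SimpleRunAutomaton) obj;
      return alphabetSize == other.alphabetSize
          && size == other.size
          && Arrays.equals(points, other.points);
    }
  }

  public static void main(String[] args) {
    SimpleRunAutomaton a = new SimpleRunAutomaton(256, 4, new int[] {0, 97, 98});
    SimpleRunAutomaton b = new SimpleRunAutomaton(256, 4, new int[] {0, 102, 103});
    System.out.println(a.hashCode() == b.hashCode()); // false
    System.out.println(a.equals(b)); // false
  }
}
```

Hashing the contents reduces collisions between same-sized automata but cannot rule them out; equals() still has to compare the arrays.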
[jira] [Commented] (LUCENE-10607) NRTSuggesterBuilder expanding input
[ https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552125#comment-17552125 ] Dawid Weiss commented on LUCENE-10607: -- Could you provide a GitHub pull request (or a patch), [~ChasenY]? > NRTSuggesterBuilder expanding input > -- > > Key: LUCENE-10607 > URL: https://issues.apache.org/jira/browse/LUCENE-10607 > Project: Lucene - Core > Issue Type: Bug > Components: core/FSTs >Affects Versions: 9.2 >Reporter: ChasenYang >Priority: Major > > When the suggest module builds its index, it calls NRTSuggestBuilder's finishTerm to write the suggest index. > This calls maxNumArcsForDedupByte to extend analyzed, growing it through 3, 5, 7, 255. > When the entries are too long (900) and maxNumArcsForDedupByte is called for the expansion: > > private static int maxNumArcsForDedupByte(int currentNumDedupBytes) { > int maxArcs = 1 + (2 * currentNumDedupBytes); > if (currentNumDedupBytes > 5) { > maxArcs *= currentNumDedupBytes; > // when currentNumDedupBytes >= 32768, the int multiplication exceeds the maximum int value > } > return Math.min(maxArcs, 255); > } >
[jira] [Commented] (LUCENE-10607) NRTSuggesterBuilder扩展input
[ https://issues.apache.org/jira/browse/LUCENE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552124#comment-17552124 ] Dawid Weiss commented on LUCENE-10607: -- Thank you, ChasenYang (and Google translate...) I think the message is about integer overflow in the maxArcs computation. Since it's capped by 255, we should use a long or change the logic so that overflow doesn't occur. > NRTSuggesterBuilder扩展input > -- > > Key: LUCENE-10607 > URL: https://issues.apache.org/jira/browse/LUCENE-10607 > Project: Lucene - Core > Issue Type: Bug > Components: core/FSTs >Affects Versions: 9.2 >Reporter: ChasenYang >Priority: Major > > suggest模块在创建索引时,调用NRTSuggestBuilder的finishTerm来写入suggest索引。 > 会调用maxNumArcsForDedupByte函数来扩展analyzed,向后扩展3 5 7 255。 > 当entries长度过长(900)时,调用maxNumArcsForDedupByte扩展时 > > private static int maxNumArcsForDedupByte(int currentNumDedupBytes) { > int maxArcs = 1 + (2 * currentNumDedupBytes); > if (currentNumDedupBytes > 5) { > maxArcs *= currentNumDedupBytes; > //当currentNumDedupBytes大于等于32768时,int相乘会大于int最大值 > } > return Math.min(maxArcs, 255); > } > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
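The "use a long" option from the comment above can be sketched as follows. `DedupArcsSketch` is a hypothetical holder class, and this is an illustration rather than the committed fix. The intermediate value `(1 + 2n) * n` always fits in a `long` for any non-negative `int` n, so capping at 255 after the multiplication is safe. The original `int` version is kept alongside only to show the overflow at n = 32768.

```java
// Hypothetical holder class; not the actual Lucene fix.
final class DedupArcsSketch {
  private static final int MAX_ARCS = 255;

  static int maxNumArcsForDedupByte(int currentNumDedupBytes) {
    // Widen to long before multiplying so (1 + 2n) * n cannot overflow for int n >= 0.
    long maxArcs = 1L + 2L * currentNumDedupBytes;
    if (currentNumDedupBytes > 5) {
      maxArcs *= currentNumDedupBytes;
    }
    return (int) Math.min(maxArcs, MAX_ARCS);
  }

  // The original int arithmetic, kept only to demonstrate the overflow.
  static int originalIntVersion(int currentNumDedupBytes) {
    int maxArcs = 1 + (2 * currentNumDedupBytes);
    if (currentNumDedupBytes > 5) {
      maxArcs *= currentNumDedupBytes;
    }
    return Math.min(maxArcs, 255);
  }
}
```

For n = 32768, the `int` product 65537 * 32768 wraps to a negative number, so `Math.min` returns that negative value instead of 255. Changing the logic would also work: returning 255 early once n is large enough that the product must exceed the cap avoids the overflow just as well.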
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543987#comment-17543987 ] Dawid Weiss commented on LUCENE-10557: -- I don't think this is a problem. You just create a description with a bullet list and reference related issues - they do show up in mentions, I think this is sufficient. > Migrate to GitHub issue from Jira? > -- > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * Get a consensus about the migration among committers > * Enable Github issue on the lucene's repository (currently, it is disabled > on it) > * Build the convention or rules for issue label/milestone management > * Choose issues that should be moved to GitHub (I think too old or obsolete > issues can remain Jira.) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542778#comment-17542778 ] Dawid Weiss commented on LUCENE-10510: -- This is caused by google-java-format accessing JVM internals. The first tidy failure actually tries to explain why it failed; this is the message you were getting: {code} * What went wrong: Execution failed for task ':checkJdkInternalsExportedToGradle'. > Certain gradle tasks and plugins require access to jdk.compiler internals, > your gradle.properties might have just been generated or could be out of sync > (see help/localSettings.txt) {code} I'm not sure what can be improved here, but feel free to suggest something to your liking! > Check module access prior to running gjf/spotless/errorprone tasks > -- > > Key: LUCENE-10510 > URL: https://issues.apache.org/jira/browse/LUCENE-10510 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR at: [https://github.com/apache/lucene/pull/802] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542668#comment-17542668 ] Dawid Weiss commented on LUCENE-10510: -- Delete your gradle.properties and allow it to regenerate from scratch. This is explained in localSettings.txt: {code} The first invocation of any task in Lucene's gradle build will generate and save a project-local 'gradle.properties' file. This file contains the defaults you may (but don't have to) tweak for your particular hardware (or taste). Note there are certain settings in that file that may be _required_ at runtime for certain plugins (an example is the spotless/ google java format plugin, which requires adding custom exports to JVM modules). Gradle build only generates this file if it's not already present (it never overwrites the defaults) -- occasionally you may have to manually delete (or move) this file and regenerate from scratch. {code} > Check module access prior to running gjf/spotless/errorprone tasks > -- > > Key: LUCENE-10510 > URL: https://issues.apache.org/jira/browse/LUCENE-10510 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR at: [https://github.com/apache/lucene/pull/802] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe
[ https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541100#comment-17541100 ] Dawid Weiss commented on LUCENE-10590: -- Love the title, [~sokolov]. Very Douglas-y Adams-y. > Indexing all zero vectors leads to heat death of the universe > - > > Key: LUCENE-10590 > URL: https://issues.apache.org/jira/browse/LUCENE-10590 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael Sokolov >Priority: Major > > By accident while testing something else, I ran a luceneutil test indexing 1M > 100d vectors where all the vectors were all zeroes. This caused indexing to > take a very long time (~40x normal - it did eventually complete) and the > search performance was similarly bad. We should not degrade by orders of > magnitude with even the worst data though. > I'm not entirely sure what the issue is, but perhaps as long as we keep > finding hits that are "better" we keep exploring the graph, where better > means (score, -docid) >= (lowest score, -docid). If that's right and all docs > have the same score, then we probably need to either switch to > (but this > could lead to poorer recall in normal cases) or introduce some kind of > minimum score threshold? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10589) Fix corner case in TestKnnVectorQuery.testRandomWithFilter
[ https://issues.apache.org/jira/browse/LUCENE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540935#comment-17540935 ] Dawid Weiss commented on LUCENE-10589: -- I don't know anything about this code area but thank you for following up on jenkins failures, [~tomoko]! > Fix corner case in TestKnnVectorQuery.testRandomWithFilter > -- > > Key: LUCENE-10589 > URL: https://issues.apache.org/jira/browse/LUCENE-10589 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomoko Uchida >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > {{TestKnnVectorQuery.testRandomWithFilter}} can fail with > java.lang.UnsupportedOperationException. > Reproducible command > {code:java} > ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter > -Dtests.seed=1DA39B92702DAC45 -Dtests.multiplier=3 > {code} > {code:java} > org.apache.lucene.search.TestKnnVectorQuery > testRandomWithFilter FAILED > java.lang.UnsupportedOperationException: exact search is not supported > at > __randomizedtesting.SeedInfo.seed([1DA39B92702DAC45:6BEAC2197AD96AE0]:0) > at > org.apache.lucene.search.TestKnnVectorQuery$ThrowingKnnVectorQuery.exactSearch(TestKnnVectorQuery.java:715) > at > org.apache.lucene.search.KnnVectorQuery.searchLeaf(KnnVectorQuery.java:151) > at > org.apache.lucene.search.KnnVectorQuery.rewrite(KnnVectorQuery.java:108) > at > org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:44) > at > org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:789) > at > org.apache.lucene.tests.search.AssertingIndexSearcher.rewrite(AssertingIndexSearcher.java:69) > at > org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:803) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:685) > at > org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:667) > at > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:584) > at > 
org.apache.lucene.search.TestKnnVectorQuery.testRandomWithFilter(TestKnnVectorQuery.java:556) > {code} > In some edge cases (depending on the random seed), > [KnnVectorQuery.java#147|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java#L147] > becomes false, and then `exactSearch()` is called. > The upper bound of [the test range query > (filter)|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L554] > could be 200 (the max value of "tag" field + 1) instead of lower + 150 to > make it "unrestrictive"? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10587) Rename "master seed" to "root seed" or "main seed" or so?
[ https://issues.apache.org/jira/browse/LUCENE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540614#comment-17540614 ] Dawid Weiss commented on LUCENE-10587: -- I think this message is still present in the ant task in randomized testing, actually. This particular word has no negative historical or emotional connotation to me, but when I get to the code there, I'll modify it: it costs me nothing and maybe it'll make somebody happier. > Rename "master seed" to "root seed" or "main seed" or so? > - > > Key: LUCENE-10587 > URL: https://issues.apache.org/jira/browse/LUCENE-10587 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > > I noticed that Lucene's test infrastructure (or perhaps it's in the > {{RandomizedTesting}} dependency?) still says things like this: > {noformat} > > [junit4:junit4] says Привет! Master seed: 3296009A5B3B7A05 > > {noformat} > Let's rename away from the term {{master}}? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)
[ https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10370. -- Resolution: Fixed > Fix classpath/module path of tests forking their own Java (TestNRTReplication) > -- > > Key: LUCENE-10370 > URL: https://issues.apache.org/jira/browse/LUCENE-10370 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.3 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > TestNRTReplication fails because it assumes classpath can just be copied to a > sub-process - this is no longer the case. > PR at: > https://github.com/apache/lucene/pull/909 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)
[ https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10370: - Fix Version/s: 9.3 > Fix classpath/module path of tests forking their own Java (TestNRTReplication) > -- > > Key: LUCENE-10370 > URL: https://issues.apache.org/jira/browse/LUCENE-10370 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.3 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > TestNRTReplication fails because it assumes classpath can just be copied to a > sub-process - this is no longer the case. > PR at: > https://github.com/apache/lucene/pull/909 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)
[ https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10370: - Description: TestNRTReplication fails because it assumes classpath can just be copied to a sub-process - this is no longer the case. PR at: https://github.com/apache/lucene/pull/909 was:TestNRTReplication fails because it assumes classpath can just be copied to a sub-process - this is no longer the case. > Fix classpath/module path of tests forking their own Java (TestNRTReplication) > -- > > Key: LUCENE-10370 > URL: https://issues.apache.org/jira/browse/LUCENE-10370 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > TestNRTReplication fails because it assumes classpath can just be copied to a > sub-process - this is no longer the case. > PR at: > https://github.com/apache/lucene/pull/909 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)
[ https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-10370: Assignee: Dawid Weiss > Fix classpath/module path of tests forking their own Java (TestNRTReplication) > -- > > Key: LUCENE-10370 > URL: https://issues.apache.org/jira/browse/LUCENE-10370 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > > TestNRTReplication fails because it assumes classpath can just be copied to a > sub-process - this is no longer the case. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9634) Highlighting of degenerate spans on fields *with offsets* doesn't work properly
[ https://issues.apache.org/jira/browse/LUCENE-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9634. - Resolution: Fixed > Highlighting of degenerate spans on fields *with offsets* doesn't work > properly > --- > > Key: LUCENE-9634 > URL: https://issues.apache.org/jira/browse/LUCENE-9634 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The match highlighter works fine with degenerate interval positions when the > {{OffsetsFromPositions}} strategy is used to compute offsets, but shows > incorrect offset ranges if offsets are read directly from the > {{MatchIterator}} ({{OffsetsFromMatchIterator}}). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this
[ https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538835#comment-17538835 ] Dawid Weiss commented on LUCENE-10574: -- I like [~jpountz]'s solution... even if it's not perfect! Merge strategies would indeed benefit from some algorithmic love - the problem in my experience is that no single strategy fits all types of loads. In reality the merge strategy, the merge scheduler and the balance between searches and indexing all play a key role and finding the best performing solution is a combination of all these factors. > Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't > do this > --- > > Key: LUCENE-10574 > URL: https://issues.apache.org/jira/browse/LUCENE-10574 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Remove {{floorSegmentBytes}} parameter, or change lucene's default to a merge > policy that doesn't merge in an O(n^2) way. > I have the feeling it might have to be the latter, as folks seem really wed > to this crazy O(n^2) behavior. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
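The following is a back-of-the-envelope sketch of why repeatedly re-merging one growing segment is quadratic. It uses a deliberately simplified cost model that does not reproduce `TieredMergePolicy`'s actual merge selection. The model assumes that every flush of `flushBytes` is merged into a single accumulated segment and that each merge rewrites every byte of its result. `QuadraticMergeCost` is a hypothetical name.

```java
// Hypothetical cost model; it does not implement TieredMergePolicy's merge selection.
final class QuadraticMergeCost {
  /** Total bytes rewritten by merges when each new flush is merged into one growing segment. */
  static long mergeBytes(int flushes, long flushBytes) {
    long total = 0;
    long accumulated = flushBytes; // the first flushed segment needs no merge
    for (int i = 1; i < flushes; i++) {
      accumulated += flushBytes; // merge the next flushed segment into it...
      total += accumulated;      // ...rewriting every byte of the merged result
    }
    // For flushes >= 1 this equals flushBytes * (flushes * (flushes + 1) / 2 - 1).
    return total;
  }
}
```

Doubling the number of flushes roughly quadruples the bytes rewritten: `mergeBytes(1000, 1)` is 500,499 and `mergeBytes(2000, 1)` is 2,000,999. By contrast, merging segments of similar size rewrites each byte only a logarithmic number of times.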
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537410#comment-17537410 ] Dawid Weiss commented on LUCENE-10572: -- > Nevertheless, the main limiting factor of the BytesRefHash is the equals > (although vectorized) because it always needs to be verified Right. This strikes the nostalgic note of strlen performance in Pascal versus C, doesn't it?... :) This is such a hot code section that storing the length alongside the string itself may indeed be worth it. I still use that "offset difference" strategy in non-Lucene code, where it performs quite well, but it's really a matter of trying, and I bet the results will vary depending on the context (terms, caches, etc.). > we can lookup offset of next entry - offset of entry to be looked up. The > only special case is the very last item. This can be solved elegantly and efficiently: the offsets array stores the exclusive end offset (end + 1) of each element, preceded by an initial entry at index 0 that is set to zero. Because the array always holds one more entry than there are elements, the last element is not a special case: the length of entry i is a constant-time expression (offsets[i + 1] - offsets[i]), and this invariant is maintained upon additions of new elements like so: {code} bytePool.add(ref.bytes, ref.offset, ref.length); offsets.add(bytePool.size()); {code} This invariant makes all the remaining functions simpler too; for example, the element-comparing method is something like this (code copy-pasted from ours, but you'll get the gist): {code} public int compare(int elementA, int elementB) { assert elementA >= 0 && elementA < size() && elementB >= 0 && elementB < size(); int off1 = offsets.get(elementA); int len1 = offsets.get(elementA + 1) - off1; int off2 = offsets.get(elementB); int len2 = offsets.get(elementB + 1) - off2; return Bytes.compare(blocks.buffer, off1, len1, blocks.buffer, off2, len2); } {code} The caveat here is that the offsets array is an int[], so the storage size required for the hashes is slightly higher. Overall, though, this was never a problem in practice.
> Can we optimize BytesRefHash? > - > > Key: LUCENE-10572 > URL: https://issues.apache.org/jira/browse/LUCENE-10572 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I was poking around in our nightly benchmarks > ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR > profiling that the hottest method is this: > {noformat} > PERCENT CPU SAMPLES STACK > 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals() > at > org.apache.lucene.util.BytesRefHash#findHash() > at org.apache.lucene.util.BytesRefHash#add() > at > org.apache.lucene.index.TermsHashPerField#add() > at > org.apache.lucene.index.IndexingChain$PerField#invert() > at > org.apache.lucene.index.IndexingChain#processField() > at > org.apache.lucene.index.IndexingChain#processDocument() > at > org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat} > This is kinda crazy – comparing if the term to be inserted into the inverted > index hash equals the term already added to {{BytesRefHash}} is the hottest > method during nightly benchmarks. > Discussing offline with [~rcmuir] and [~jpountz] they noticed a few > questionable things about our current implementation: > * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the > inserted term into the hash? Let's just use two bytes always, since IW > limits term length to 32 K (< 64K that an unsigned short can cover) > * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} > (BitUtil.VH_BE_SHORT.get) > * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not > aggressive enough? Or the initial sizing of the hash is too small? > * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too > many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible > "upgrades"? 
> * If we stick with {{MurmurHash}}, why are we using the 32 bit version > ({{murmurhash3_x86_32}})? > * Are we using the JVM's intrinsics to compare multiple bytes in a single > SIMD instruction ([~rcmuir] is quite sure we are indeed)? > * [~jpountz] suggested maybe the hash insert is simply memory bound > * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total > CPU cost) > I pulled these observations from a recent (5/6/22) profiler output: > [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html] > Maybe we can improve our performance on this crazy hotspot? > Or maybe this is a "healthy" hotspot and we should leave it be!
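The offset-difference layout from Dawid's comment above can be sketched in plain Java. The class name `OffsetDiffByteStore` and its methods are hypothetical; this is neither Lucene API nor the code quoted in the comment. In this layout, `offsets[0]` is the zero sentinel and element `i` occupies `[offsets[i], offsets[i + 1])`, so no length prefix is stored. Comparisons use the JDK 9+ range overloads of `Arrays.compareUnsigned` and `Arrays.equals`.

```java
import java.util.Arrays;

// Hypothetical append-only byte-sequence store without explicit length prefixes.
final class OffsetDiffByteStore {
  private byte[] pool = new byte[16];
  private int poolSize = 0;
  // offsets[0] == 0 is the sentinel; element i occupies [offsets[i], offsets[i + 1]).
  private int[] offsets = new int[8];
  private int count = 0;

  /** Appends bytes[off, off + len) and returns the new element's index. */
  int add(byte[] bytes, int off, int len) {
    if (poolSize + len > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolSize + len));
    }
    System.arraycopy(bytes, off, pool, poolSize, len);
    poolSize += len;
    if (count + 2 > offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    // Invariant: the end of the newest element equals the current pool size.
    offsets[++count] = poolSize;
    return count - 1;
  }

  int size() {
    return count;
  }

  /** Length is a difference of neighbouring offsets; the last element needs no special case. */
  int length(int i) {
    return offsets[i + 1] - offsets[i];
  }

  /** Lexicographic comparison of two stored elements, treating bytes as unsigned. */
  int compare(int a, int b) {
    assert a >= 0 && a < count && b >= 0 && b < count;
    return Arrays.compareUnsigned(pool, offsets[a], offsets[a + 1], pool, offsets[b], offsets[b + 1]);
  }

  /** True if element i equals bytes[off, off + len); lengths are checked first. */
  boolean equalsAt(int i, byte[] bytes, int off, int len) {
    return length(i) == len && Arrays.equals(pool, offsets[i], offsets[i + 1], bytes, off, off + len);
  }
}
```

As the comment notes, the trade-off is an `int` per element in the offsets array. Whether that beats a short length prefix in the byte pool is something only benchmarking on real terms would tell.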
[jira] [Resolved] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10541. -- Fix Version/s: 9.2 Resolution: Fixed > What to do about massive terms in our Wikipedia EN LineFileDocs? > > > Key: LUCENE-10541 > URL: https://issues.apache.org/jira/browse/LUCENE-10541 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Fix For: 9.2 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Spinoff from this fun build failure that [~dweiss] root caused: > [https://lucene.markmail.org/thread/pculfuazll4oebra] > Thank you and sorry [~dweiss]!! > This test failure happened because the test case randomly indexed a chunk of > the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's > ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the > test. > It's crazy that it took so long for Lucene's randomized tests to discover > this too-massive term in Lucene's nightly benchmarks. It's like searching > for Nessie, or > [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence]. > We need to prevent such false failures, somehow, and there are multiple > options: fix this test to not use {{LineFileDocs}}, remove all "massive" > terms from all tests (nightly and git) {{LineFileDocs}}, fix > {{MockTokenizer}} to trim such ridiculous terms (I think this is the best > option?), ... -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
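One option listed above is to drop or trim oversized terms before they reach IndexWriter. Below is a minimal sketch of that check. It assumes the limit is `IndexWriter.MAX_TERM_LENGTH`, which is 32766 bytes of UTF-8, hard-coded here so the example has no Lucene dependency. `MassiveTermFilter` is a hypothetical helper, and the issue's eventual fix may well have worked differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not MockTokenizer and not necessarily how the issue was resolved.
final class MassiveTermFilter {
  // Mirrors IndexWriter.MAX_TERM_LENGTH (32766 bytes); hard-coded to stay dependency-free.
  static final int MAX_TERM_BYTES = 32766;

  /** True if the term's UTF-8 encoding fits within the indexable term length. */
  static boolean isIndexable(String term) {
    return term.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
  }

  /** Drops terms that IndexWriter would otherwise reject with an IllegalArgumentException. */
  static List<String> dropMassiveTerms(List<String> tokens) {
    return tokens.stream().filter(MassiveTermFilter::isIndexable).collect(Collectors.toList());
  }
}
```

The limit applies to encoded bytes, not characters. A term of 16,384 two-byte characters such as `é` is already too long, even though it is only half as many characters as the byte limit.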
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537354#comment-17537354 ] Dawid Weiss commented on LUCENE-10572: -- This is the issue I filed it under, actually - note it's old, old... but the ideas may be worth revisiting. https://issues.apache.org/jira/browse/LUCENE-5854 > Can we optimize BytesRefHash? > - > > Key: LUCENE-10572 > URL: https://issues.apache.org/jira/browse/LUCENE-10572 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I was poking around in our nightly benchmarks > ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR > profiling that the hottest method is this: > {noformat} > PERCENT CPU SAMPLES STACK > 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals() > at > org.apache.lucene.util.BytesRefHash#findHash() > at org.apache.lucene.util.BytesRefHash#add() > at > org.apache.lucene.index.TermsHashPerField#add() > at > org.apache.lucene.index.IndexingChain$PerField#invert() > at > org.apache.lucene.index.IndexingChain#processField() > at > org.apache.lucene.index.IndexingChain#processDocument() > at > org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat} > This is kinda crazy – comparing if the term to be inserted into the inverted > index hash equals the term already added to {{BytesRefHash}} is the hottest > method during nightly benchmarks. > Discussing offline with [~rcmuir] and [~jpountz] they noticed a few > questionable things about our current implementation: > * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the > inserted term into the hash? 
Let's just use two bytes always, since IW > limits term length to 32 K (< 64K that an unsigned short can cover) > * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} > (BitUtil.VH_BE_SHORT.get) > * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not > aggressive enough? Or the initial sizing of the hash is too small? > * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too > many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible > "upgrades"? > * If we stick with {{MurmurHash}}, why are we using the 32 bit version > ({{murmurhash3_x86_32}})? > * Are we using the JVM's intrinsics to compare multiple bytes in a single > SIMD instruction ([~rcmuir] is quite sure we are indeed)? > * [~jpountz] suggested maybe the hash insert is simply memory bound > * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total > CPU cost) > I pulled these observations from a recent (5/6/22) profiler output: > [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html] > Maybe we can improve our performance on this crazy hotspot? > Or maybe this is a "healthy" hotspot and we should leave it be! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537353#comment-17537353 ] Dawid Weiss commented on LUCENE-10572: -- As much as I love BE (long live M68k), I think it's practically dead, so I think LE is a fine choice. > Ever tried to type a single word that is 128 chars long? One thing I'd be afraid of is that users index all sorts of non-language tokens and these can grow longer than the default of 128 chars. I have implemented a similar byte-fragment storage class in the past without using explicit length fragments at all - the difference in consecutive element offsets was used to compute the length. This does have potential drawbacks but it was fast in practice. I can dig it out from the closet if you like. > Can we optimize BytesRefHash? > - > > Key: LUCENE-10572 > URL: https://issues.apache.org/jira/browse/LUCENE-10572 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I was poking around in our nightly benchmarks > ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR > profiling that the hottest method is this: > {noformat} > PERCENT CPU SAMPLES STACK > 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals() > at > org.apache.lucene.util.BytesRefHash#findHash() > at org.apache.lucene.util.BytesRefHash#add() > at > org.apache.lucene.index.TermsHashPerField#add() > at > org.apache.lucene.index.IndexingChain$PerField#invert() > at > org.apache.lucene.index.IndexingChain#processField() > at > org.apache.lucene.index.IndexingChain#processDocument() > at > org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat} > This is kinda crazy – comparing if the term to be inserted into the inverted > index hash equals the term already added to {{BytesRefHash}} is the hottest > method during nightly benchmarks. 
> Discussing offline with [~rcmuir] and [~jpountz] they noticed a few > questionable things about our current implementation: > * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the > inserted term into the hash? Let's just use two bytes always, since IW > limits term length to 32 K (< 64K that an unsigned short can cover) > * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} > (BitUtil.VH_BE_SHORT.get) > * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not > aggressive enough? Or the initial sizing of the hash is too small? > * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too > many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible > "upgrades"? > * If we stick with {{MurmurHash}}, why are we using the 32 bit version > ({{murmurhash3_x86_32}})? > * Are we using the JVM's intrinsics to compare multiple bytes in a single > SIMD instruction ([~rcmuir] is quite sure we are indeed)? > * [~jpountz] suggested maybe the hash insert is simply memory bound > * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total > CPU cost) > I pulled these observations from a recent (5/6/22) profiler output: > [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html] > Maybe we can improve our performance on this crazy hotspot? > Or maybe this is a "healthy" hotspot and we should leave it be! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
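Dawid's comment above favours little-endian, and the issue proposes always using a fixed two-byte length prefix, since terms are capped below 64K. A sketch of writing and reading such a prefix with a little-endian `VarHandle` view over a `byte[]` follows. `ShortLengthPrefix` is a hypothetical helper, not Lucene's `BitUtil`. Plain `get`/`set` on a byte-array view handle are allowed at unaligned offsets, and on little-endian CPUs such as x86-64 they need no byte swapping.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper: fixed 2-byte, little-endian, unsigned length prefix.
final class ShortLengthPrefix {
  private static final VarHandle LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  /** Writes length (0..65535) into buf[pos] and buf[pos + 1], low byte first. */
  static void writeLength(byte[] buf, int pos, int length) {
    if (length < 0 || length > 0xFFFF) {
      throw new IllegalArgumentException("length does not fit in 2 bytes: " + length);
    }
    LE_SHORT.set(buf, pos, (short) length);
  }

  /** Reads the 2-byte prefix back as an unsigned value. */
  static int readLength(byte[] buf, int pos) {
    return Short.toUnsignedInt((short) LE_SHORT.get(buf, pos));
  }
}
```

The unsigned read matters. Lengths from 32768 to 65535 are stored as negative `short` values, and `Short.toUnsignedInt` maps them back.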
[jira] [Resolved] (LUCENE-10539) Return a stream of completions from FSTCompletion
[ https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10539. -- Resolution: Fixed > Return a stream of completions from FSTCompletion > - > > Key: LUCENE-10539 > URL: https://issues.apache.org/jira/browse/LUCENE-10539 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > FSTLookup currently has a "num" parameter which limits the number of > completions from the underlying automaton. But this has severe disadvantages > if you need to collect completions that need to fulfill a secondary condition > (for example, collect only verbs or terms that contain a certain infix). Then > you can't determine the 'num' parameter easily because the number of filtered > completions is unknown. > I also think that, implementation-wise, it's much nicer to provide a stream > that iterates over completions rather than a fixed-size list. This allows for > much more elegant code (stream.filter, stream.limit). > The provided patch adds a single {{Stream lookup(key)}} method > and modifies the existing lookup methods to use it. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
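The following sketch shows why a lazy stream helps with secondary conditions. With a fixed `num`, you must guess how many raw completions to fetch before filtering. With `filter` followed by `limit`, the stream pulls only as many as it needs. The `lookup` method below is a hypothetical stand-in over a weight-sorted `List<String>`, not `FSTCompletion`'s actual signature or element type. In the real implementation, the traversal of the automaton would produce completions on demand.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; names and types are not FSTCompletion's API.
final class CompletionStreamSketch {
  /** Stand-in for a lazy lookup: yields completions of prefix in weight order. */
  static Stream<String> lookup(List<String> completionsByWeight, String prefix) {
    return completionsByWeight.stream().filter(s -> s.startsWith(prefix));
  }

  /** Up to limit completions of prefix that also satisfy a secondary condition (an infix). */
  static List<String> lookupWithInfix(
      List<String> completionsByWeight, String prefix, String infix, int limit) {
    return lookup(completionsByWeight, prefix)
        .filter(s -> s.contains(infix)) // secondary condition applied lazily
        .limit(limit)                   // stop as soon as enough matches are found
        .collect(Collectors.toList());
  }
}
```

For example, over `car, cart, carpet, cargo, scar` the call `lookupWithInfix(all, "car", "t", 1)` returns `[cart]` after examining only the first two prefix matches.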
[jira] [Commented] (LUCENE-10539) Return a stream of completions from FSTCompletion
[ https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530211#comment-17530211 ] Dawid Weiss commented on LUCENE-10539: -- I applied it to branch_9x and main. > Return a stream of completions from FSTCompletion > - > > Key: LUCENE-10539 > URL: https://issues.apache.org/jira/browse/LUCENE-10539 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > FSTLookup currently has a "num" parameter which limits the number of > completions from the underlying automaton. But this has severe disadvantages > if you need to collect completions that need to fulfill a secondary condition > (for example, collect only verbs or terms that contain a certain infix). Then > you can't determine the 'num' parameter easily because the number of filtered > completions is unknown. > I also think implementation-wise it's much nicer to provide a stream > that iterates over completions rather than a fixed-size list. This allows for > much more elegant code (stream.filter, stream.limit). > The provided patch adds a single {{Stream lookup(key)}} method > and modifies the existing lookup methods to use it.
[jira] [Commented] (LUCENE-10548) Weird errors launching gradlew (Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.clone() is applicable for argument types: () v
[ https://issues.apache.org/jira/browse/LUCENE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530204#comment-17530204 ] Dawid Weiss commented on LUCENE-10548: -- A PR is here (and on the source branch in my repo). https://github.com/apache/lucene/pull/857 > Weird errors launching gradlew (Caused by: > groovy.lang.MissingMethodException: No signature of method: > java.lang.Object.clone() is applicable for argument types: () values: []) > > > Key: LUCENE-10548 > URL: https://issues.apache.org/jira/browse/LUCENE-10548 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > https://bugs.openjdk.java.net/browse/JDK-8285835 > I can't reproduce it anywhere, with the same JDK Tobias is using. Seems like > clone() is the cause - let's see if we can just get rid of that code and if > it helps.
[jira] [Updated] (LUCENE-10549) Upgrade to gradle 7.3.3
[ https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10549: - Fix Version/s: 10.0 (main) > Upgrade to gradle 7.3.3 > --- > > Key: LUCENE-10549 > URL: https://issues.apache.org/jira/browse/LUCENE-10549 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Major > Fix For: 10.0 (main), 9.2 > > Time Spent: 10m > Remaining Estimate: 0h > > There are newer gradle versions but this is a low-hanging fruit that has > official support for Java 17.
[jira] [Updated] (LUCENE-10549) Upgrade to gradle 7.3.3
[ https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10549: - Fix Version/s: 9.1.1 > Upgrade to gradle 7.3.3 > --- > > Key: LUCENE-10549 > URL: https://issues.apache.org/jira/browse/LUCENE-10549 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Major > Fix For: 10.0 (main), 9.2, 9.1.1 > > Time Spent: 10m > Remaining Estimate: 0h > > There are newer gradle versions but this is a low-hanging fruit that has > official support for Java 17.
[jira] [Resolved] (LUCENE-10549) Upgrade to gradle 7.3.3
[ https://issues.apache.org/jira/browse/LUCENE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10549. -- Fix Version/s: 9.2 Resolution: Fixed > Upgrade to gradle 7.3.3 > --- > > Key: LUCENE-10549 > URL: https://issues.apache.org/jira/browse/LUCENE-10549 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Major > Fix For: 9.2 > > Time Spent: 10m > Remaining Estimate: 0h > > There are newer gradle versions but this is a low-hanging fruit that has > official support for Java 17.
[jira] [Created] (LUCENE-10549) Upgrade to gradle 7.3.3
Dawid Weiss created LUCENE-10549: Summary: Upgrade to gradle 7.3.3 Key: LUCENE-10549 URL: https://issues.apache.org/jira/browse/LUCENE-10549 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss There are newer gradle versions but this is a low-hanging fruit that has official support for Java 17.
[jira] [Created] (LUCENE-10548) Weird errors launching gradlew (Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.clone() is applicable for argument types: () val
Dawid Weiss created LUCENE-10548: Summary: Weird errors launching gradlew (Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.clone() is applicable for argument types: () values: []) Key: LUCENE-10548 URL: https://issues.apache.org/jira/browse/LUCENE-10548 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss https://bugs.openjdk.java.net/browse/JDK-8285835 I can't reproduce it anywhere, with the same JDK Tobias is using. Seems like clone() is the cause - let's see if we can just get rid of that code and if it helps.
[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529845#comment-17529845 ] Dawid Weiss commented on LUCENE-10541: -- I've applied the PR - we can close this issue (for now)? > What to do about massive terms in our Wikipedia EN LineFileDocs? > > > Key: LUCENE-10541 > URL: https://issues.apache.org/jira/browse/LUCENE-10541 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Spinoff from this fun build failure that [~dweiss] root caused: > [https://lucene.markmail.org/thread/pculfuazll4oebra] > Thank you and sorry [~dweiss]!! > This test failure happened because the test case randomly indexed a chunk of > the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's > ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the > test. > It's crazy that it took so long for Lucene's randomized tests to discover > this too-massive term in Lucene's nightly benchmarks. It's like searching > for Nessie, or > [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence]. > We need to prevent such false failures, somehow, and there are multiple > options: fix this test to not use {{LineFileDocs}}, remove all "massive" > terms from all tests (nightly and git) {{LineFileDocs}}, fix > {{MockTokenizer}} to trim such ridiculous terms (I think this is the best > option?), ...
[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529831#comment-17529831 ] Dawid Weiss commented on LUCENE-10292: -- > All I was really trying to do with these tests was demonstrate that data you > get out of the Lookup before you call build(), can still be gotten from the > Lookup while build() is incrementally consuming an iterator (which may take a > long time if you are building up from a long iterator) and that this behavior > is consistent across Lookup impls (as opposed to before I filed this issue, > when most Lookups worked that way, but AnalyzingInfixSuggester would throw an > ugly exception – which was certainly confusing to users who might switch from > one impl to another). I guess I am not comfortable with the fact that this test works only by a lucky coincidence and tests behavior that isn't guaranteed or documented by the Lookup class - this got me confused and I guess it'll confuse people looking at this code after me. It's not a personal stab at you; it's just something that smells fishy around this code in general. When I was looking at the failure and tried to debug the test, I didn't see the reason why this test was necessary (I looked at the Lookup class documentation). When I understood what the test did, I looked at the implementations and they seemed to be designed with a single-thread model in mind (external synchronization between lookups and rebuilds). For example, even now, if you had a tight loop in one thread calling lookup on an FSTCompletionLookup and this loop got compiled, then there's nothing preventing the compiler from reading the higherWeightsCompletion and normalCompletion fields once and never again (they're regular fields in FSTCompletionLookup), even if you call build there multiple times in between... Is this likely to happen? I don't know. Is this possible? Sure. 
Maybe I'm oversensitive because I grew up on machines with much less strict cache coherency protocols, but code like this makes me itchy. > I didn't set out to make any hard & fast guarantee about the thread safety of > all lookups – just improve the one that was obviously inconsistent with the > others (progress, not perfection) That's my point. Either we should make the Lookup interface explicitly state that it's safe to call the build method from another thread, or we shouldn't really guarantee (or test) this behavior. I don't want you to revert the changes you made, but my gut feeling is that lookup implementations should be designed to be single-threaded or at least immutable (one publisher-multiple readers model), as it makes implementing them much easier - no volatiles, synchronization blocks, etc. Concurrency concerns should be handled by the code that uses Lookups - this code should know whether synchronization or two concurrent instances are required (one doing the lookups, potentially via multiple threads, one rebuilding). Perhaps a change in the API is needed to separate those two phases (build-use), and then the downstream code has to take care of handling/swapping out Lookup references where they're used - I don't know, I'm just stating what I think. > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: 10.0 (main), 9.2 > > Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, > LUCENE-10292-3.patch, LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... 
> With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List}} > Typically this works because the {{build()}} method uses temporary > datastructures until its "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > its index) and then the lookup method starts with... > {code:java} > if (searcherMgr
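Here is a sketch of the "one publisher, multiple readers" model suggested in the comment above. It assumes only one thread ever calls `build()`. `SnapshotLookup` and its prefix matching are hypothetical and are not the `Lookup` API. All state lives in an immutable snapshot with final fields. A new snapshot is built off to the side and then published with a single volatile write. Each `lookup()` call reads the volatile field once into a local. Because of that, the JIT cannot cache a stale reference across calls the way it may with plain fields, and each call works against one consistent snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Lucene's Lookup API. Assumes a single thread calls build().
final class SnapshotLookup {
  // Immutable state: final fields are safely visible once the object is published.
  private static final class Snapshot {
    final List<String> sortedTerms;

    Snapshot(List<String> terms) {
      List<String> copy = new ArrayList<>(terms);
      Collections.sort(copy);
      this.sortedTerms = Collections.unmodifiableList(copy);
    }
  }

  // The only mutable field; the volatile write in build() happens-before later reads.
  private volatile Snapshot current = new Snapshot(List.of());

  // Builds the replacement entirely off to the side, then publishes it in one write.
  void build(List<String> terms) {
    Snapshot next = new Snapshot(terms);
    current = next;
  }

  // Reads the volatile field once, so one call always works against one snapshot.
  List<String> lookup(String prefix, int num) {
    Snapshot snapshot = current;
    List<String> out = new ArrayList<>();
    for (String term : snapshot.sortedTerms) {
      if (out.size() >= num) {
        break;
      }
      if (term.startsWith(prefix)) {
        out.add(term);
      }
    }
    return out;
  }
}
```

Readers never block and never see a half-built snapshot. Before the first `build()` they get an empty list, which matches the "empty" behavior described for the other implementations. Concurrent `build()` calls would still need coordination outside this class, in line with the suggestion that the calling code owns that concern.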
[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529261#comment-17529261 ] Dawid Weiss commented on LUCENE-10541: -- Filed a PR at https://github.com/apache/lucene/pull/850. Picked the default from CharTokenizer.DEFAULT_MAX_WORD_LEN, although I can't reference it directly (it's not accessible from the test framework). Had to tweak the defaults in one or two failing tests that expected the tokenizer to return longer tokens, so a second set of eyes would be good. The enwiki lines file contains 2 million lines. It'd be nice to calculate the probability of any of the k faulty (long-term) lines being drawn in n tries and distribute it over time - this would address Mike's question about why it took so long to discover them. :) > What to do about massive terms in our Wikipedia EN LineFileDocs? > > > Key: LUCENE-10541 > URL: https://issues.apache.org/jira/browse/LUCENE-10541 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > > Spinoff from this fun build failure that [~dweiss] root caused: > [https://lucene.markmail.org/thread/pculfuazll4oebra] > Thank you and sorry [~dweiss]!! > This test failure happened because the test case randomly indexed a chunk of > the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's > ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the > test. > It's crazy that it took so long for Lucene's randomized tests to discover > this too-massive term in Lucene's nightly benchmarks. It's like searching > for Nessie, or > [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence].
> We need to prevent such false failures, somehow, and there are multiple > options: fix this test to not use {{LineFileDocs}}, remove all "massive" > terms from all tests (nightly and git) {{LineFileDocs}}, fix > {{MockTokenizer}} to trim such ridiculous terms (I think this is the best > option?), ...
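The probability question raised in the comment above can be made concrete under a simplifying assumption: each draw picks one of the N = 2,000,000 enwiki lines uniformly and independently, and k of those lines contain an over-long term. LineFileDocs actually reads a contiguous chunk from a random offset, so this is only a rough model, and the real k and n are not known here. The chance that at least one bad line shows up in n draws is 1 - (1 - k/N)^n. `RareLineOdds` is a hypothetical helper that evaluates this expression.

```java
// Hypothetical model: every draw picks one of totalLines uniformly and independently.
// The real LineFileDocs reads a contiguous chunk from a random offset, so this is rough.
final class RareLineOdds {
  // P(at least one of badLines is drawn in `draws` tries) = 1 - (1 - k/N)^n,
  // computed via log1p/expm1 to stay accurate when k/N is tiny.
  static double pAtLeastOne(long totalLines, long badLines, long draws) {
    double p = (double) badLines / totalLines;
    return -Math.expm1(draws * Math.log1p(-p));
  }

  // Smallest number of draws whose hit probability reaches `target` (0 < target < 1).
  static long drawsFor(long totalLines, long badLines, double target) {
    double p = (double) badLines / totalLines;
    return (long) Math.ceil(Math.log1p(-target) / Math.log1p(-p));
  }
}
```

For a tiny k/N the result is roughly n * k / N. With a single bad line among 2,000,000, about 1.39 million draws are needed just to reach even odds of hitting it, which helps explain why such a term can stay hidden for a long time.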
[jira] [Commented] (LUCENE-10543) Achieve contribution workflow perfection (with progress)
[ https://issues.apache.org/jira/browse/LUCENE-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529255#comment-17529255 ] Dawid Weiss commented on LUCENE-10543: -- ("with progress"... yeah, that's why LUCENE-9871 is still open :) ) > Achieve contribution workflow perfection (with progress) > > > Key: LUCENE-10543 > URL: https://issues.apache.org/jira/browse/LUCENE-10543 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major > > Inspired by Dawid's build issue which has worked out for us: LUCENE-9871 > He hasn't even linked 10% of the issues/subtasks involved in that work > either, but we know. > I think we need a similar approach for the contribution workflow. There have > been some major improvements recently, a couple that come to mind: > * Tomoko made a CONTRIBUTING.md file which github recognizes and is way > better than the wiki stuff > * Some hazards/error messages/mazes in the build process and so on have > gotten fixed. > But there is more to do in my opinion, here are 3 ideas: > * Creating a PR still has a massive checklist template. But now this template > links to CONTRIBUTING.md, so why include the other stuff/checklist? Isn't it > enough to just link to CONTRIBUTING.md and fix that as needed? > * Creating a PR still requires signing up for Apache JIRA and creating a JIRA > issue. There is zero value to this additional process. We often end up with > either JIRAs and/or PRs that have zero content, or maybe conflicting/outdated > content. This is just an unnecessary dance; can we use github issues instead? > * Haven't dug into the github actions or configs very deeply. Maybe there's > simple stuff we can do such as give useful notifications if checks fail. Try > to guide the user to run ./gradlew check and fix it. It sucks to have to > review, look at logs, and manually add comments to do this stuff. > So let's have an issue to improve this area.
[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529223#comment-17529223 ] Dawid Weiss commented on LUCENE-10292: -- Thanks Chris. I'm still not sure whether these tests make sense without explicitly stating that build() can be called on Lookup to dynamically (and concurrently) replace its internals... For example, FSTCompletionLookup:
{code}
// The two FSTCompletions share the same automaton.
this.higherWeightsCompletion = builder.build();
this.normalCompletion =
    new FSTCompletion(higherWeightsCompletion.getFST(), false, exactMatchFirst);
this.count = newCount;
{code}
None of these fields are volatile or under a monitor, so no flush is guaranteed anywhere. I understand they'll eventually become consistent by piggybacking on some other synchronization/memory fence, but it's weird to rely on this behavior. I think it'd be a much more user-friendly API if Lookup were detached entirely from its build process (for example, by replacing the current build method with a builder() that would return a new immutable Lookup instance). This would be less confusing and would also allow for a cleaner implementation (no synchronization at all required - just regular assignments, maybe even with final fields). I'm not saying this should be implemented here - perhaps it's worth a new issue to do this refactoring. Separately from the above, if the test fails, it'll leak threads: this line from the patch, {{acquireOnNext.acquireUninterruptibly();}}, literally blocks forever. It should be replaced with a try/catch that rethrows an unchecked exception when the iterator thread is interrupted. > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M.
Hostetter >Priority: Major > Fix For: 10.0 (main), 9.2 > > Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, > LUCENE-10292-3.patch, LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... > With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List}} > Typically this works because the {{build()}} method uses temporary > datastructures until its "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > its index) and then the lookup method starts with... > {code:java} > if (searcherMgr == null) { > throw new IllegalStateException("suggester was not built"); > } > {code} > ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any > situation where another thread may be calling > {{AnalyzingInfixSuggester.build()}}
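The fix suggested above for the thread leak might look roughly like the sketch below. `GatedIterator` is a hypothetical stand-in for the test's gated iterator, whose exact code is not shown in this thread. `Semaphore.acquire()` throws `InterruptedException` when the waiting thread is interrupted. The iterator can therefore restore the interrupt flag and rethrow an unchecked exception, instead of blocking forever as `acquireUninterruptibly()` does. In Lucene's own code, `org.apache.lucene.util.ThreadInterruptedException` would be a natural unchecked type; a plain `RuntimeException` keeps the sketch self-contained.

```java
import java.util.Iterator;
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for the test's gated iterator; each next() waits for one permit.
final class GatedIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final Semaphore gate;

  GatedIterator(Iterator<T> delegate, Semaphore gate) {
    this.delegate = delegate;
    this.gate = gate;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public T next() {
    try {
      // Unlike acquireUninterruptibly(), acquire() gives up when the thread is interrupted.
      gate.acquire();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
      throw new RuntimeException("Interrupted while waiting for a permit", e);
    }
    return delegate.next();
  }
}
```

When a test fails and the framework interrupts leftover threads, the thread stuck in `next()` now terminates instead of leaking.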
[jira] [Commented] (LUCENE-10531) Mark testLukeCanBeLaunched @Nightly test and make a dedicated Github CI workflow for it
[ https://issues.apache.org/jira/browse/LUCENE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529218#comment-17529218 ] Dawid Weiss commented on LUCENE-10531: -- Fine with me. > Mark testLukeCanBeLaunched @Nightly test and make a dedicated Github CI > workflow for it > --- > > Key: LUCENE-10531 > URL: https://issues.apache.org/jira/browse/LUCENE-10531 > Project: Lucene - Core > Issue Type: Task > Components: general/test >Reporter: Tomoko Uchida >Priority: Minor > > We are going to allow running the test on Xvfb (a virtual display that speaks > X protocol) in [LUCENE-10528]; this tweak is available only on Linux. > I'm just guessing, but it could also confuse or bother Mac and Windows users > (we can't know what window manager developers are using); it may be better to > make it opt-in by marking it as a slow test. > Instead, I think we can enable a dedicated Github actions workflow for the > distribution test that is triggered only when the related files are changed. > Besides Linux, we could run it on both Mac and Windows, which most users run > the app on - it'd be slow, but if we limit the scope of the test I suppose it > works functionally just fine (I'm running actions workflows on mac and > windows elsewhere). > To make it a "slow test", we could add the same {{@Slow}} annotation as the > {{test-framework}} to the distribution tests, for consistency.
[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529018#comment-17529018 ] Dawid Weiss commented on LUCENE-10292: -- I don't see any evidence in implementations of Lookup that build() can be called in a thread-safe manner. Those testLookupsDuringReBuild tests only work by lucky chance (and they still fail, rarely!). The code typically releases semaphore permits quickly here:
{code}
// at every stage of the slow rebuild, we should still be able to get our original suggestions
for (int i = 0; i < data.size(); i++) {
  initialChecks.check(suggester);
  rebuildGate.release();
}
{code}
while the build() method has not even been invoked yet, because this line:
{code}
suggester.build(
    new InputArrayIterator(new DelayedIterator<>(suggester, rebuildGate, data.iterator())));
{code}
is semaphore-blocked while its argument (InputArrayIterator) is being constructed. So the result is that suggester.build() is typically entered a long time after the check loop has finished. It is enough to modify the code to:
{code}
// at every stage of the slow rebuild, we should still be able to get our original suggestions
for (int i = 0; i < data.size(); i++) {
  rebuildGate.release();
  Thread.sleep(100);
  initialChecks.check(suggester);
}
{code}
to cause repeatable failures (this isn't a suggested fix but a demonstration that the code is currently broken). > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M.
Hostetter >Priority: Major > Fix For: 10.0 (main), 9.2 > > Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, > LUCENE-10292-3.patch, LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... > With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List}} > Typically this works because the {{build()}} method uses temporary > datastructures until its "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > its index) and then the lookup method starts with... > {code:java} > if (searcherMgr == null) { > throw new IllegalStateException("suggester was not built"); > } > {code} > ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any > situation where another thread may be calling > {{AnalyzingInfixSuggester.build()}}
[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528998#comment-17528998 ] Dawid Weiss commented on LUCENE-10292: -- [~hossman] - I don't know if you saw the recent discussion on the mailing list - how did you arrive at the conclusion that Lookup.build can be called concurrently? I don't think this is mentioned anywhere in the Lookup documentation, and I don't think the implementation is thread-safe (at least not the TestFreeTextSuggester)? > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: 10.0 (main), 9.2 > > Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, > LUCENE-10292-3.patch, LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... > With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List}} > Typically this works because the {{build()}} method uses temporary > datastructures until its "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > its index) and then the lookup method starts with...
> {code:java} > if (searcherMgr == null) { > throw new IllegalStateException("suggester was not built"); > } > {code} > ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any > situation where another thread may be calling > {{AnalyzingInfixSuggester.build()}}
[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528991#comment-17528991 ] Dawid Weiss commented on LUCENE-10541: -- I agree - we should fix the mock analyzer so that it doesn't return such long terms. > What to do about massive terms in our Wikipedia EN LineFileDocs? > > > Key: LUCENE-10541 > URL: https://issues.apache.org/jira/browse/LUCENE-10541 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > > Spinoff from this fun build failure that [~dweiss] root caused: > [https://lucene.markmail.org/thread/pculfuazll4oebra] > Thank you and sorry [~dweiss]!! > This test failure happened because the test case randomly indexed a chunk of > the nightly (many GBs) LineFileDocs Wikipedia file that had a massive (> IW's > ~32 KB limit) term, and IW threw an {{IllegalArgumentException}} failing the > test. > It's crazy that it took so long for Lucene's randomized tests to discover > this too-massive term in Lucene's nightly benchmarks. It's like searching > for Nessie, or > [SETI|https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence]. > We need to prevent such false failures, somehow, and there are multiple > options: fix this test to not use {{LineFileDocs}}, remove all "massive" > terms from all tests (nightly and git) {{LineFileDocs}}, fix > {{MockTokenizer}} to trim such ridiculous terms (I think this is the best > option?), ...
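One way to keep a mock tokenizer from emitting huge terms is to split them into bounded pieces, as sketched below. `TokenLengthCap` is hypothetical and is not the MockTokenizer change from the PR. The cap of 255 is assumed to mirror `CharTokenizer.DEFAULT_MAX_WORD_LEN`, mentioned in the PR comment. The split counts code points so that a surrogate pair is never cut in half. Truncating, i.e. dropping everything past the cap, would be the other obvious choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the actual MockTokenizer change from the PR.
final class TokenLengthCap {
  // Assumed to mirror CharTokenizer.DEFAULT_MAX_WORD_LEN.
  static final int MAX_TOKEN_LENGTH = 255;

  // Splits a token into pieces of at most maxLen code points, so a surrogate
  // pair (e.g. an emoji) is never cut in half.
  static List<String> split(String token, int maxLen) {
    if (maxLen <= 0) {
      throw new IllegalArgumentException("maxLen must be positive: " + maxLen);
    }
    List<String> pieces = new ArrayList<>();
    int start = 0;
    while (start < token.length()) {
      int remaining = token.codePointCount(start, token.length());
      int end = token.offsetByCodePoints(start, Math.min(maxLen, remaining));
      pieces.add(token.substring(start, end));
      start = end;
    }
    return pieces;
  }
}
```

Splitting keeps all of the input text indexable, which some tests may depend on. Truncating is simpler, but it silently drops data. Either way, no emitted term can come close to IndexWriter's term-length limit.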
[jira] [Created] (LUCENE-10540) Remove alphabetically ordered completions from FSTCompletion
Dawid Weiss created LUCENE-10540: Summary: Remove alphabetically ordered completions from FSTCompletion Key: LUCENE-10540 URL: https://issues.apache.org/jira/browse/LUCENE-10540 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss The code cheats internally by sorting completions that are always weight-ordered. If this is needed, it should be done up the call stack, not in FSTCompletion - this provides an illusion of something that doesn't exist and is potentially quite expensive to compute.
{code}
if (!higherWeightsFirst && rootArcs.length > 1) {
  // We could emit a warning here (?). An optimal strategy for alphabetically sorted
  // suggestions would be to add them with a constant weight -- this saves unnecessary
  // traversals and sorting.
  return lookup(key).sorted().limit(num).collect(Collectors.toList());
} else {
  return lookup(key).limit(num).collect(Collectors.toList());
}
{code}
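The "potentially quite expensive" remark comes down to how streams evaluate. `limit(num)` on a weight-ordered stream stops pulling after `num` elements. `sorted().limit(num)` has to drain every completion before it can emit the first one. The hypothetical sketch below makes this visible by counting how many elements are pulled from a lazy source. `SortCost` and its generated terms are stand-ins; this is not `FSTCompletion` itself.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a lazy completion stream; not FSTCompletion itself.
final class SortCost {
  // Emits `total` terms (in reverse alphabetical order, standing in for weight order)
  // and counts how many elements are actually pulled from the source.
  static Stream<String> completions(int total, AtomicInteger pulled) {
    return Stream.iterate(0, i -> i < total, i -> i + 1)
        .map(i -> String.format(Locale.ROOT, "term%05d", total - 1 - i))
        .peek(s -> pulled.incrementAndGet());
  }

  // Weight order: limit() short-circuits after num elements.
  static List<String> topByWeight(int total, int num, AtomicInteger pulled) {
    return completions(total, pulled).limit(num).collect(Collectors.toList());
  }

  // Alphabetical order: sorted() must drain the whole stream before limit() applies.
  static List<String> topAlphabetical(int total, int num, AtomicInteger pulled) {
    return completions(total, pulled).sorted().limit(num).collect(Collectors.toList());
  }
}
```

With 1000 completions and `num = 5`, the weight-ordered path pulls 5 elements and the sorted path pulls all 1000. If a caller really wants alphabetical output, it can make that cost explicit higher up the call stack rather than having `FSTCompletion` hide it.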
[jira] [Commented] (LUCENE-10539) Return a stream of completions from FSTCompletion
[ https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528635#comment-17528635 ] Dawid Weiss commented on LUCENE-10539: -- PR is at: https://github.com/apache/lucene/pull/844 > Return a stream of completions from FSTCompletion > - > > Key: LUCENE-10539 > URL: https://issues.apache.org/jira/browse/LUCENE-10539 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > > FSTLookup currently has a "num" parameter which limits the number of > completions from the underlying automaton. But this has severe disadvantages > if you need to collect completions that need to fulfill a secondary condition > (for example, collect only verbs or terms that contain a certain infix). Then > you can't determine the 'num' parameter easily because the number of filtered > completions is unknown. > I also think implementation-wise it's also much nicer to provide a stream > that iterates over completions rather than a fixed-size list. This allows for > much more elegant code (stream.filter, stream.limit). > The provided patch adds a single {{Stream lookup(key)}} method > and modifies the existing lookup methods to use it.
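Why a fixed `num` breaks down under a secondary condition can be shown with a small sketch. The `lookup` here is a hypothetical stand-in over a weight-ordered list of strings, not the patch's actual `lookup(key)` (whose element type is not shown in the issue text), and the infix predicate plays the role of "terms that contain a certain infix". Limiting before filtering can return too few results; filtering a lazy stream before limiting pulls exactly as many completions as needed.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a lazy, weight-ordered lookup(key); not the real
// FSTCompletion method from the patch.
final class FilteredCompletions {
  private FilteredCompletions() {}

  // The input is already in weight order; the returned stream is lazy.
  static Stream<String> lookup(List<String> byWeight, String key) {
    return byWeight.stream().filter(s -> s.startsWith(key));
  }

  // Fixed-num style: the lookup returns 'num' results and the caller filters
  // afterwards, so it may end up with fewer than 'num'.
  static List<String> limitThenFilter(List<String> byWeight, String key, int num, Predicate<String> cond) {
    return lookup(byWeight, key).limit(num).filter(cond).collect(Collectors.toList());
  }

  // Stream style: filter first, then limit; only as many completions as
  // needed are pulled from the lookup.
  static List<String> filterThenLimit(List<String> byWeight, String key, int num, Predicate<String> cond) {
    return lookup(byWeight, key).filter(cond).limit(num).collect(Collectors.toList());
  }
}
```

For completions `run, running, runs, runner, rune`, key `run`, `num = 2` and the infix `nn`, the fixed-num variant returns only `[running]`, while the stream variant returns `[running, runner]`.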
[jira] [Updated] (LUCENE-10539) Return a stream of completions from FSTCompletion
[ https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10539: - Fix Version/s: 9.2 > Return a stream of completions from FSTCompletion > - > > Key: LUCENE-10539 > URL: https://issues.apache.org/jira/browse/LUCENE-10539 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.2
[jira] [Updated] (LUCENE-10539) Return a stream of completions from FSTCompletion
[ https://issues.apache.org/jira/browse/LUCENE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10539: - Summary: Return a stream of completions from FSTCompletion (was: return a stream of completions from FSTCompletion) > Return a stream of completions from FSTCompletion > - > > Key: LUCENE-10539 > URL: https://issues.apache.org/jira/browse/LUCENE-10539 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor
[jira] [Created] (LUCENE-10539) return a stream of completions from FSTCompletion
Dawid Weiss created LUCENE-10539: Summary: return a stream of completions from FSTCompletion Key: LUCENE-10539 URL: https://issues.apache.org/jira/browse/LUCENE-10539 Project: Lucene - Core Issue Type: New Feature Reporter: Dawid Weiss Assignee: Dawid Weiss FSTLookup currently has a "num" parameter which limits the number of completions from the underlying automaton. But this has severe disadvantages if you need to collect completions that need to fulfill a secondary condition (for example, collect only verbs or terms that contain a certain infix). Then you can't determine the 'num' parameter easily because the number of filtered completions is unknown. I also think implementation-wise it's also much nicer to provide a stream that iterates over completions rather than a fixed-size list. This allows for much more elegant code (stream.filter, stream.limit). The provided patch adds a single {{Stream lookup(key)}} method and modifies the existing lookup methods to use it.
[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects
[ https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528596#comment-17528596 ] Dawid Weiss commented on LUCENE-10386: -- Hi Petr. So, I did have a look. The TL;DR version is that I honestly think this kind of thing should be left to downstream projects to handle in whatever way they fancy. It adds maintenance overhead to Lucene and a potential for problems that I don't think the gains justify (see below for specific examples). BOMs are not the only way to keep dependency versions consistent across projects (Lucene uses Palantir's version consistency plugin, for example), and the diversity here means it'll be hard to please everyone. If you need a BOM, you can create a subproject in your own project (with all the dependencies needed) and treat it as a platform... so it's not that difficult. Here is what I noticed when I applied your patch (and it motivates my opinion above): 1) the diff of POMs in the release (gradlew -p lucene/distribution assembleRelease) shows that the description and name have changed: {code} Apache Lucene (module: lucene-root) Grandparent project for Apache Lucene Core {code} The refactoring you made to extract configurePublicationMetadata has a side effect: the lazy provider resolves the project reference to the root project instead of the context project. 2) the code for constraints in the BOM submodule includes all the exported Lucene subprojects. But in reality many people will be using just a subset of those; the constraints imposed by the BOM (including transitive dependencies?) will have to be downloaded and will apply even to dependencies the BOM-importing project doesn't touch at all. I see this more as a problem than a benefit, actually. 
> Add BOM module for ease of dependency management in dependent projects > -- > > Key: LUCENE-10386 > URL: https://issues.apache.org/jira/browse/LUCENE-10386 > Project: Lucene - Core > Issue Type: Wish > Components: general/build >Affects Versions: 9.0, 8.4, 8.11.1 >Reporter: Petr Portnov >Priority: Trivial > Labels: BOM, Dependencies > Time Spent: 10m > Remaining Estimate: 0h > > h1. Short description > Add a Bill-of-Materials (BOM) module to Lucene to allow foreign projects to > use it for dependency management. > h1. Reasoning > [A lot of|https://mvnrepository.com/search?q=bom] multi-module projects are > providing BOMs in order to simplify dependency management. This allows > dependent projects to only specify the version of the BOM module while > declaring the dependencies without versions (as they will be provided by the BOM). > For example: > {code:groovy} > dependencies { > // Only specify the version of the BOM > implementation platform('com.fasterxml.jackson:jackson-bom:2.13.1') > // Don't specify dependency versions as they are provided by the BOM > implementation "com.fasterxml.jackson.core:jackson-annotations" > implementation "com.fasterxml.jackson.core:jackson-core" > implementation "com.fasterxml.jackson.core:jackson-databind" > implementation "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml" > implementation "com.fasterxml.jackson.datatype:jackson-datatype-guava" > implementation "com.fasterxml.jackson.datatype:jackson-datatype-jdk8" > implementation "com.fasterxml.jackson.datatype:jackson-datatype-jsr310" > implementation > "com.fasterxml.jackson.module:jackson-module-parameter-names" > }{code} > > Not only is this approach "popular" but it also has the following pros: > * there is no need to declare a variable (via Maven properties or Gradle > ext) to hold the version > * this is more automation-friendly because tools like Dependabot only have > to update a single version per dependency group > h1. 
Other suggestions > It may be reasonable to also publish BOMs for old versions so that the > projects which currently rely on older Lucene versions (such as 8.4) can > migrate to the BOM approach without migrating to Lucene 9.0.
[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects
[ https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528377#comment-17528377 ] Dawid Weiss commented on LUCENE-10386: -- Hi Petr. Sorry for the delay. I'll try to go through this tomorrow morning and see if I have any doubts. > Add BOM module for ease of dependency management in dependent projects > -- > > Key: LUCENE-10386 > URL: https://issues.apache.org/jira/browse/LUCENE-10386 > Project: Lucene - Core > Issue Type: Wish > Components: general/build >Affects Versions: 9.0, 8.4, 8.11.1 >Reporter: Petr Portnov >Priority: Trivial > Labels: BOM, Dependencies > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Resolved] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)
[ https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10535. -- Fix Version/s: 10.0 (main) Resolution: Fixed > The build fails in :checkUnusedConstraints (ConcurrentModificationException) > > > Key: LUCENE-10535 > URL: https://issues.apache.org/jira/browse/LUCENE-10535 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: 10.0 (main) > > > {code} > * What went wrong: > Execution failed for task ':checkUnusedConstraints'. > > Error while evaluating property 'classpath' of task > > ':checkUnusedConstraints' >> Failed to calculate the value of task ':checkUnusedConstraints' property > 'classpath'. > > java.util.ConcurrentModificationException (no error message) > {code} > Seems to be related to this: > https://github.com/palantir/gradle-consistent-versions/issues/450
[jira] [Assigned] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)
[ https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-10535: Assignee: Dawid Weiss > The build fails in :checkUnusedConstraints (ConcurrentModificationException) > > > Key: LUCENE-10535 > URL: https://issues.apache.org/jira/browse/LUCENE-10535 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major
[jira] [Commented] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)
[ https://issues.apache.org/jira/browse/LUCENE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527924#comment-17527924 ] Dawid Weiss commented on LUCENE-10535: -- Upgraded the plugin to 2.10.0 - the build passes for me locally, let's see if this helps. > The build fails in :checkUnusedConstraints (ConcurrentModificationException) > > > Key: LUCENE-10535 > URL: https://issues.apache.org/jira/browse/LUCENE-10535 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major
[jira] [Created] (LUCENE-10535) The build fails in :checkUnusedConstraints (ConcurrentModificationException)
Dawid Weiss created LUCENE-10535: Summary: The build fails in :checkUnusedConstraints (ConcurrentModificationException) Key: LUCENE-10535 URL: https://issues.apache.org/jira/browse/LUCENE-10535 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss
{code}
* What went wrong:
Execution failed for task ':checkUnusedConstraints'.
> Error while evaluating property 'classpath' of task ':checkUnusedConstraints'
> Failed to calculate the value of task ':checkUnusedConstraints' property 'classpath'.
> java.util.ConcurrentModificationException (no error message)
{code}
Seems to be related to this: https://github.com/palantir/gradle-consistent-versions/issues/450
[jira] [Commented] (LUCENE-10386) Add BOM module for ease of dependency management in dependent projects
[ https://issues.apache.org/jira/browse/LUCENE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527459#comment-17527459 ] Dawid Weiss commented on LUCENE-10386: -- Hi Petr. I saw the PR but I'm not following all the changes happening there. I honestly just prefer dead simple verbosity... Will take another look in a spare minute though, unless somebody beats me to it. > Add BOM module for ease of dependency management in dependent projects > -- > > Key: LUCENE-10386 > URL: https://issues.apache.org/jira/browse/LUCENE-10386 > Project: Lucene - Core > Issue Type: Wish > Components: general/build >Affects Versions: 9.0, 8.4, 8.11.1 >Reporter: Petr Portnov >Priority: Trivial > Labels: BOM, Dependencies > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests
[ https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526769#comment-17526769 ] Dawid Weiss commented on LUCENE-10528: -- I agree - let's mark it slow. I run slow tests occasionally too and I'm on Windows. > TestScripts.testLukeCanBeLaunched creates X Window when running the tests > - > > Key: LUCENE-10528 > URL: https://issues.apache.org/jira/browse/LUCENE-10528 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > When running the tests, this one causes my entire desktop to "flicker" when > it creates some kind of X-Window very quickly and then destroys it. I use a > tiling window manager, so the whole desktop gets rearranged for a split second, > and I'd rather it not happen :) > I first tried adding -Djava.awt.headless=true to both org.gradle.jvmargs and > tests.jvmargs in my .gradle/gradle.properties. That doesn't work, as the test > doesn't use these when launching luke. > I next tried hacking the test by adding this to the ProcessBuilderThingy, but > it didn't help either: > {noformat} > .envvar("LAUNCH_OPTS", "-Djava.awt.headless=true") > {noformat} > One way I can work around it is to unset the {{DISPLAY}} env var so that it > won't create this window. The test still passes: > {noformat} > $ unset DISPLAY > $ ./gradlew :lucene:distribution.tests:test > ... (no window gets created) > {noformat} > So maybe as a workaround, we can just not pass the DISPLAY environment variable > through to this test?
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526611#comment-17526611 ] Dawid Weiss commented on LUCENE-10521: -- You can use a custom IndexDeletionPolicy - one that never deletes any previous commit, for example. Then create two (or more) commits and each will have a different set of files. You can open a reader over any arbitrary commit so this should be simple and consistent? > Tests in windows are failing for the new > testAlwaysRefreshDirectoryTaxonomyReader test > -- > > Key: LUCENE-10521 > URL: https://issues.apache.org/jira/browse/LUCENE-10521 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet > Environment: Windows 10 >Reporter: Gautam Worah >Priority: Minor > > Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is > failing. > > Specifically, the loop which checks if any files still remain to be deleted > is not ending. > We have added an exception to the main test class to not run the test on > WindowsFS (not sure if this is related).
> > ``` > SEVERE: 1 thread leaked from SUITE scope at > org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader: > 1) Thread[id=19, > name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], > state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader] at > java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native > Method) at > java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390) > at > java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307) > at > java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251) > at > java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at java.base@18/java.nio.file.Files.delete(Files.java:1152) at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344) > at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325) > at > 
app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410) > at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121) > at > app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97) > ```
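The suggestion about a keep-everything deletion policy can be illustrated with a toy model. In Lucene terms this would roughly mean configuring the writer with `IndexWriterConfig.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE)` instead of the default `KeepOnlyLastCommitDeletionPolicy`, listing commits with `DirectoryReader.listCommits(dir)`, opening one with `DirectoryReader.open(commit)`, and comparing `IndexCommit.getFileNames()` rather than a raw directory listing. How that maps onto this particular taxonomy test is an assumption. The plain-Java model below (hypothetical `CommitRetentionModel`, not Lucene code) only shows why keeping every commit means no file ever has to be deleted, so nothing is left waiting in a pending-deletion loop.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Lucene code: each commit is the set of file names it references.
// A file is deleted only when no retained commit references it any more.
final class CommitRetentionModel {
  private CommitRetentionModel() {}

  // Keep-only-the-last-commit: files referenced solely by older commits are
  // deleted (on Windows such deletions can stay pending while a file is open).
  static Set<String> deletedWhenKeepingLast(List<Set<String>> commits) {
    Set<String> newest = commits.get(commits.size() - 1);
    Set<String> deleted = new HashSet<>();
    for (Set<String> commit : commits.subList(0, commits.size() - 1)) {
      for (String file : commit) {
        if (!newest.contains(file)) {
          deleted.add(file);
        }
      }
    }
    return deleted;
  }

  // Keep-all-commits: nothing is ever deleted, so every commit's file set stays
  // on disk and a reader can be opened on any of them.
  static Set<String> deletedWhenKeepingAll(List<Set<String>> commits) {
    return new HashSet<>();
  }
}
```

For two commits `{segments_1, _0.cfs}` and `{segments_2, _0.cfs, _1.cfs}`, the keep-last model deletes `segments_1`, while the keep-all model deletes nothing.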
[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests
[ https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526207#comment-17526207 ] Dawid Weiss commented on LUCENE-10528: -- Should we do it for an entire gradle process though? Why not just for the forked jvm in that test? > TestScripts.testLukeCanBeLaunched creates X Window when running the tests > - > > Key: LUCENE-10528 > URL: https://issues.apache.org/jira/browse/LUCENE-10528 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h
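Scoping the change to the forked JVM alone is straightforward with `java.lang.ProcessBuilder`, whose `environment()` map is a per-builder copy of the current process environment. A minimal sketch. `ForkedLauncher` is a hypothetical helper, not the test's actual launcher (the "ProcessBuilderThingy" is not shown in the thread):

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the test's real launcher: removes DISPLAY only for
// the process this builder will start.
final class ForkedLauncher {
  private ForkedLauncher() {}

  static ProcessBuilder withoutDisplay(List<String> command) {
    ProcessBuilder pb = new ProcessBuilder(command);
    // environment() is this builder's own copy of the current process
    // environment; editing it does not affect the running Gradle/test JVM.
    Map<String, String> env = pb.environment();
    env.remove("DISPLAY");
    return pb;
  }
}
```

One caveat: on Linux, a JVM started without `DISPLAY` typically comes up headless, so Luke would take the early-exit path in LukeMain's headless sanity check (quoted later in this thread), and the test would no longer exercise the GUI.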
[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests
[ https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526064#comment-17526064 ] Dawid Weiss commented on LUCENE-10528: -- There are so many layers to awt/swing support that it should actually run on the defaults Java provides. I've seen weird things with virtualized graphics environments (arguably, it's been a while so things might have improved). Running with xvfb on github jobs is a good idea and is better than nothing (I don't know much about setting up xvfb but I can take a look). We can make it opt-in but I'm afraid it'd just bury the test forever and nobody would ever run it. An alternative is to make it opt-out (via gradle.properties) or we can mark it slow, which would disable it for many folks who don't explicitly run slow tests. > TestScripts.testLukeCanBeLaunched creates X Window when running the tests > - > > Key: LUCENE-10528 > URL: https://issues.apache.org/jira/browse/LUCENE-10528 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major
[jira] [Commented] (LUCENE-10528) TestScripts.testLukeCanBeLaunched creates X Window when running the tests
[ https://issues.apache.org/jira/browse/LUCENE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526050#comment-17526050 ] Dawid Weiss commented on LUCENE-10528: -- Hmm... But we want to run this test occasionally, don't we? If we disable it completely, it will stop testing whether Luke can actually be launched (and nothing will fail). The test passes in headless mode because LukeMain exits early when it detects a headless environment:
{code}
if (sanityCheck && GraphicsEnvironment.isHeadless()) {
  Logger.getGlobal().log(Level.SEVERE, "[Vader] Hello, Luke. Can't do much in headless mode.");
  Runtime.getRuntime().exit(0);
}
{code}
We could provide a test annotation group that is enabled by default but can be explicitly turned off via gradle.properties. Something like RequiresGraphicsEnvironment? > TestScripts.testLukeCanBeLaunched creates X Window when running the tests > - > > Key: LUCENE-10528 > URL: https://issues.apache.org/jira/browse/LUCENE-10528 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major > > When running the tests, this one causes my entire desktop to "flicker" when > it creates some kind of X-Window very quickly and then destroys it. I use > tiling window manager, so whole desktop gets rearranged for a split second, > and I'd rather it not happen :) > I first tried adding -Djava.awt.headless=true to both org.gradle.jvmargs and > tests.jvmargs in my .gradle/gradle.properties. doesn't work, as the test > doesnt use these when launching luke. > I next tried hacking the test by adding this to the ProcessBuilderThingy, but > it didn't help either: > {noformat} > .envvar("LAUNCH_OPTS", "-Djava.awt.headless=true") > {noformat} > One way I can work around it, is to unset {{DISPLAY}} env var so that it > won't create this window. test still passes: > {noformat} > $ unset DISPLAY > $ ./gradlew :lucene:distribution.tests:test > ... 
(no window gets created) > {noformat} > So maybe as a workaround, we can just not pass DISPLAY environment variable > through to this test?
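The issue suggests not passing the DISPLAY variable through to the launched process. That workaround can be sketched with the standard `java.lang.ProcessBuilder` API. This is a minimal sketch, not the actual TestScripts code. The class name, method name and the `luke.sh` placeholder are hypothetical. On Linux, when `java.awt.headless` is not set explicitly, the JDK defaults to headless mode if DISPLAY is unset. A child Luke process started this way would therefore take the early-exit sanity-check path shown in the comment, rather than really exercising the GUI.

```java
import java.util.List;
import java.util.Map;

public class LaunchWithoutDisplay {
  /**
   * Hypothetical helper (not part of Lucene): returns a ProcessBuilder for
   * {@code command} whose child environment has no DISPLAY variable.
   */
  static ProcessBuilder withoutDisplay(List<String> command) {
    ProcessBuilder pb = new ProcessBuilder(command);
    // environment() starts as a copy of this JVM's environment; edits only
    // affect processes started from this builder, not the current JVM.
    Map<String, String> env = pb.environment();
    env.remove("DISPLAY");
    return pb;
  }

  public static void main(String[] args) {
    // "luke.sh" is a placeholder launcher; the process is never started here.
    ProcessBuilder pb = withoutDisplay(List.of("luke.sh"));
    System.out.println(pb.environment().containsKey("DISPLAY")); // prints "false"
  }
}
```

This trades GUI coverage for a quieter desktop. An opt-out test group, such as the suggested RequiresGraphicsEnvironment, would instead keep the real launch tested by default.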
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526044#comment-17526044 ] Dawid Weiss commented on LUCENE-10521: -- I'm not familiar with the code (or the test), but something seems off here. Deleting a file that is still open elsewhere seems awkward, even in a test, and relying on that behavior seems strange. Why is the list of files in a directory treated as state ("commit" in the test)? Does it have to be? Wouldn't Lucene's own IndexCommit.getFileNames be more adequate? Sorry if this doesn't make sense in context, but it just feels fishy somehow. > Tests in windows are failing for the new > testAlwaysRefreshDirectoryTaxonomyReader test > -- > > Key: LUCENE-10521 > URL: https://issues.apache.org/jira/browse/LUCENE-10521 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet > Environment: Windows 10 >Reporter: Gautam Worah >Priority: Minor > > Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is > failing. > > Specifically, the loop which checks if any files still remain to be deleted > is not ending. > We have added an exception to the main test class to not run the test on > WindowsFS (not sure if this is related). 
> ```
> SEVERE: 1 thread leaked from SUITE scope at org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader:
>    1) Thread[id=19, name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader]
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native Method)
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390)
>         at java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307)
>         at java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251)
>         at java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at java.base@18/java.nio.file.Files.delete(Files.java:1152)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121)
>         at app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97)
> ```
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521850#comment-17521850 ] Dawid Weiss commented on LUCENE-10510: -- I suspected it might have been the nightly runs. We could try to detect whether the JVM would run with an unexported JDK package (anything up to JDK 16?), but I think that buries the problem rather than solving it. It's easy to run a first pass that generates the required JVM settings. If for some reason you can't do that, pass the settings directly to Gradle via the command line (or environment variables); see https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_environment_variables. This also works in the absence of gradle.properties, because the task only verifies whether the required modules are open (not how or where they were opened). Sorry for the complications, not my fault. :) > Check module access prior to running gjf/spotless/errorprone tasks > -- > > Key: LUCENE-10510 > URL: https://issues.apache.org/jira/browse/LUCENE-10510 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR at: [https://github.com/apache/lucene/pull/802]
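The comment describes a check that asks only whether the required packages are accessible, not how access was granted. That kind of check can be approximated with the `java.lang.Module` API. This is a sketch under assumptions, not the actual Lucene Gradle task, and it only inspects the JVM it runs in. The class and method names are hypothetical. The `jdk.compiler` packages in `main` are illustrative examples of the internal javac packages that tools such as google-java-format and error-prone use; they are not an authoritative list. `Module.isExported` and `Module.isOpen` return false for a package the module does not contain. The helpers below also return false when the module is absent from the boot layer.

```java
import java.util.List;
import java.util.Optional;

public class ModuleAccessCheck {
  /** True if {@code moduleName} is in the boot layer and exports {@code pkg} to {@code target}. */
  static boolean isExportedTo(String moduleName, String pkg, Module target) {
    Optional<Module> m = ModuleLayer.boot().findModule(moduleName);
    return m.isPresent() && m.get().isExported(pkg, target);
  }

  /** True if {@code moduleName} is in the boot layer and opens {@code pkg} to {@code target}. */
  static boolean isOpenTo(String moduleName, String pkg, Module target) {
    Optional<Module> m = ModuleLayer.boot().findModule(moduleName);
    return m.isPresent() && m.get().isOpen(pkg, target);
  }

  public static void main(String[] args) {
    // On the class path this is the unnamed module, i.e. what ALL-UNNAMED refers to.
    Module self = ModuleAccessCheck.class.getModule();
    // Illustrative package list only; real requirements depend on the tool version.
    List<String> pkgs = List.of("com.sun.tools.javac.api", "com.sun.tools.javac.util");
    for (String pkg : pkgs) {
      // Only whether access is granted matters, not whether it came from the
      // command line, gradle.properties or an environment variable.
      System.out.println("jdk.compiler/" + pkg + " exported: " + isExportedTo("jdk.compiler", pkg, self));
    }
  }
}
```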
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521323#comment-17521323 ] Dawid Weiss commented on LUCENE-10510: -- Hi Alan. The task graph is fine. When you run 'gradlew clean test', the new task is not included. If you look at the dependencies, it is only included if spotless is actually part of the execution graph or if you run Java compilation with -Ptests.slow=true (in which case it is needed, because error-prone requires those VM module-opening settings). I think everything is set up correctly. I believe your CI jobs were passing on 9x with JDKs older than 17 because those JDKs only emitted a warning about package access. The right way to fix the problem is to add the required exports or, even better, to run 'gradlew help' or an explicit 'gradlew localSettings' so that everything is set up correctly in gradle.properties. > Check module access prior to running gjf/spotless/errorprone tasks > -- > > Key: LUCENE-10510 > URL: https://issues.apache.org/jira/browse/LUCENE-10510 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR at: [https://github.com/apache/lucene/pull/802]
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520643#comment-17520643 ] Dawid Weiss commented on LUCENE-10513: -- Perhaps you could add a line to [https://github.com/apache/lucene/blob/main/help/workflow.txt] mentioning the tidy task, which reformats the code and can be run before check. > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor > > I just made my first PR to Lucene (yay me!) and in the process stumbled on > various things that were non-obvious. > I request, for The Next Person, that the error messaging in `gradlew` make it > more obvious that one should run `./gradlew tidy` the first time around, so > as to avoid the low-hanging formatting problems that cause everything else to > fail. > During the course of my fumbling around, I was encouraged to run: > ./gradlew :lucene:suggest:spotlessJavaCheck > ./gradlew :lucene:suggest:spotlessApply > ./gradlew :lucene:test-framework:spotlessApply > and > ./gradlew check -Ptests.nightly=true > various times, by the error messages in `./gradlew check`, and while I got > there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew > tidy` first may have saved some frustration. > That said, I cannot overstate how impressed I am with the thoroughness of the > testing/verification tools, and wish more projects had this kind of tooling. > Thank you. >
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520641#comment-17520641 ] Dawid Weiss commented on LUCENE-10513: -- You should familiarize yourself with the various help files under help/; here is one that explicitly covers formatting: [https://github.com/apache/lucene/blob/main/help/formatting.txt]. To be honest, I don't think more can be done about it. > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor > > I just made my first PR to Lucene (yay me!) and in the process stumbled on > various things that were non-obvious. > I request, for The Next Person, that the error messaging in `gradlew` make it > more obvious that one should run `./gradlew tidy` the first time around, so > as to avoid the low-hanging formatting problems that cause everything else to > fail. > During the course of my fumbling around, I was encouraged to run: > ./gradlew :lucene:suggest:spotlessJavaCheck > ./gradlew :lucene:suggest:spotlessApply > ./gradlew :lucene:test-framework:spotlessApply > and > ./gradlew check -Ptests.nightly=true > various times, by the error messages in `./gradlew check`, and while I got > there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew > tidy` first may have saved some frustration. > That said, I cannot overstate how impressed I am with the thoroughness of the > testing/verification tools, and wish more projects had this kind of tooling. > Thank you. >
[jira] [Updated] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-10229: - Priority: Minor (was: Major) > Match offsets should be consistent for fields with positions and fields with > offsets > > > Key: LUCENE-10229 > URL: https://issues.apache.org/jira/browse/LUCENE-10229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 9.2 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > This is a follow-up of LUCENE-10223 in which it was discovered that fields > with > offsets don't highlight some more complex interval queries properly. Alan > says: > {quote} > It's because it returns the position of the inner match, but the offsets of > the outer. And so if you're re-analyzing and retrieving offsets by looking > at the positions, you get the 'right' thing. It's not obvious to me what the > correct response is here, but thinking about it the current behaviour is kind > of the worst of both worlds, and perhaps we should change it so that you get > offsets of the inner match as standard, and then the outer match is returned > as part of the sub matches. > {quote} > Intervals are nicely separated into "basic intervals" and "filters" which > restrict some other source of intervals, here is the original documentation: > https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50 > My experience from an extended period of using interval queries in a frontend > where they're highlighted is that filters are restrictions that should not be > highlighted - it's the source intervals that people care about. Filters are > what you remove or where you give proper context to source intervals. 
> The test code contributed in LUCENE-10223 contains numerous query-highlight > examples (on fields with positions) where this intuition is demonstrated on > all kinds of interval functions: > https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542 > This issue is about making the internals work consistently for fields with > positions and fields with offsets.
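The quoted explanation says the match currently reports the positions of the inner match but the offsets of the outer one. The toy model below, in plain Java, makes that inconsistency concrete. It is not Lucene's Matches/MatchesIterator API; the records and methods are hypothetical. It contrasts the two approaches in the quote. The current behavior mixes inner positions with outer offsets. The suggested alternative derives both positions and offsets from the inner match and reports the outer match as a sub-match.

```java
import java.util.List;

public class MatchOffsetsSketch {
  /** A match over token positions [startPos, endPos], both inclusive. */
  record PosInterval(int startPos, int endPos) {}

  /** Positions plus character offsets (end offset exclusive), as a highlighter sees them. */
  record Span(int startPos, int endPos, int startOffset, int endOffset) {}

  /** A reported match: the main span plus any sub-match spans. */
  record Match(Span main, List<Span> subMatches) {}

  /** Offsets derived from the same interval as the positions, so the two always agree. */
  static Span span(PosInterval iv, int[] startOffsets, int[] endOffsets) {
    return new Span(iv.startPos(), iv.endPos(),
        startOffsets[iv.startPos()], endOffsets[iv.endPos()]);
  }

  /** The behavior described in the quote: inner positions, outer offsets. */
  static Span inconsistent(PosInterval inner, PosInterval outer, int[] so, int[] eo) {
    return new Span(inner.startPos(), inner.endPos(), so[outer.startPos()], eo[outer.endPos()]);
  }

  /** The suggested alternative: inner match as the main span, outer match as a sub-match. */
  static Match consistent(PosInterval inner, PosInterval outer, int[] so, int[] eo) {
    return new Match(span(inner, so, eo), List.of(span(outer, so, eo)));
  }

  public static void main(String[] args) {
    String text = "the quick brown fox jumps";
    int[] so = {0, 4, 10, 16, 20}; // start offset of each token
    int[] eo = {3, 9, 15, 19, 25}; // end offset (exclusive) of each token
    PosInterval inner = new PosInterval(1, 2); // "quick brown"
    PosInterval outer = new PosInterval(0, 3); // "the quick brown fox"

    // Re-analyzing by positions (1..2) would yield "quick brown",
    // but the reported offsets cover the outer match instead.
    Span bad = inconsistent(inner, outer, so, eo);
    System.out.println(text.substring(bad.startOffset(), bad.endOffset())); // the quick brown fox

    Match good = consistent(inner, outer, so, eo);
    System.out.println(text.substring(good.main().startOffset(), good.main().endOffset())); // quick brown
  }
}
```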
[jira] [Assigned] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets
[ https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-10229: Assignee: Dawid Weiss > Match offsets should be consistent for fields with positions and fields with > offsets > > > Key: LUCENE-10229 > URL: https://issues.apache.org/jira/browse/LUCENE-10229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: 9.2 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > This is a follow-up of LUCENE-10223 in which it was discovered that fields > with > offsets don't highlight some more complex interval queries properly. Alan > says: > {quote} > It's because it returns the position of the inner match, but the offsets of > the outer. And so if you're re-analyzing and retrieving offsets by looking > at the positions, you get the 'right' thing. It's not obvious to me what the > correct response is here, but thinking about it the current behaviour is kind > of the worst of both worlds, and perhaps we should change it so that you get > offsets of the inner match as standard, and then the outer match is returned > as part of the sub matches. > {quote} > Intervals are nicely separated into "basic intervals" and "filters" which > restrict some other source of intervals, here is the original documentation: > https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50 > My experience from an extended period of using interval queries in a frontend > where they're highlighted is that filters are restrictions that should not be > highlighted - it's the source intervals that people care about. Filters are > what you remove or where you give proper context to source intervals. 
> The test code contributed in LUCENE-10223 contains numerous query-highlight > examples (on fields with positions) where this intuition is demonstrated on > all kinds of interval functions: > https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542 > This issue is about making the internals work consistently for fields with > positions and fields with offsets.