[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906502#comment-16906502 ] Simon Willnauer commented on LUCENE-8369: - +1 for option 1 above as well. Thanks [~nknize] > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902133#comment-16902133 ] Simon Willnauer commented on LUCENE-8369: - I don't think we should sacrifice the existence of LatLong point searching out of core for the sake of code visibility. I think we should keep it in core, open up visibility to enable code reuse in the modules, and use _@lucene.internal_ to mark classes as internal and prevent users from complaining when the API changes. It's not ideal, but progress. Can we separate the discussion of getting rid of the spatial module from graduating the various shapes from sandbox to wherever? I think keeping a module for 2 classes doesn't make sense. We can move those two classes to core too, or even get rid of them altogether; I don't think it should influence the discussion of whether something else should be graduated. One other option would be to move all non-core spatials from sandbox to spatial as long as they don't add any additional dependency. That would be an intermediate step; we can still graduate from there then.
[jira] [Resolved] (LUCENE-8887) CLONE - Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8887. - Resolution: Duplicate This issue seems to have been opened accidentally. > CLONE - Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8887 > URL: https://issues.apache.org/jira/browse/LUCENE-8887 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: LuYunCheng >Assignee: Simon Willnauer >Priority: Minor > Fix For: master (9.0), 8.1 > > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > While LUCENE-8635 adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which the FST needs to be > offheap. This would allow users to tune heap usage as per their workload. > The ideal way would be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE.
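The per-field attribute idea from the issue description can be sketched as follows. This is an illustrative sketch, not Lucene's actual API: the `fst.offheap` attribute key, the `modeFor` helper, and the `PER_FIELD` setting value are hypothetical names, while ALL/NONE mirror the special keywords mentioned above.

```java
import java.util.Map;

public class FstLoadMode {
    enum Mode { ON_HEAP, OFF_HEAP }

    // attrs plays the role of FieldInfo#getAttribute lookups; a global
    // setting of ALL/NONE overrides the per-field attribute entirely.
    static Mode modeFor(String field, Map<String, String> attrs, String globalSetting) {
        if ("ALL".equals(globalSetting)) return Mode.OFF_HEAP;
        if ("NONE".equals(globalSetting)) return Mode.ON_HEAP;
        // fall back to the hypothetical per-field attribute
        return "true".equals(attrs.get("fst.offheap")) ? Mode.OFF_HEAP : Mode.ON_HEAP;
    }

    public static void main(String[] args) {
        // a field whose FieldInfo carries the off-heap attribute
        System.out.println(modeFor("body", Map.of("fst.offheap", "true"), "PER_FIELD"));
    }
}
```

The point of routing the decision through FieldInfo attributes is that the choice travels with the index metadata, so FieldReader can make it without extra configuration plumbing.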
[jira] [Commented] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
[ https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872690#comment-16872690 ] Simon Willnauer commented on LUCENE-8865: - [~hypothesisx86] I didn't run any benchmarks; maybe [~mikemccand] can provide info on whether there are improvements. > Use incoming thread for execution if IndexSearcher has an executor > --- > > Key: LUCENE-8865 > URL: https://issues.apache.org/jira/browse/LUCENE-8865 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Today we don't utilize the incoming thread for a search when IndexSearcher > has an executor. This thread is only idling but can be used to execute a > search > once all other collectors are dispatched.
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868555#comment-16868555 ] Simon Willnauer commented on LUCENE-8857: - A couple of comments: * Can you open a PR and associate it with this issue? Patches are so hard to review without context and the ability to comment. * For the second case in IndexSearcher, should we also tie-break by doc? * Can we replace the verbose comparators with _Comparator.comparingInt(d -> d.shardIndex);_ and _Comparator.comparingInt(d -> d.doc);_ respectively? * Any chance we can select the tie-breaker based on whether one of the TopDocs has a shardIndex != -1, and assert that all of them have it or not? Another option would be to have only one comparator and first tie-break on shardIndex and then on doc; since we don't set the shard index they are all -1, so it should be fine. WDYT? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
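The "only one comparator" option from the last bullet (tie-break on shardIndex first, then on doc) can be sketched like this. `ScoreDoc` here is a minimal stand-in with Lucene-like fields, not the real class:

```java
import java.util.Comparator;

public class CombinedTieBreaker {
    // Stand-in for org.apache.lucene.search.ScoreDoc (fields only, for illustration).
    static final class ScoreDoc {
        final int doc;
        final int shardIndex;
        ScoreDoc(int doc, int shardIndex) { this.doc = doc; this.shardIndex = shardIndex; }
    }

    // Break ties first by shardIndex (all -1 when unset, so it is a no-op
    // in the single-searcher case), then by docID.
    static final Comparator<ScoreDoc> TIE_BREAK =
            Comparator.<ScoreDoc>comparingInt(d -> d.shardIndex)
                      .thenComparingInt(d -> d.doc);

    public static void main(String[] args) {
        ScoreDoc a = new ScoreDoc(5, -1);
        ScoreDoc b = new ScoreDoc(2, -1);
        // shardIndex is equal (unset), so docID decides: a sorts after b
        System.out.println(TIE_BREAK.compare(a, b) > 0);
    }
}
```

When the shard index is never set, the first key compares equal everywhere and the comparator degenerates to the docID order, which is exactly the behavior the comment argues for.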
[jira] [Resolved] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
[ https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8865. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > Use incoming thread for execution if IndexSearcher has an executor > --- > > Key: LUCENE-8865 > URL: https://issues.apache.org/jira/browse/LUCENE-8865 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Today we don't utilize the incoming thread for a search when IndexSearcher > has an executor. This thread is only idling but can be used to execute a > search > once all other collectors are dispatched.
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866575#comment-16866575 ] Simon Willnauer commented on LUCENE-8857: - Why don't we just use the comparator and have a default one and a doc one? Like this: {code} Comparator<ScoreDoc> defaultComparator = Comparator.comparingInt(d -> d.shardIndex); Comparator<ScoreDoc> docComparator = Comparator.comparingInt(d -> d.doc); {code} > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
[jira] [Resolved] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
[ https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8853. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > FileSwitchDirectory is broken if temp outputs are used > -- > > Key: LUCENE-8853 > URL: https://issues.apache.org/jira/browse/LUCENE-8853 > Project: Lucene - Core > Issue Type: Bug >Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > FileSwitchDirectory basically doesn't work if tmp outputs are used for files > that are explicitly mapped with extensions. Here is a failing test: > {code} > 16:49:40[junit4] Suite: > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest > 16:49:40[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=BlendedInfixSuggesterTest > -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1 > 16:49:40[junit4] ERROR 0.05s J1 | > BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > <<< > 16:49:40[junit4]> Throwable #1: > java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: > source and dest are in different directories > 16:49:40[junit4]> at > __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0) > 16:49:40[junit4]> at > org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201) > 16:49:40[junit4]> at > org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) > 16:49:40[junit4]> at > org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56) > 16:49:40[junit4]> at > org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64) > 16:49:40[junit4]> at 
> org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) > 16:49:40[junit4]> at > org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56) > 16:49:40[junit4]> at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:49:40[junit4]> at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > 16:49:40[junit4]> at 
> java.base/java.lang.Thread.run(Thread.java:834) > {code}
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866253#comment-16866253 ] Simon Willnauer commented on LUCENE-8857: - From my perspective we should simplify this even more and remove _TieBreakingParameters_. TopDocs can use _Comparator_ and default to the shard index if it's not supplied. That should be sufficient? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
[jira] [Created] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
Simon Willnauer created LUCENE-8865: --- Summary: Use incoming thread for execution if IndexSearcher has an executor Key: LUCENE-8865 URL: https://issues.apache.org/jira/browse/LUCENE-8865 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Today we don't utilize the incoming thread for a search when IndexSearcher has an executor. This thread is only idling but can be used to execute a search once all other collectors are dispatched.
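The idea behind this issue can be sketched outside of Lucene's actual IndexSearcher code: dispatch all but one slice to the executor and run the final slice on the incoming thread, so it does useful work instead of idling while waiting on the futures. All names below are illustrative, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallerRunsLastSlice {
    // "Searches" a list of slices: all but the last slice go to the pool,
    // the last one runs on the calling (incoming) thread.
    static int search(List<int[]> slices, ExecutorService executor) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < slices.size() - 1; i++) {
            final int[] slice = slices.get(i);
            futures.add(executor.submit(() -> sum(slice)));
        }
        // Execute the final slice directly on the current thread.
        int total = sum(slices.get(slices.size() - 1));
        for (Future<Integer> f : futures) total += f.get();
        return total;
    }

    static int sum(int[] a) { int s = 0; for (int v : a) s += v; return s; }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<int[]> slices = List.of(new int[]{1, 2}, new int[]{3, 4}, new int[]{5});
        System.out.println(search(slices, pool));
        pool.shutdown();
    }
}
```

A side benefit of this pattern is that a single-slice search never touches the executor at all, so the sequential case pays no dispatch overhead.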
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863067#comment-16863067 ] Simon Willnauer commented on LUCENE-8829: - {quote} Simon Willnauer That is a fun idea, although it would still need a function to instruct TopDocs#merge whether to set the shard indices or not. {quote} I am not sure we have to. Can't a user initialize it ahead of time if necessary? I think if it's necessary to have this we can just iterate over it and set it from the outside. That should also be possible, no? > TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved > - > > Key: LUCENE-8829 > URL: https://issues.apache.org/jira/browse/LUCENE-8829 > Project: Lucene - Core > Issue Type: Bug >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, > LUCENE-8829.patch > > > While investigating LUCENE-8819, I understood that TopDocs#merge's order of > results are indirectly dependent on the number of collectors involved in the > merge. This is troubling because 1) The number of collectors involved in a > merge are cost based and directly dependent on the number of slices created > for the parallel searcher case. 2) TopN hits code path will invoke merge with > a single Collector, so essentially, doing the same TopN query with single > threaded and parallel threaded searcher will invoke different order of > results, which is a bad invariant that breaks. > > The reason why this happens is because of the subtle way TopDocs#merge sets > shardIndex in the ScoreDoc population during populating the priority queue > used for merging. ShardIndex is essentially set to the ordinal of the > collector which generates the hit. This means that the shardIndex is > dependent on the number of collectors, even for the same set of hits. > > In case of no sort order specified, shardIndex is used for tie breaking when > scores are equal. This translates to different orders for same hits with > different shardIndices. > > I propose that we remove shardIndex from the default tie breaking mechanism > and replace it with docID. DocID order is the de facto that is expected > during collection, so it might make sense to use the same factor during tie > breaking when scores are the same. > > CC: [~ivera]
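Setting the shard index "from the outside", as the comment above suggests, amounts to a simple loop over the per-shard hits before (or after) calling merge. `ScoreDoc` is again a stand-in with a Lucene-like mutable `shardIndex` field, not the real class:

```java
public class SetShardIndexExternally {
    // Stand-in for org.apache.lucene.search.ScoreDoc; shardIndex defaults to -1 (unset).
    static final class ScoreDoc {
        final int doc;
        int shardIndex = -1;
        ScoreDoc(int doc) { this.doc = doc; }
    }

    public static void main(String[] args) {
        // Hits grouped by the shard that produced them.
        ScoreDoc[][] perShardHits = { { new ScoreDoc(1) }, { new ScoreDoc(0) } };
        // The caller assigns shard ordinals itself instead of merge() doing it,
        // which is what makes the setShardIndex flag unnecessary.
        for (int shard = 0; shard < perShardHits.length; shard++) {
            for (ScoreDoc d : perShardHits[shard]) {
                d.shardIndex = shard;
            }
        }
        System.out.println(perShardHits[1][0].shardIndex);
    }
}
```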
[jira] [Comment Edited] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848 ] Simon Willnauer edited comment on LUCENE-8829 at 6/12/19 8:56 AM: -- I'd remove the _setShardIndex_ parameter altogether and not set it was (Author: simonw): I'd remove the _ setShardIndex_ parameter alltogether and don't set it
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848 ] Simon Willnauer commented on LUCENE-8829: - I'd remove the _setShardIndex_ parameter altogether and not set it
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861821#comment-16861821 ] Simon Willnauer commented on LUCENE-8829: - I do wonder if we can simplify this API now that we have FunctionalInterfaces. If we change _TopDocs#merge_ to take a _ToIntFunction_ we should be able to have a default of _ScoreDoc::doc_, and users that want to use the shard index can use _ScoreDoc::shardIndex_; that should also simplify our code, I guess. Yet, I haven't checked if it works across the board; it's just an idea.
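The ToIntFunction idea above can be sketched as a key-extractor-based tie-breaker. `ScoreDoc` is a stand-in; lambdas are used instead of the method references from the comment because Lucene's `doc` and `shardIndex` are public fields, not getters:

```java
import java.util.Comparator;
import java.util.function.ToIntFunction;

public class MergeKeySketch {
    // Stand-in for org.apache.lucene.search.ScoreDoc.
    static final class ScoreDoc {
        final int doc;
        final int shardIndex;
        ScoreDoc(int doc, int shardIndex) { this.doc = doc; this.shardIndex = shardIndex; }
    }

    // A merge call could accept the key extractor and build its comparator from it.
    static Comparator<ScoreDoc> tieBreaker(ToIntFunction<ScoreDoc> key) {
        return Comparator.comparingInt(key);
    }

    public static void main(String[] args) {
        Comparator<ScoreDoc> byDoc = tieBreaker(d -> d.doc);          // proposed default
        Comparator<ScoreDoc> byShard = tieBreaker(d -> d.shardIndex); // opt-in for shard users
        // docID decides here even though the shard indices disagree
        System.out.println(byDoc.compare(new ScoreDoc(1, 9), new ScoreDoc(2, 0)) < 0);
    }
}
```

Taking a `ToIntFunction` rather than a full `Comparator` keeps the API narrow: callers can only choose *which* int field breaks ties, not invert the ordering by accident.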
[jira] [Commented] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
[ https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861785#comment-16861785 ] Simon Willnauer commented on LUCENE-8853: - I attached a PR but I am not really happy with it; yet it's my best bet. I am wondering if we should start a discussion about removal of FileSwitchDirectory. It's hard to get right and there are many situations where it can break. I do wonder what its use case is other than opening a file with NIO vs. MMAP, as Elasticsearch does. If that's the main purpose we can build a better version of it. /cc [~rcmuir] > FileSwitchDirectory is broken if temp outputs are used > -- > > Key: LUCENE-8853 > URL: https://issues.apache.org/jira/browse/LUCENE-8853 > Project: Lucene - Core > Issue Type: Bug >Reporter: Simon Willnauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > FileSwitchDirectory basically doesn't work if tmp outputs are used for files > that are explicitly mapped with extensions. 
Here is a failing test: > {code} > 16:49:40[junit4] Suite: > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest > 16:49:40[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=BlendedInfixSuggesterTest > -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1 > 16:49:40[junit4] ERROR 0.05s J1 | > BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > <<< > 16:49:40[junit4]> Throwable #1: > java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: > source and dest are in different directories > 16:49:40[junit4]> at > __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0) > 16:49:40[junit4]> at > org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201) > 16:49:40[junit4]> at > org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) > 16:49:40[junit4]> at > org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56) > 16:49:40[junit4]> at > org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64) > 16:49:40[junit4]> at > org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) > 16:49:40[junit4]> at > org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56) > 16:49:40[junit4]> at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) > 
16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:49:40[junit4]> at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > 16:49:40[junit4]> at > java.base/java.lang.Thread.run(Thread.java:834) > {code}
[jira] [Created] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
Simon Willnauer created LUCENE-8853: --- Summary: FileSwitchDirectory is broken if temp outputs are used Key: LUCENE-8853 URL: https://issues.apache.org/jira/browse/LUCENE-8853 Project: Lucene - Core Issue Type: Bug Reporter: Simon Willnauer FileSwitchDirectory basically doesn't work if tmp output are used for files that are explicitly mapped with extensions. here is a failing test: {code} 16:49:40[junit4] Suite: org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest 16:49:40[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=BlendedInfixSuggesterTest -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 16:49:40[junit4] ERROR 0.05s J1 | BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch <<< 16:49:40[junit4]> Throwable #1: java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: source and dest are in different directories 16:49:40[junit4]> at __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0) 16:49:40[junit4]> at org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201) 16:49:40[junit4]> at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) 16:49:40[junit4]> at org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56) 16:49:40[junit4]> at org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64) 16:49:40[junit4]> at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) 16:49:40[junit4]> at org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56) 16:49:40[junit4]> at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152) 16:49:40[junit4]> at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) 16:49:40[junit4]> at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) 16:49:40[junit4]> at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) 16:49:40[junit4]> at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199) 16:49:40[junit4]> at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) 16:49:40[junit4]> at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) 16:49:40[junit4]> at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345) 16:49:40[junit4]> at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315) 16:49:40[junit4]> at org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125) 16:49:40[junit4]> at org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79) 16:49:40[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 16:49:40[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 16:49:40[junit4]> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 16:49:40[junit4]> at java.base/java.lang.reflect.Method.invoke(Method.java:566) 16:49:40[junit4]> at java.base/java.lang.Thread.run(Thread.java:834) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
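The failure above follows directly from how FileSwitchDirectory routes files: it picks a delegate directory by file extension, so a temp output such as `_0.fdx__0.tmp` (extension `tmp`) and its rename target `_0.fdx` (extension `fdx`) can resolve to different delegates, and the cross-delegate rename fails. A standalone sketch of that routing logic (class and extension choices are illustrative, not Lucene's actual implementation):

```java
import java.util.Set;

// Standalone sketch of extension-based routing as FileSwitchDirectory does it.
public class ExtensionRouting {
  // Extensions handled by the "primary" delegate; everything else goes to "secondary".
  // The concrete extensions here are an assumption for illustration.
  static final Set<String> PRIMARY_EXTENSIONS = Set.of("fdx", "fdt");

  // Everything after the last '.' counts as the extension.
  static String getExtension(String name) {
    int i = name.lastIndexOf('.');
    return i == -1 ? "" : name.substring(i + 1);
  }

  static String getDirectory(String name) {
    return PRIMARY_EXTENSIONS.contains(getExtension(name)) ? "primary" : "secondary";
  }

  public static void main(String[] args) {
    // The temp output ends in ".tmp", so it routes to the secondary delegate...
    System.out.println(getDirectory("_0.fdx__0.tmp")); // secondary
    // ...while the rename target routes to the primary delegate.
    System.out.println(getDirectory("_0.fdx"));        // primary
    // Renaming across the two delegates then fails with
    // AtomicMoveNotSupportedException, as in the stack trace above.
  }
}
```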
[jira] [Resolved] (LUCENE-8835) Respect file extension when listing files from FileSwitchDirectory
[ https://issues.apache.org/jira/browse/LUCENE-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8835. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: 8.2 master (9.0) > Respect file extension when listing files from FileSwitchDirectory > -- > > Key: LUCENE-8835 > URL: https://issues.apache.org/jira/browse/LUCENE-8835 > Project: Lucene - Core > Issue Type: Bug >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 50m > Remaining Estimate: 0h > > FileSwitchDirectory splits file actions between 2 directories based on file > extensions. The extensions are respected on write operations like delete or > create but ignored when we list the content of the directories. Until now we > only deduplicated the contents on Directory#listAll, which can cause > inconsistencies and hard-to-debug errors due to double deletions in > IndexWriter if a file is pending delete in one of the directories but still > shows up in the directory listing from the other directory. This case can > happen if both directories point to the same underlying FS directory, which is > a common use case to split between mmap and niofs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
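The fix direction described above can be sketched in isolation: instead of deduplicating the union of both delegates' listings, each delegate only reports files whose extension it owns. The code below is a self-contained illustration, not Lucene's implementation; the class, extension set, and list-based "directories" are stand-ins:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: each delegate only contributes files whose extension it
// owns, so a file pending delete in one delegate cannot leak back in via the
// other delegate's listing when both sit on the same FS directory.
public class FilteredListAll {
  static final List<String> PRIMARY_EXTENSIONS = List.of("dvd", "tim");

  static String getExtension(String name) {
    int i = name.lastIndexOf('.');
    return i == -1 ? "" : name.substring(i + 1);
  }

  // primaryFiles / secondaryFiles stand in for the two delegates' listings;
  // with both delegates on the same FS directory they are identical.
  static List<String> listAll(List<String> primaryFiles, List<String> secondaryFiles) {
    List<String> result = new ArrayList<>();
    for (String f : primaryFiles) {
      if (PRIMARY_EXTENSIONS.contains(getExtension(f))) result.add(f);
    }
    for (String f : secondaryFiles) {
      if (!PRIMARY_EXTENSIONS.contains(getExtension(f))) result.add(f);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> sameDir = List.of("_0.dvd", "_0.tim", "_0.fdt");
    // Each file shows up exactly once even though both delegates see all files.
    System.out.println(listAll(sameDir, sameDir));
  }
}
```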
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858441#comment-16858441 ] Simon Willnauer commented on LUCENE-8833: - I do like the idea of #warm, but the footprint is much bigger since it's a public API. For my specific use case I'd subclass mmap anyway, and it would make it easier that way. FileSwitchDirectory is quite heavy and isn't really built for what I want to do. I basically would need an IndexInput factory that I can plug into a directory, that can alternate between NIOFS and mmap etc. and conditionally preload the mmap. Either way, I can work with both; I just think this change is the minimum viable change. Let me know if you are OK moving forward. > Allow subclasses of MMapDirectory to preload individual IndexInputs > -- > > Key: LUCENE-8833 > URL: https://issues.apache.org/jira/browse/LUCENE-8833 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think it's useful for subclasses to select the preload flag on a per index > input basis rather than all or nothing. Here is a patch that has an > overloaded protected openInput method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
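The proposal boils down to letting a subclass decide the preload flag per file rather than once for the whole directory. A standalone sketch of such a decision hook — the MMapDirectory/IOContext machinery is stubbed out with a plain enum, and the method name, extensions, and merge rule are assumptions for illustration, not Lucene's actual API:

```java
// Standalone sketch: a per-file preload decision, as a subclass overriding a
// protected hook might implement it. Only the decision logic is shown.
public class PreloadDecision {
  // Stub for Lucene's IOContext: was this input opened for search or a merge?
  enum Context { READ, MERGE }

  // A subclass could override a hook like this instead of using a global flag.
  static boolean preload(String name, Context context) {
    // e.g. preload term dictionaries and doc values for searches,
    // but never map-and-touch whole files during merges.
    if (context == Context.MERGE) return false;
    return name.endsWith(".tim") || name.endsWith(".dvd");
  }

  public static void main(String[] args) {
    System.out.println(preload("_0.tim", Context.READ));  // true
    System.out.println(preload("_0.tim", Context.MERGE)); // false
    System.out.println(preload("_0.fdt", Context.READ));  // false
  }
}
```

This also shows why the comments above bring up the IOContext: the file extension alone cannot distinguish a search-time open from a merge-time open.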
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857525#comment-16857525 ] Simon Willnauer commented on LUCENE-8833: - > what would the iocontext provide to base the preload decision on? just > curious. Sure, the one I had in mind as an example is merge. I am not sure it makes a big difference; I was just wondering whether there are other signals than the file extension. I opened LUCENE-8835 to fix the file listing issue FileSwitchDirectory has. > Allow subclasses of MMapDirectory to preload individual IndexInputs > -- > > Key: LUCENE-8833 > URL: https://issues.apache.org/jira/browse/LUCENE-8833 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think it's useful for subclasses to select the preload flag on a per index > input basis rather than all or nothing. Here is a patch that has an > overloaded protected openInput method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8835) Respect file extension when listing files from FileSwitchDirectory
Simon Willnauer created LUCENE-8835: --- Summary: Respect file extension when listing files from FileSwitchDirectory Key: LUCENE-8835 URL: https://issues.apache.org/jira/browse/LUCENE-8835 Project: Lucene - Core Issue Type: Bug Reporter: Simon Willnauer FileSwitchDirectory splits file actions between 2 directories based on file extensions. The extensions are respected on write operations like delete or create but ignored when we list the content of the directories. Until now we only deduplicated the contents on Directory#listAll, which can cause inconsistencies and hard-to-debug errors due to double deletions in IndexWriter if a file is pending delete in one of the directories but still shows up in the directory listing from the other directory. This case can happen if both directories point to the same underlying FS directory, which is a common use case to split between mmap and niofs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856781#comment-16856781 ] Simon Willnauer commented on LUCENE-8833: - You are correct, that's what Elasticsearch does. Yet FileSwitchDirectory had many issues in the past and still has (I am working on one issue related to [this|https://github.com/elastic/elasticsearch/pull/37140]) and will open another issue soon. Especially with the push of pending deletes down to FSDirectory, things became more tricky for FileSwitchDirectory. That said, I think these issues should be fixed and I will work on them; this was more of a trigger to look closer. I also wanted to make decisions on whether to preload or not based on the IOContext down the road, which FileSwitchDirectory would not be capable of doing in this context. I hope this makes sense? > Allow subclasses of MMapDirectory to preload individual IndexInputs > -- > > Key: LUCENE-8833 > URL: https://issues.apache.org/jira/browse/LUCENE-8833 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think it's useful for subclasses to select the preload flag on a per index > input basis rather than all or nothing. Here is a patch that has an > overloaded protected openInput method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs
Simon Willnauer created LUCENE-8833: --- Summary: Allow subclasses of MMapDirectory to preload individual IndexInputs Key: LUCENE-8833 URL: https://issues.apache.org/jira/browse/LUCENE-8833 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer I think it's useful for subclasses to select the preload flag on a per-index-input basis rather than all or nothing. Here is a patch that has an overloaded protected openInput method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8809) Refresh and rollback concurrently can leave segment states unclosed
[ https://issues.apache.org/jira/browse/LUCENE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856364#comment-16856364 ] Simon Willnauer commented on LUCENE-8809: - [~dnhatn] can we close this issue? > Refresh and rollback concurrently can leave segment states unclosed > --- > > Key: LUCENE-8809 > URL: https://issues.apache.org/jira/browse/LUCENE-8809 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.7, 8.1, 8.2 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: 7.7.2, master (9.0), 8.2, 8.1.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > A [failed test|https://github.com/elastic/elasticsearch/issues/30290] from > Elasticsearch shows that refresh and rollback running concurrently can leave segment > states unclosed, which leads to leaking the refCount of some SegmentReaders. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8813) testIndexTooManyDocs fails
[ https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8813. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > testIndexTooManyDocs fails > -- > > Key: LUCENE-8813 > URL: https://issues.apache.org/jira/browse/LUCENE-8813 > Project: Lucene - Core > Issue Type: Test > Components: core/index >Reporter: Nhat Nguyen >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > testIndexTooManyDocs fails on [Elastic > CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console]. > This failure does not reproduce locally for me. > {noformat} > [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2> KTN 23, 2019 4:09:37 PM > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException >[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-612,5,TGRP-TestIndexTooManyDocs] >[junit4] 2> java.lang.AssertionError: only modifications from the > current flushing queue are permitted while doing a full flush >[junit4] 2> at > __randomizedtesting.SeedInfo.seed([1F16B1DA7056AA52]:0) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.assertTicketQueueModification(DocumentsWriter.java:683) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:187) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:411) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:514) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) >[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) >[junit4] 2> at java.base/java.lang.Thread.run(Thread.java:834) 
>[junit4] 2> >[junit4] 2> KTN 23, 2019 6:09:36 PM > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate >[junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2>1) Thread[id=669, > name=SUITE-TestIndexTooManyDocs-seed#[1F16B1DA7056AA52], state=RUNNABLE, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > java.base/java.lang.Thread.getStackTrace(Thread.java:1606) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693) >[junit4] 2> at > java.base/java.security.AccessController.doPrivileged(Native Method) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getThreadsWithTraces(ThreadLeakControl.java:709) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.formatThreadStacksFull(ThreadLeakControl.java:689) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.access$1000(ThreadLeakControl.java:65) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:415) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:708) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:629) >[junit4] 2>2) Thread[id=671, name=Thread-606, state=BLOCKED, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > app//org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4945) >[junit4] 2> at > app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293) >[junit4] 2> at > 
app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272) >[junit4] 2> at > app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262) >[junit4] 2> at > app//org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165) >[junit4] 2> at > app//org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTo
[jira] [Commented] (LUCENE-8813) testIndexTooManyDocs fails
[ https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849506#comment-16849506 ] Simon Willnauer commented on LUCENE-8813: - I looked at this and I think the issue here is that we are executing 2 flushes in very quick succession while, at the same time, a single thread has already released its DWPT before the first flush but has not tried to apply deletes before the second flush is done. In this case the assertion doesn't hold anymore. The window is super small, which is likely why we never tripped this before. I don't think we have a correctness issue here, but I will still try to improve the way we assert/apply deletes. > testIndexTooManyDocs fails > -- > > Key: LUCENE-8813 > URL: https://issues.apache.org/jira/browse/LUCENE-8813 > Project: Lucene - Core > Issue Type: Test > Components: core/index >Reporter: Nhat Nguyen >Priority: Major > > testIndexTooManyDocs fails on [Elastic > CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console]. > This failure does not reproduce locally for me. 
> {noformat} > [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2> KTN 23, 2019 4:09:37 PM > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException >[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-612,5,TGRP-TestIndexTooManyDocs] >[junit4] 2> java.lang.AssertionError: only modifications from the > current flushing queue are permitted while doing a full flush >[junit4] 2> at > __randomizedtesting.SeedInfo.seed([1F16B1DA7056AA52]:0) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.assertTicketQueueModification(DocumentsWriter.java:683) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:187) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:411) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:514) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) >[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) >[junit4] 2> at java.base/java.lang.Thread.run(Thread.java:834) >[junit4] 2> >[junit4] 2> KTN 23, 2019 6:09:36 PM > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate >[junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2>1) Thread[id=669, > name=SUITE-TestIndexTooManyDocs-seed#[1F16B1DA7056AA52], state=RUNNABLE, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > java.base/java.lang.Thread.getStackTrace(Thread.java:1606) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693) >[junit4] 2> at > 
java.base/java.security.AccessController.doPrivileged(Native Method) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getThreadsWithTraces(ThreadLeakControl.java:709) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.formatThreadStacksFull(ThreadLeakControl.java:689) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.access$1000(ThreadLeakControl.java:65) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:415) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:708) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:629) >[junit4] 2>2) Thread[id=671, name=Thread-606, state=BLOCKED, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > app//org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4945) >[junit4] 2> at > app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293) >[junit4] 2> at > app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfCha
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843726#comment-16843726 ] Simon Willnauer commented on LUCENE-8757: - [~atris] can we, instead of asserting the order, just sort the slices in the LeafSlice ctor? This should prevent any issues down the road, and it's cheap enough IMO > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
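The "sort in the ctor instead of asserting" suggestion can be sketched in a few lines. The types below are stubs standing in for Lucene's IndexSearcher.LeafSlice and LeafReaderContext; sorting by docBase is an assumed stand-in for whatever ordering the assertion checked:

```java
import java.util.Arrays;
import java.util.Comparator;

// Standalone sketch: a slice that sorts its leaves in the ctor, making it
// robust to any input order instead of asserting callers pass leaves sorted.
public class LeafSliceSketch {
  // Stub for LeafReaderContext: just the fields needed for ordering.
  record Leaf(int docBase, int maxDoc) {}

  static final class LeafSlice {
    final Leaf[] leaves;

    LeafSlice(Leaf... leaves) {
      this.leaves = leaves.clone();
      // Sorting here is cheap (slices are small) and removes the ordering
      // precondition from every call site.
      Arrays.sort(this.leaves, Comparator.comparingInt(Leaf::docBase));
    }
  }

  public static void main(String[] args) {
    // Leaves passed out of order still end up sorted inside the slice.
    LeafSlice slice = new LeafSlice(new Leaf(100, 50), new Leaf(0, 100));
    System.out.println(slice.leaves[0].docBase()); // 0
  }
}
```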
[jira] [Assigned] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-8757: --- Assignee: Simon Willnauer > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837615#comment-16837615 ] Simon Willnauer commented on LUCENE-8757: - LGTM. I will try to commit this in the coming days. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837003#comment-16837003 ] Simon Willnauer commented on LUCENE-8757: - {quote} I think there is an important justification for the 2nd criteria (number of segments in each work unit / slice), which is if you have an index with some large segments, and then with a long tail of small segments (easily happens if your machine has substantially CPU concurrency and you use multiple threads), since there is a fixed cost for visiting each segment, if you put too many small segments into one work unit, those fixed costs multiply and that one work unit can become too slow even though it's not actually going to visit too many documents. I think we should keep it? {quote} Fair enough, let's add it back. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
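The two criteria being discussed — a doc-count budget per slice and a segment-count budget per slice — can be sketched as a standalone grouping algorithm. Segments are represented only by their doc counts (sorted descending), and the thresholds are illustrative, not Lucene's defaults:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the grouping discussed above: segments are packed into
// slices bounded by BOTH a doc-count budget and a segment-count budget, so a
// long tail of tiny segments cannot pile into one slow slice (fixed per-segment
// visiting costs would multiply inside it).
public class SliceGrouping {
  static List<List<Integer>> slices(List<Integer> sortedDocCounts,
                                    int maxDocsPerSlice, int maxSegmentsPerSlice) {
    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int docsInCurrent = 0;
    for (int docCount : sortedDocCounts) {
      // Close the current slice if adding this segment would bust either budget.
      if (!current.isEmpty()
          && (docsInCurrent + docCount > maxDocsPerSlice
              || current.size() >= maxSegmentsPerSlice)) {
        slices.add(current);
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
      current.add(docCount);
      docsInCurrent += docCount;
    }
    if (!current.isEmpty()) slices.add(current);
    return slices;
  }

  public static void main(String[] args) {
    // One large segment plus a long tail of small ones: the large segment gets
    // its own slice, and the tail is split by the segment-count budget.
    List<Integer> segments = List.of(900, 90, 80, 70, 10, 10, 10, 10);
    System.out.println(slices(segments, 250, 3));
    // -> [[900], [90, 80, 70], [10, 10, 10], [10]]
  }
}
```

Without the segment-count budget, the five tail segments after the 240-doc group would all fit in one slice by doc count alone, which is exactly the case the quoted comment argues against.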
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835894#comment-16835894 ] Simon Willnauer commented on LUCENE-8785: - {quote} Please feel free to commit this to the release branch. In case of a re-spin, I'll pick this change up. {quote} [~ichattopadhyaya] done. Thanks. > TestIndexWriterDelete.testDeleteAllNoDeadlock failure > - > > Key: LUCENE-8785 > URL: https://issues.apache.org/jira/browse/LUCENE-8785 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.6 > Environment: OpenJDK 1.8.0_202 >Reporter: Michael McCandless >Assignee: Simon Willnauer >Priority: Minor > Fix For: 7.7.2, master (9.0), 8.2, 8.1.1 > > Time Spent: 40m > Remaining Estimate: 0h > > I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 > cores), and hit this random yet spooky failure: > {noformat} > [junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock > -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\ > serts=true -Dtests.file.encoding=US-ASCII > [junit4] ERROR 0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock > <<< > [junit4] > Throwable #1: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, > group=TGRP-TestIndexWriterDelete] > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0) > [junit4] > Caused by: java.lang.RuntimeException: > java.lang.IllegalArgumentException: field number 0 is already mapped to field > name "null", not "content" > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332) > [junit4] > Caused by: 
java.lang.IllegalArgumentException: field number > 0 is already mapped to field name "null", not "content" > [junit4] > at > org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310) > [junit4] > at > org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > [junit4] > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > [junit4] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > [junit4] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > [junit4] > at > org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > [junit4] > at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat} > It does *not* reproduce unfortunately ... but maybe there is some subtle > thread safety issue in this code ... this is a hairy part of Lucene ;) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835481#comment-16835481 ] Simon Willnauer commented on LUCENE-8757: - Thanks for the additional iteration. Now that we simplified this, can we remove the sorting? I don't necessarily see how the sort makes things simpler. If we see a segment > threshold we can just add it as a group? I thought you did that already, hence my comment about the assertion. WDYT? I also want to suggest beefing up testing a bit with a randomized version of this, like so:
{code}
diff --git a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
index 7c63a817adb..76ccca64ee7 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
@@ -1933,6 +1933,14 @@ public abstract class LuceneTestCase extends Assert {
       ret = random.nextBoolean()
           ? new AssertingIndexSearcher(random, r, ex)
           : new AssertingIndexSearcher(random, r.getContext(), ex);
+    } else if (random.nextBoolean()) {
+      int maxDocPerSlice = 1 + random.nextInt(10);
+      ret = new IndexSearcher(r, ex) {
+        @Override
+        protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+          return slices(leaves, maxDocPerSlice);
+        }
+      };
     } else {
       ret = random.nextBoolean()
           ? new IndexSearcher(r, ex)
{code} > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. 
This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835473#comment-16835473 ] Simon Willnauer commented on LUCENE-7840: - LGTM > BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least > 1 MUST/FILTER clause and 0==minShouldMatch > --- > > Key: LUCENE-7840 > URL: https://issues.apache.org/jira/browse/LUCENE-7840 > Project: Lucene - Core > Issue Type: Task >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-7840.patch, LUCENE-7840.patch, LUCENE-7840.patch > > > I haven't thought this through completely, let alone written up a patch / test > case, but IIUC... > We should be able to optimize {{ BooleanQuery rewriteNoScoring() }} so that > (after converting MUST clauses to FILTER clauses) we can check for the common > case of {{0==getMinimumNumberShouldMatch()}} and throw away any SHOULD > clauses as long as there is at least one FILTER clause. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
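The optimization described in the issue can be shown in isolation: when scores are ignored, minShouldMatch is 0, and at least one required (MUST/FILTER) clause exists, SHOULD clauses cannot change which documents match and can be dropped. The Clause/Occur types below are stubs, not Lucene's BooleanQuery API:

```java
import java.util.List;
import java.util.stream.Collectors;

// Standalone sketch of the rewriteNoScoring optimization discussed above.
public class RewriteNoScoringSketch {
  enum Occur { MUST, FILTER, SHOULD, MUST_NOT }
  record Clause(String query, Occur occur) {}

  static List<Clause> rewriteNoScoring(List<Clause> clauses, int minShouldMatch) {
    boolean hasRequired = clauses.stream()
        .anyMatch(c -> c.occur() == Occur.MUST || c.occur() == Occur.FILTER);
    return clauses.stream()
        // When scores are ignored, MUST behaves exactly like FILTER.
        .map(c -> c.occur() == Occur.MUST ? new Clause(c.query(), Occur.FILTER) : c)
        // SHOULD clauses only contribute to scoring here, so drop them.
        .filter(c -> !(c.occur() == Occur.SHOULD
                       && minShouldMatch == 0
                       && hasRequired))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Clause> rewritten = rewriteNoScoring(List.of(
        new Clause("a", Occur.MUST),
        new Clause("b", Occur.SHOULD),
        new Clause("c", Occur.MUST_NOT)), 0);
    // The SHOULD clause "b" is gone and "a" became a FILTER clause.
    System.out.println(rewritten);
  }
}
```

Note the `hasRequired` guard: with only SHOULD clauses (and no MUST/FILTER), SHOULD clauses still determine which documents match and must be kept.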
[jira] [Resolved] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8785. - Resolution: Fixed > TestIndexWriterDelete.testDeleteAllNoDeadlock failure > - > > Key: LUCENE-8785 > URL: https://issues.apache.org/jira/browse/LUCENE-8785 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.6 > Environment: OpenJDK 1.8.0_202 >Reporter: Michael McCandless >Assignee: Simon Willnauer >Priority: Minor > Fix For: 7.7.2, master (9.0), 8.2, 8.1.1 > > Time Spent: 40m > Remaining Estimate: 0h > > I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 > cores), and hit this random yet spooky failure: > {noformat} > [junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock > -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\ > serts=true -Dtests.file.encoding=US-ASCII > [junit4] ERROR 0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock > <<< > [junit4] > Throwable #1: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, > group=TGRP-TestIndexWriterDelete] > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0) > [junit4] > Caused by: java.lang.RuntimeException: > java.lang.IllegalArgumentException: field number 0 is already mapped to field > name "null", not "content" > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332) > [junit4] > Caused by: java.lang.IllegalArgumentException: field number > 0 is already mapped to field name "null", not "content" > [junit4] > at > org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310) > 
[junit4] > at > org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > [junit4] > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > [junit4] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > [junit4] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > [junit4] > at > org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > [junit4] > at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat} > It does *not* reproduce unfortunately ... but maybe there is some subtle > thread safety issue in this code ... this is a hairy part of Lucene ;) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8785:
------------------------------------
    Fix Version/s:     (was: 8.0.1)
                       (was: 8.1)
                       (was: 7.7.1)
                   8.2
                   7.7.2
                   8.1.1
[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834778#comment-16834778 ] Simon Willnauer commented on LUCENE-7840:
-

I think there are some style issues in this patch, like here, where _else_ should be on the previous line:

{code:java}
+      }
+    }
+    else {
+      newQuery.add(clause);
+    }
{code}

The other question is whether we should use a switch instead of if / else? Otherwise it's looking fine.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834767#comment-16834767 ] Simon Willnauer commented on LUCENE-8757:
-

[~atris] I think the assertion in this part doesn't hold:

{code}
+    for (LeafReaderContext ctx : sortedLeaves) {
+      if (ctx.reader().maxDoc() > maxDocsPerSlice) {
+        assert group == null;
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
{code}

If the previous segment was smallish, then _group_ is non-null? I think you should test these cases; maybe add a random test and randomize the order of the segments? This:

{code}
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
+
+        singleSegmentSlice.add(ctx);
+        groupedLeaves.add(singleSegmentSlice);
{code}

can and should be replaced by:

{code}
        groupedLeaves.add(Collections.singletonList(ctx));
{code}

Otherwise it looks good.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834525#comment-16834525 ] Simon Willnauer commented on LUCENE-8757:
-

[~atris] actually I thought about these defaults again and I am starting to think it's an OK default. The reason for this is that we try to prevent having dedicated threads for smallish segments, so we group them together. I still do wonder if we need to have 2 parameters? Wouldn't it be enough to just say that we group things together until we have at least 250k docs per thread to be searched? Is it really necessary to have another parameter that limits the number of segments per slice? I think a single parameter would be great and simpler. WDYT?
[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8785:
------------------------------------
    Fix Version/s: 7.7.1
                   master (9.0)
                   8.1
                   8.0.1
[jira] [Assigned] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-8785:
---
    Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834467#comment-16834467 ] Simon Willnauer commented on LUCENE-8785:
-

{quote}
If there is another thread coming in after we locked the existent threadstates we just issue a new one. Yuck
{quote}

I looked at the code again and we actually lock the threadstates for this purpose. I implemented this in LUCENE-8639. The issue here is in fact a race condition, since we request the number of active threadstates before we lock new ones. It's a classic one-line fix. I referenced a PR for this. [~mikemccand] would you take a look?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1683#comment-1683 ] Simon Willnauer commented on LUCENE-8757:
-

> Would it make sense to push this patch, and then let users consume it and
> provide feedback while we iterate on the more sophisticated version? We could
> even have both of the methods available as options to users, potentially

I don't think we should push this if we already know we want to do something different. That said, I am not convinced the numbers are good defaults. At the same time I don't have any numbers here; do you have anything to back these defaults up?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832343#comment-16832343 ] Simon Willnauer commented on LUCENE-8757:
-

Thanks [~atris], can you bring back the javadocs for

{code:java}
protected LeafSlice[] slices(List<LeafReaderContext> leaves)
{code}

Please don't reassign an argument like here:

{code:java}
leaves = new ArrayList<>(leaves);
{code}

The rest of the patch looks OK to me, yet I am not so sure about the defaults. I do wonder if we should look at this from a different perspective. Rather than using hard numbers, can we try to evenly balance the total number of documents across N threads and make N the variable? [~mikemccand] WDYT?
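The alternative floated above — make the number of threads N the variable and balance total docs across them — is essentially greedy multiway partitioning. A rough sketch under that framing, with hypothetical names (`balanceDocs`, plain `int[]` segment sizes), not anything from the patch:

```java
import java.util.Arrays;
import java.util.Comparator;

public class BalanceSketch {
    /**
     * Greedy LPT (longest-processing-time) partitioning: sort segment
     * sizes in descending order and always assign the next segment to
     * the currently lightest of N bins. Returns the resulting per-bin
     * doc totals.
     */
    static long[] balanceDocs(int[] maxDocs, int n) {
        Integer[] sorted = Arrays.stream(maxDocs).boxed().toArray(Integer[]::new);
        Arrays.sort(sorted, Comparator.reverseOrder());
        long[] load = new long[n];
        for (int maxDoc : sorted) {
            // find the bin with the smallest total so far
            int lightest = 0;
            for (int i = 1; i < n; i++) {
                if (load[i] < load[lightest]) {
                    lightest = i;
                }
            }
            load[lightest] += maxDoc;
        }
        return load;
    }

    public static void main(String[] args) {
        // 4 skewed segments over 2 threads: loads end up roughly even
        System.out.println(Arrays.toString(balanceDocs(new int[] {900, 500, 400, 100}, 2)));
        // → [1000, 900]
    }
}
```

A real implementation would track which leaves go into which slice rather than just the totals, but the balancing decision is the same.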
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832336#comment-16832336 ] Simon Willnauer commented on LUCENE-8785:
-

{quote}
I realize neither ES nor Solr expose deleteAll but I don't think that's a valid argument to remove it from Lucene.
{quote}

Huh, I don't think that's a valid argument either; I just re-read my comments - sorry if you felt I was alluding to ES or Solr here. My argument is that if you want to do that, you should construct a new IndexWriter instead of calling deleteAll(). Given this comment in the javadocs:

{noformat}
Essentially a call to {@link #deleteAll()} is equivalent to creating a new {@link IndexWriter} with {@link OpenMode#CREATE}
{noformat}

I want to understand why, in such a rather edgy case, a user can't do exactly this. There is no race and no confusion; it's very simple from a semantics perspective. Currently there are 2 ways and one is confusing. I think we should move towards removing the second way.

{quote}
And for some reason the index is reset once per week, but the devs want to allow searching of the old index while the new index is (slowly) built up. But if something goes badly wrong, they need to be able to rollback (the deleteAll and all subsequently added docs) to the last commit and try again later. If instead it succeeds, then a refresh/commit will switch to the new index atomically.
{quote}

Well, there are tons of ways to do that, no? I mean, you can have 2 directories? Yes, it causes some engineering effort, but the semantics would be cleaner even for the app that does what you explain.
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831635#comment-16831635 ] Simon Willnauer commented on LUCENE-8785:
-

> But at the point we call clear() haven't we already blocked all indexing
> threads?

No, it might look like we do that but we don't. We block and lock all threads up to that point in time. If there is another thread coming in after we locked the existent threadstates, we just issue a new one.

> I also dislike deleteAll() and you're right a user could deleteByQuery using
> MatchAllDocsQuery; can we make that close-ish as efficient as deleteAll() is
> today?

I think we can just do what deleteAll() does today, except not dropping the schema on the floor?

> Though indeed that would preserve the schema, while deleteAll() lets you
> delete docs, delete schema, all under transaction (the change is not visible
> until commit).

I want to understand the use case for this. I can see how somebody wants to drop all docs, but basically dropping all IW state on the floor is difficult in my eyes.
> TestIndexWriterDelete.testDeleteAllNoDeadlock failure > - > > Key: LUCENE-8785 > URL: https://issues.apache.org/jira/browse/LUCENE-8785 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.6 > Environment: OpenJDK 1.8.0_202 >Reporter: Michael McCandless >Priority: Minor > > I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 > cores), and hit this random yet spooky failure: > {noformat} > [junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock > -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\ > serts=true -Dtests.file.encoding=US-ASCII > [junit4] ERROR 0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock > <<< > [junit4] > Throwable #1: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, > group=TGRP-TestIndexWriterDelete] > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0) > [junit4] > Caused by: java.lang.RuntimeException: > java.lang.IllegalArgumentException: field number 0 is already mapped to field > name "null", not "content" > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332) > [junit4] > Caused by: java.lang.IllegalArgumentException: field number > 0 is already mapped to field name "null", not "content" > [junit4] > at > org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310) > [junit4] > at > org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650) > [junit4] > at > 
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > [junit4] > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > [junit4] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > [junit4] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > [junit4] > at > org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > [junit4] > at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat} > It does *not* reproduce unfortunately ... but maybe there is some subtle > thread safety issue in this code ... this is a hairy part of Lucene ;) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
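The window described above, where all existing thread states get locked but a late indexing thread is handed a brand-new state, can be sketched without Lucene internals. ToyThreadStatePool below is a hypothetical toy model, not the real DocumentsWriterPerThreadPool API; it only illustrates why a lockAll() snapshot can miss a state created afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model of the thread-state pool; names and
// structure are illustrative, not Lucene's actual API.
class ToyThreadStatePool {
  final List<ReentrantLock> states = new ArrayList<>();

  // An indexing thread grabs a free state, or creates a fresh one
  // if every existing state is locked.
  synchronized ReentrantLock getAndLock() {
    for (ReentrantLock s : states) {
      if (s.tryLock()) {
        return s;
      }
    }
    ReentrantLock fresh = new ReentrantLock();
    fresh.lock();
    states.add(fresh); // this state was never seen by lockAll()
    return fresh;
  }

  // deleteAll() locks every state that exists *right now*.
  synchronized List<ReentrantLock> lockAll() {
    for (ReentrantLock s : states) {
      s.lock();
    }
    return new ArrayList<>(states); // snapshot; later states are missed
  }
}

public class ThreadStateRace {
  public static void main(String[] args) throws InterruptedException {
    ToyThreadStatePool pool = new ToyThreadStatePool();
    pool.getAndLock().unlock(); // one pre-existing, currently free state
    List<ReentrantLock> snapshot = pool.lockAll();
    // An indexing thread arriving after lockAll() gets a brand-new
    // state the snapshot does not cover: the race window clear() hits.
    final ReentrantLock[] late = new ReentrantLock[1];
    Thread indexer = new Thread(() -> late[0] = pool.getAndLock());
    indexer.start();
    indexer.join();
    System.out.println(snapshot.contains(late[0])); // prints false
  }
}
```

The late state holds documents that deleteAll()'s clear() never sees, which matches the field-numbers assertion tripping in the reported failure.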
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831612#comment-16831612 ] Simon Willnauer commented on LUCENE-8785: - [~mikemccand] I think this is caused by the fact that we simply call _clear()_ during _IW#deleteAll()_. If this happens concurrently with a document being indexed, this assertion can trip. I personally always disliked the complexity of _IW#deleteAll_ and from my perspective we should remove this method entirely and ask users to open a new IW if they want to drop all the information including the _schema_. We can still fast-path a _MatchAllDocsQuery_ delete like we do today (which is a problem IMO since it drops all the field map info, which it shouldn't?). IMO if you want a fresh index, start from scratch; but to delete all docs, run deleteByQuery and keep the schema. > TestIndexWriterDelete.testDeleteAllNoDeadlock failure > - > > Key: LUCENE-8785 > URL: https://issues.apache.org/jira/browse/LUCENE-8785 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.6 > Environment: OpenJDK 1.8.0_202 >Reporter: Michael McCandless >Priority: Minor > > I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 > cores), and hit this random yet spooky failure: > {noformat} > [junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock > -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\ > serts=true -Dtests.file.encoding=US-ASCII > [junit4] ERROR 0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock > <<< > [junit4] > Throwable #1: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, > group=TGRP-TestIndexWriterDelete] > [junit4] > at > 
__randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0) > [junit4] > Caused by: java.lang.RuntimeException: > java.lang.IllegalArgumentException: field number 0 is already mapped to field > name "null", not "content" > [junit4] > at > __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332) > [junit4] > Caused by: java.lang.IllegalArgumentException: field number > 0 is already mapped to field name "null", not "content" > [junit4] > at > org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310) > [junit4] > at > org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) > [junit4] > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > [junit4] > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > [junit4] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > [junit4] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > [junit4] > at > org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > [junit4] > at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159) > [junit4] > at > org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat} > It does *not* reproduce unfortunately ... but maybe there is some subtle > thread safety issue in this code ... this is a hairy part of Lucene ;) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose
[ https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831604#comment-16831604 ] Simon Willnauer commented on LUCENE-8776: - [~venkat11] I do understand your frustration. Believe me, we don't take changes like this lightly. One person's bug is another person's feature, and as we grow and mature, strong guarantees are essential for the vast majority of users, for future development, for faster iterations and for more performant code. There might not be a tradeoff from your perspective; from the maintainer's perspective there is. Now we can debate if a major version bump is _enough_ time to migrate or not; our policy is that we can make BWC and behavioral changes like this in a major release. In fact we don't do it in minors, to provide you the time you need and to ease upgrades to minors. We will build, and have built, features on top of this guarantee, and in order to manage expectations: I am pretty sure we won't go back and allow negative offsets. I think your best option, whether you like it or not, is to work towards a fix for your issue with either the tools you have now, or to improve Lucene, for instance with the suggestion from [~mgibney] regarding indexing more information. Please don't get mad at me, I am just trying to manage expectations. > Start offset going backwards has a legitimate purpose > - > > Key: LUCENE-8776 > URL: https://issues.apache.org/jira/browse/LUCENE-8776 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.6 >Reporter: Ram Venkat >Priority: Major > > Here is the use case where startOffset can go backwards: > Say there is a line "Organic light-emitting-diode glows", and I want to run > span queries and highlight them properly. > During index time, light-emitting-diode is split into three words, which > allows me to search for 'light', 'emitting' and 'diode' individually. 
The > three words occupy adjacent positions in the index, as 'light' adjacent to > 'emitting' and 'light' at a distance of two words from 'diode' need to match > this word. So, the order of words after splitting are: Organic, light, > emitting, diode, glows. > But, I also want to search for 'organic' being adjacent to > 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. > The way I solved this was to also generate 'light-emitting-diode' at two > positions: (a) In the same position as 'light' and (b) in the same position > as 'glows', like below: > ||organic||light||emitting||diode||glows|| > | |light-emitting-diode| |light-emitting-diode| | > |0|1|2|3|4| > The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets > are obviously the same. This works beautifully in Lucene 5.x in both > searching and highlighting with span queries. > But when I try this in Lucene 7.6, it hits the condition "Offsets must not go > backwards" at DefaultIndexingChain:818. This IllegalArgumentException is > being thrown without any comments on why this check is needed. As I explained > above, startOffset going backwards is perfectly valid, to deal with word > splitting and span operations on these specialized use cases. On the other > hand, it is not clear what value is added by this check and which highlighter > code is affected by offsets going backwards. This same check is done at > BaseTokenStreamTestCase:245. > I see others talk about how this check found bugs in WordDelimiter etc. but > it also prevents legitimate use cases. Can this check be removed? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
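The check under discussion can be pictured with the token stream from the issue description. Below is a minimal, self-contained model of the "offsets must not go backwards" rule; Token and validate are hypothetical stand-ins, not Lucene's DefaultIndexingChain API. The second light-emitting-diode token sits at position 3 but reuses startOffset 8, which is behind the previous token's startOffset 23, so the stream is rejected.

```java
// Toy model of the startOffset monotonicity check; not Lucene code.
public class OffsetCheck {
  record Token(String term, int position, int startOffset, int endOffset) {}

  // Each token's startOffset must be >= the previous token's startOffset.
  static void validate(Token[] stream) {
    int lastStart = -1;
    for (Token t : stream) {
      if (t.startOffset() < lastStart) {
        throw new IllegalArgumentException(
            "startOffset must not go backwards: " + t.term());
      }
      lastStart = t.startOffset();
    }
  }

  public static void main(String[] args) {
    // "Organic light-emitting-diode glows", indexed as described above:
    // the compound token is emitted again at position 3 with its
    // original offsets (8..28), behind 'diode' at startOffset 23.
    Token[] stream = {
      new Token("organic", 0, 0, 7),
      new Token("light", 1, 8, 13),
      new Token("light-emitting-diode", 1, 8, 28),
      new Token("emitting", 2, 14, 22),
      new Token("diode", 3, 23, 28),
      new Token("light-emitting-diode", 3, 8, 28), // startOffset goes back
      new Token("glows", 4, 29, 34),
    };
    try {
      validate(stream);
      System.out.println("ok");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected"); // prints "rejected"
    }
  }
}
```

This is exactly the shape of stream that worked in 5.x and now trips the IllegalArgumentException in 7.x.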
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831591#comment-16831591 ] Simon Willnauer commented on LUCENE-8757: - Hey Atri, thanks for putting up this patch, here is some additional feedback: - can we stick with a protected non-static method on IndexSearcher, so that subclasses are able to override your impl? I think it's ok to have a static method like this: {code:java} public static LeafSlice[] slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegPerSlice){code} that you can call from the protected method with your defaults? - you might want to change your sort to something like this: {code:java} Collections.sort(leaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));{code} - I think the _Leaves_ class is unnecessary, we can just use _List<LeafReaderContext>_ instead? > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
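The suggested static slices(...) method could pack segments roughly like this: sort by maxDoc descending, then fill a slice until either the doc cap or the segment cap would be exceeded. The sketch below is self-contained, using plain maxDoc counts in place of LeafReaderContext; the packing rule and thresholds are illustrative assumptions, not the final patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch of a size-aware segment-to-slice grouping; plain ints stand
// in for LeafReaderContext, and the rule is an assumption for
// illustration, not Lucene's committed implementation.
public class SegmentSlicer {
  static List<List<Integer>> slices(List<Integer> maxDocs,
                                    int maxDocsPerSlice,
                                    int maxSegsPerSlice) {
    List<Integer> sorted = new ArrayList<>(maxDocs);
    // Largest segments first, mirroring the reverse-order sort above.
    sorted.sort(Collections.reverseOrder(Comparator.naturalOrder()));
    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long docsInSlice = 0;
    for (int docs : sorted) {
      // Start a new slice when either cap would be exceeded.
      if (!current.isEmpty()
          && (docsInSlice + docs > maxDocsPerSlice
              || current.size() >= maxSegsPerSlice)) {
        slices.add(current);
        current = new ArrayList<>();
        docsInSlice = 0;
      }
      current.add(docs);
      docsInSlice += docs;
    }
    if (!current.isEmpty()) {
      slices.add(current);
    }
    return slices;
  }

  public static void main(String[] args) {
    // One big segment and many tiny ones: the tiny ones share a slice
    // instead of each getting a dedicated search thread.
    List<List<Integer>> s =
        slices(List.of(1_000_000, 500, 400, 300, 200, 100), 250_000, 5);
    System.out.println(s.size()); // prints 2
  }
}
```

With one-thread-per-segment, the same index would use six threads; here the five tiny segments collapse into a single slice.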
[jira] [Resolved] (LUCENE-8671) Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8671. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: master (9.0) 8.1 > Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8671 > URL: https://issues.apache.org/jira/browse/LUCENE-8671 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: Ankit Jain >Assignee: Simon Willnauer >Priority: Minor > Fix For: 8.1, master (9.0) > > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Time Spent: 5h > Remaining Estimate: 19h > > While LUCENE-8635, adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which FST needs to be > offheap. This allows users to tune heap usage as per their workload. > Ideal way will be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException
[ https://issues.apache.org/jira/browse/LUCENE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8754. - Resolution: Fixed Fix Version/s: master (9.0) 8.1 > SegmentInfo#toString can cause ConcurrentModificationException > -- > > Key: LUCENE-8754 > URL: https://issues.apache.org/jira/browse/LUCENE-8754 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: 8.1, master (9.0) > > Time Spent: 3h > Remaining Estimate: 0h > > A recent change increased the likelihood for this issue to show up but it can > already happen before since we are using the attributes map in the > StoredFieldsFormat for quite some time. I found this issue due to a test > failure on our CI: > {noformat} > 13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting > 13:11:56[junit4] 2> apr 05, 2019 8:11:53 AM > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 13:11:56[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-507,5,TGRP-TestIndexSorting] > 13:11:56[junit4] 2> java.util.ConcurrentModificationException > 13:11:56[junit4] 2> at > __randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0) > 13:11:56[junit4] 2> at > java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) > 13:11:56[junit4] 2> at > java.util.HashMap$EntryIterator.next(HashMap.java:1476) > 13:11:56[junit4] 2> at > java.util.HashMap$EntryIterator.next(HashMap.java:1474) > 13:11:56[junit4] 2> at > java.util.AbstractMap.toString(AbstractMap.java:554) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364) > 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) > 13:11:56[junit4] 2> at > 
java.lang.StringBuilder.append(StringBuilder.java:131) > 13:11:56[junit4] 2> at > java.util.AbstractMap.toString(AbstractMap.java:557) > 13:11:56[junit4] 2> at > java.util.Collections$UnmodifiableMap.toString(Collections.java:1493) > 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) > 13:11:56[junit4] 2> at > java.lang.StringBuilder.append(StringBuilder.java:131) > 13:11:56[junit4] 2> at > org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939) > 13:11:56[junit4] 2> at > org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851) > 13:11:56[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 13:11:56[junit4] 2> > 13:11:56[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates > -Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest > {noformat} > The issue is that we update the attributes map (also we similarly do the same > for diagnostics but it's not necessarily causing the issue since the > diagnostics map is never modified) during the merge process but access it in > the merge policy when looking at running merges and there we call toString on > SegmentCommitInfo which happens without any synchronization. This is > technically unsafe publication but IW is a mess along those lines and real > fixes would require significant changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException
Simon Willnauer created LUCENE-8754: --- Summary: SegmentInfo#toString can cause ConcurrentModificationException Key: LUCENE-8754 URL: https://issues.apache.org/jira/browse/LUCENE-8754 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer A recent change increased the likelihood for this issue to show up but it can already happen before since we are using the attributes map in the StoredFieldsFormat for quite some time. I found this issue due to a test failure on our CI: {noformat} 13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting 13:11:56[junit4] 2> apr 05, 2019 8:11:53 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException 13:11:56[junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-507,5,TGRP-TestIndexSorting] 13:11:56[junit4] 2> java.util.ConcurrentModificationException 13:11:56[junit4] 2> at __randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0) 13:11:56[junit4] 2> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) 13:11:56[junit4] 2> at java.util.HashMap$EntryIterator.next(HashMap.java:1476) 13:11:56[junit4] 2> at java.util.HashMap$EntryIterator.next(HashMap.java:1474) 13:11:56[junit4] 2> at java.util.AbstractMap.toString(AbstractMap.java:554) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364) 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) 13:11:56[junit4] 2> at java.lang.StringBuilder.append(StringBuilder.java:131) 13:11:56[junit4] 2> at java.util.AbstractMap.toString(AbstractMap.java:557) 13:11:56[junit4] 2> at java.util.Collections$UnmodifiableMap.toString(Collections.java:1493) 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) 13:11:56[junit4] 2> at 
java.lang.StringBuilder.append(StringBuilder.java:131) 13:11:56[junit4] 2> at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939) 13:11:56[junit4] 2> at org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851) 13:11:56[junit4] 2> at java.lang.Thread.run(Thread.java:748) 13:11:56[junit4] 2> 13:11:56[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates -Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest {noformat} The issue is that we update the attributes map (also we similarly do the same for diagnostics but it's not necessarily causing the issue since the diagnostics map is never modified) during the merge process but access it in the merge policy when looking at running merges and there we call toString on SegmentCommitInfo which happens without any synchronization. This is technically unsafe publication but IW is a mess along those lines and real fixes would require significant changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
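The unsynchronized toString() failure above is easy to model without Lucene: HashMap iteration (which is what AbstractMap.toString does) is fail-fast, so any structural modification between next() calls trips ConcurrentModificationException. The sketch is single-threaded to make it deterministic; in the actual report a merge thread updates the attributes map while the merge policy stringifies it. The attribute keys here are made up for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Deterministic, single-threaded reproduction of the fail-fast
// iterator behavior behind the reported ConcurrentModificationException.
public class AttributesCme {
  public static void main(String[] args) {
    Map<String, String> attributes = new HashMap<>();
    attributes.put("stored-fields-mode", "BEST_SPEED"); // illustrative keys
    attributes.put("sorter", "docid");
    Iterator<Map.Entry<String, String>> it = attributes.entrySet().iterator();
    it.next();                                // toString() is mid-iteration ...
    attributes.put("new-attribute", "value"); // ... while a merge adds a key
    try {
      it.next(); // fail-fast check sees the modCount change
      System.out.println("no exception");
    } catch (java.util.ConcurrentModificationException e) {
      System.out.println("ConcurrentModificationException");
    }
  }
}
```

Copying the map under the writer's lock before stringifying (or logging an immutable snapshot) avoids the problem without restructuring IndexWriter.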
[jira] [Commented] (LUCENE-8735) FileAlreadyExistsException after opening old commit
[ https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801820#comment-16801820 ] Simon Willnauer commented on LUCENE-8735: - thanks henning > FileAlreadyExistsException after opening old commit > --- > > Key: LUCENE-8735 > URL: https://issues.apache.org/jira/browse/LUCENE-8735 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: 8.0 >Reporter: Henning Andersen >Assignee: Simon Willnauer >Priority: Major > Fix For: 7.7.1, 7.7.2, 8.0.1, 8.1, master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > FilterDirectory.getPendingDeletes() does not delegate calls. This in turn > means that IndexFileDeleter does not consider those as relevant files. > When opening an IndexWriter for an older commit, excess files are attempted > deleted. If an IndexReader exists using one of the newer commits, the excess > files may fail to delete (at least on windows or when using the mocking > WindowsFS). > If then closing and opening the IndexWriter, the information on the pending > deletes are gone if a FilterDirectory derivate is used. At the same time, the > pending deletes are filtered out of listAll. This leads to a risk of hitting > an existing file name, causing a FileAlreadyExistsException. > This issue likely only exists on windows. > Will create pull request with fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8735) FileAlreadyExistsException after opening old commit
[ https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8735. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: 7.7.1 8.1 8.0.1 7.7.2 > FileAlreadyExistsException after opening old commit > --- > > Key: LUCENE-8735 > URL: https://issues.apache.org/jira/browse/LUCENE-8735 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: 8.0 >Reporter: Henning Andersen >Assignee: Simon Willnauer >Priority: Major > Fix For: 7.7.2, 8.0.1, 8.1, master (9.0), 7.7.1 > > Time Spent: 40m > Remaining Estimate: 0h > > FilterDirectory.getPendingDeletes() does not delegate calls. This in turn > means that IndexFileDeleter does not consider those as relevant files. > When opening an IndexWriter for an older commit, excess files are attempted > deleted. If an IndexReader exists using one of the newer commits, the excess > files may fail to delete (at least on windows or when using the mocking > WindowsFS). > If then closing and opening the IndexWriter, the information on the pending > deletes are gone if a FilterDirectory derivate is used. At the same time, the > pending deletes are filtered out of listAll. This leads to a risk of hitting > an existing file name, causing a FileAlreadyExistsException. > This issue likely only exists on windows. > Will create pull request with fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8700) Enable concurrent flushing when no indexing is in progress
[ https://issues.apache.org/jira/browse/LUCENE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8700. - Resolution: Invalid We settled on the PR that IndexWriter#flushNextBuffer is sufficient for this use case. I opened a new PR for the test improvements here: https://github.com/apache/lucene-solr/pull/607 > Enable concurrent flushing when no indexing is in progress > -- > > Key: LUCENE-8700 > URL: https://issues.apache.org/jira/browse/LUCENE-8700 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mike Sokolov >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > As discussed on mailing list, this is for adding a IndexWriter.yield() method > that callers can use to enable concurrent flushing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790895#comment-16790895 ] Simon Willnauer commented on LUCENE-8692: - > rollback gives you a way to close IndexWriter without doing a commit, which > seems useful. If you removed that, what would users do instead? Can't we extend close to close without committing? I mean we can keep rollback, but be more strict about exceptions during commit and friends? > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - see > SOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes this > decision after encountering an exception from the IndexWriter based on wether > {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. 
> It's fairly easy to make my new test fail reproducibly – in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789320#comment-16789320 ] Simon Willnauer commented on LUCENE-8692: - {quote} It definitely seems like there should be something we can/should do to better recognize situations like this as "unrecoverable" and be more strict in dealing with low level exceptions during things like commit – but I'm out definitely out of my depth in understanding/suggesting what that might look like. {quote} I agree with you here. I personally question the purpose of rollback, since in all the cases I have seen, a missing rollback would simply mean data loss. If somebody continues after a failed commit / prepareCommit / reopen they will end up with inconsistency and/or data loss. I can't think of a reason why you would want to do it. I am curious what [~mikemccand] [~jpountz] [~rcmuir] think about that. If we deprecate and remove rollback() we can be more aggressive when it gets to tragic events and prevent users from continuing after such an exception by closing the writer automatically. > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - see > SOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail. 
Solr's leadership code makes this > decision after encountering an exception from the IndexWriter based on wether > {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly – in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785703#comment-16785703 ] Simon Willnauer commented on LUCENE-8671: - I don't think we should add a setter to FieldInfo. This is a code-private thing and should be treated that way. It looks like we need a way to pass more info down when we open new SegmentReaders. I wonder if we can accept a simple Map on {noformat} public static DirectoryReader open(final IndexWriter writer, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {noformat} We can then pass it down to the relevant parts and make it part of `SegmentReadState`? This map can also be passed via IndexWriterConfig for the NRT case. That way we can pass settings per DirectoryReader open, which is what we want I guess. > Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8671 > URL: https://issues.apache.org/jira/browse/LUCENE-8671 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: Ankit Jain >Priority: Minor > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > While LUCENE-8635, adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which FST needs to be > offheap. This allows users to tune heap usage as per their workload. > Ideal way will be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785625#comment-16785625 ] Simon Willnauer commented on LUCENE-8692: - {quote} For now I've updated the patch to take the simplest possible approach to checking for MergeAbortedException {quote} +1 {quote} Well, to flip your question around: is there an example of a Throwable you can think of bubbling up out of IndexWriter.startCommit() that should NOT be considered fatal? {quote} I think we need to be careful here. From my perspective there are 3 types of exceptions here: * unrecoverable exceptions, a.k.a. VirtualMachineErrors * exceptions that happen during indexing and are not recoverable (these are handled in DocumentsWriter) * exceptions that cause data loss or inconsistencies (we didn't handle those as fatal yet, at least not consistently, since we only catch VirtualMachineError). Those come in particular from: * getReader() * deleteAll() * addIndexes() * flushNextBuffer() * prepareCommitInternal() * doFlush() * startCommit() Those methods might cause documents to go missing etc. but we did not treat them as fatal or tragic events, since a user could always call rollback() to go back to the last known safe-point / previous commit. Now we can debate if we want to change this, and we can; in fact I am all for making it even more strict, especially since it's inconsistent with what we do if addDocument fails with an aborting exception. If we do that we need to see if rollback still has a purpose and maybe remove it? Now, speaking of maybeMerge, I don't see why we need to close the index writer with a tragic event; there is no data loss nor an inconsistency? From that logic I don't think we need to handle these exceptions in such a drastic way? {quote} I don't use github for lucene development – I track all contributions as patches in the official issue tracker for the project as recommended by our official guidelines : ) ... 
but I'll go ahead and create a jira/LUCENE-8692 branch if that will help you review. {quote} Bummer, I am not sure branches help. Working like it's still 1999 is a pain; we should fix our guidelines. > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this Solr node gives up its leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - > see SOLR-13237) due to the leader not giving up its leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes > this decision after encountering an exception from the IndexWriter based on > whether {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivalent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index until (otherwise) valid execution > of IW.add() + IW.commit() calls throws an exception to the IW client. > * Assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly -- in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (i.e. the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
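The getTragicException() contract being tested above can be modeled in isolation. The sketch below is a stand-in, not IndexWriter's actual code (TinyWriter and its methods are illustrative names): the writer atomically records the first fatal Throwable, and later operations refuse to proceed once a tragedy has been recorded.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal model of the "record the first tragic exception" pattern the issue
// is about. TinyWriter is a toy stand-in, not IndexWriter's real internals.
public class TinyWriter {
    private final AtomicReference<Throwable> tragedy = new AtomicReference<>();

    // Record the first fatal error; later failures keep the original cause.
    public void onTragicEvent(Throwable t) {
        tragedy.compareAndSet(null, t);
    }

    // Mirrors IndexWriter.getTragicException(): null until something fatal happened.
    public Throwable getTragicException() {
        return tragedy.get();
    }

    public void commit() {
        if (tragedy.get() != null) {
            throw new IllegalStateException("writer hit a tragic event", tragedy.get());
        }
        // ... real commit work would go here ...
    }
}
```

The bug report boils down to: some corrupting exceptions escaped to the caller without `onTragicEvent` ever being invoked, so `getTragicException()` stayed null even though the index was broken.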
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784438#comment-16784438 ] Simon Willnauer commented on LUCENE-8692: - {noformat} I think there is an issue with the patch with MergeAbortedException indeed given that registerMerge might throw such an exception. Maybe we should move this try block to registerMerge instead where we know which OneMerge is being registered (and is also where the exception is thrown when estimating the size of the merge). {noformat} +1 {code:java} -} catch (VirtualMachineError tragedy) { +} catch (Throwable tragedy) { tragicEvent(tragedy, "startCommit"); {code} I am not sure why we need to treat every exception as fatal in this case. I also wonder if we could move this to a PR on GitHub; iterations would be simpler and comments too. I can't tell which patch is relevant and which one isn't. > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this Solr node gives up its leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - > see SOLR-13237) due to the leader not giving up its leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes > this decision after encountering an exception from the IndexWriter based on > whether {{IndexWriter.getTragicException()}} is (non-)null. 
> > While investigating this, I created an isolated Lucene-Core equivalent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index until (otherwise) valid execution > of IW.add() + IW.commit() calls throws an exception to the IW client. > * Assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly -- in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (i.e. the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
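The `catch (VirtualMachineError tragedy)` vs `catch (Throwable tragedy)` diff in the comment above is really a choice between two classification policies, with the MergeAbortedException carve-out discussed earlier. A rough sketch of the two (everything below is a toy stand-in, not Lucene's actual exception handling; the MergeAbortedException here is a local class, not MergePolicy.MergeAbortedException):

```java
// Sketch of the two "is this failure tragic?" policies being debated.
// All classes here are stand-ins for Lucene's actual hierarchy.
public class TragicPolicy {
    // Stand-in for MergePolicy.MergeAbortedException.
    public static class MergeAbortedException extends RuntimeException {}

    // Old behavior: only JVM-level errors (OOM etc.) close the writer tragically.
    public static boolean isTragicOldPolicy(Throwable t) {
        return t instanceof VirtualMachineError;
    }

    // Proposed behavior: anything escaping startCommit() is tragic,
    // except a deliberately aborted merge.
    public static boolean isTragicNewPolicy(Throwable t) {
        return !(t instanceof MergeAbortedException);
    }
}
```

Under the old policy the NoSuchFileException from the bug report is not tragic and the writer stays "healthy" despite a corrupt index; under the new one it is.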
[jira] [Commented] (LUCENE-3041) Support Query Visiting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773011#comment-16773011 ] Simon Willnauer commented on LUCENE-3041: - [~romseygeek] any chance you can open a PR for this? Patches are so hard to review and comment on. > Support Query Visiting / Walking > --- > > Key: LUCENE-3041 > URL: https://issues.apache.org/jira/browse/LUCENE-3041 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Chris Male >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.9, 6.0 > > Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, > LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch > > > Out of the discussion in LUCENE-2868, it could be useful to add a generic > Query Visitor / Walker that could be used for more advanced rewriting, > optimizations or anything that requires state to be stored as each Query is > visited. > We could keep the interface very simple: > {code} > public interface QueryVisitor { > Query visit(Query query); > } > {code} > and then use a reflection based visitor like Earwin suggested, which would > allow implementors to provide visit methods for just the Querys that they are > interested in. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
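The one-method QueryVisitor interface proposed in the issue description can be exercised with a small self-contained tree. The Query classes below are toy stand-ins for Lucene's (only the `QueryVisitor` shape comes from the issue); the visitor rewrites the tree without any Query class knowing about the transformation:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the proposed QueryVisitor. TermQuery/BooleanQuery
// here are toy stand-ins for Lucene's classes.
public class VisitorDemo {
    public interface Query {}

    public static class TermQuery implements Query {
        public final String field, term;
        public TermQuery(String field, String term) { this.field = field; this.term = term; }
    }

    public static class BooleanQuery implements Query {
        public final List<Query> clauses;
        public BooleanQuery(List<Query> clauses) { this.clauses = clauses; }
    }

    // The interface exactly as proposed in the issue description.
    public interface QueryVisitor {
        Query visit(Query query);
    }

    // Example visitor: lower-cases every term, recursing into boolean clauses.
    public static class LowercaseTerms implements QueryVisitor {
        @Override
        public Query visit(Query q) {
            if (q instanceof TermQuery) {
                TermQuery t = (TermQuery) q;
                return new TermQuery(t.field, t.term.toLowerCase());
            }
            if (q instanceof BooleanQuery) {
                List<Query> rewritten = new ArrayList<>();
                for (Query clause : ((BooleanQuery) q).clauses) {
                    rewritten.add(visit(clause));
                }
                return new BooleanQuery(rewritten);
            }
            return q; // unknown query types pass through unchanged
        }
    }
}
```

The reflection-based variant mentioned in the description would dispatch to `visit(TermQuery)`, `visit(BooleanQuery)` overloads automatically instead of the explicit `instanceof` chain shown here.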
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771013#comment-16771013 ] Simon Willnauer commented on LUCENE-8292: - [~dsmiley] I coordinated this with [~romseygeek] given that we had to respin for https://issues.apache.org/jira/browse/SOLR-13126 anyhow. > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk, 8.0, 8.x, master (9.0) > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible for the delegate > to override these methods to provide specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8292. - Resolution: Fixed Fix Version/s: master (9.0) 8.x 8.0 > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk, 8.0, 8.x, master (9.0) > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible for the delegate > to override these methods to provide specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769324#comment-16769324 ] Simon Willnauer commented on LUCENE-8292: - I opened a PR here https://github.com/apache/lucene-solr/pull/574 > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 10m > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible for the delegate > to override these methods to provide specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767061#comment-16767061 ] Simon Willnauer commented on LUCENE-8292: - I do see both points here. [~dsmiley] I hate how trappy this is, and [~jpountz] I completely agree with you. My suggestion here would be to add an additional class: TermsEnum has all methods abstract, and BaseTermsEnum can add default impls. FilterTermsEnum then subclasses TermsEnum and does the right thing. Other classes that don't need to override stuff like seekExact and seek(BytesRef, TermState) / TermState termState() can simply subclass BaseTermsEnum, and we don't have to duplicate code all over the place. I don't think we need to do this in other places where we have the same pattern, but in this case the traps are significant and we can fix it with a simple class in-between? > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible for the delegate > to override these methods to provide specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
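The class split suggested in the comment above can be sketched with toy classes (all names below are stand-ins, not Lucene's real TermsEnum hierarchy): a fully abstract base forces every implementation to make a decision, a "Base" class in-between carries the convenience defaults, and a filter extends the abstract class so it can never silently inherit a slow default.

```java
// Sketch of the proposed split: abstract base / default-carrying Base class /
// delegating filter. Toy stand-ins for TermsEnum, BaseTermsEnum, FilterTermsEnum.
public class EnumDemo {
    public abstract static class AbstractEnum {
        public abstract boolean seekExact(String term); // no default: each impl must decide
        public abstract String seekCeil(String term);
    }

    // Default implementations live here, so only classes that opt in inherit them.
    public abstract static class BaseEnum extends AbstractEnum {
        @Override
        public boolean seekExact(String term) {
            return term.equals(seekCeil(term)); // potentially slow fallback
        }
    }

    // A filter extends the fully-abstract class and delegates everything,
    // so the wrapped enum's optimized seekExact is always used.
    public static class FilterEnum extends AbstractEnum {
        public final AbstractEnum in;
        public FilterEnum(AbstractEnum in) { this.in = in; }
        @Override public boolean seekExact(String term) { return in.seekExact(term); }
        @Override public String seekCeil(String term) { return in.seekCeil(term); }
    }
}
```

The point of the in-between class: making `seekExact` abstract on the base turns the forgotten-override bug into a compile error for filters, while `BaseEnum` keeps the convenience default for implementations that genuinely want it.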
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763762#comment-16763762 ] Simon Willnauer commented on LUCENE-8662: - [~tomasflobbe] yes I think this should go into 8.0 - feel free to pull it in, I will do it next week once I am back at the keyboard. > Change TermsEnum.seekExact(BytesRef) to abstract + delegate > seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > --- > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 50m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory(more than > 10g) during recovery or commit for a small index (3.5gb) > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at > 
org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement seekExact(BytesRef) method in > FilterLeafReader.FilterTerms, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple: just override the seekExact(BytesRef) method in > FilterLeafReader.FilterTerms > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
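The missing-override trap described in the report can be reproduced without Lucene. In this self-contained model (all names are stand-ins; only the shape of the bug matches the report), a counter shows that a filter which forgets to override the fast-path method silently falls back to the expensive base-class implementation, while the delegating fix avoids it:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the LUCENE-8662 bug: a wrapper that doesn't override
// seekExact inherits the slow seekCeil-based default instead of using the
// wrapped enum's optimized path. Stand-in classes, not Lucene's.
public class SeekDemo {
    public static final AtomicInteger slowCeilCalls = new AtomicInteger();

    public static class Base {
        public boolean seekExact(String t) {        // default: go through seekCeil
            return t.equals(seekCeil(t));
        }
        public String seekCeil(String t) {
            slowCeilCalls.incrementAndGet();        // an expensive scan in real life
            return t;
        }
    }

    public static class Inner extends Base {
        @Override public boolean seekExact(String t) { return true; } // fast path
    }

    public static class BuggyFilter extends Base { // forgets to override seekExact
        public final Base in;
        public BuggyFilter(Base in) { this.in = in; }
        @Override public String seekCeil(String t) { return in.seekCeil(t); }
    }

    public static class FixedFilter extends BuggyFilter {
        public FixedFilter(Base in) { super(in); }
        @Override public boolean seekExact(String t) { return in.seekExact(t); } // delegate
    }
}
```

A lookup through `BuggyFilter` pays one slow `seekCeil` per call; the same lookup through `FixedFilter` pays none, which is exactly the difference the reporter's heap dump showed at scale.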
[jira] [Resolved] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8664. - Resolution: Fixed Fix Version/s: master (9.0) 8.0 > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > Fix For: 8.0, master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
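For a two-field value class like TotalHits (a hit count plus a relation), the equals/hashCode pair is mechanical. The sketch below is illustrative only, not the implementation that was merged (the class and enum here are local stand-ins for Lucene's TotalHits):

```java
import java.util.Objects;

// Sketch of equals/hashCode for a TotalHits-like value class.
// Stand-in code, not the actual Lucene implementation.
public class Hits {
    public enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    public final long value;
    public final Relation relation;

    public Hits(long value, Relation relation) {
        this.value = value;
        this.relation = relation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Hits)) return false;
        Hits other = (Hits) o;
        return value == other.value && relation == other.relation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(value, relation); // consistent with equals, as required
    }
}
```

With both methods defined, instances compare by content in collections and test assertions, which is the convenience the issue asked for.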
[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756032#comment-16756032 ] Simon Willnauer commented on LUCENE-8664: - pushed - thanks [~lucacavanna] > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > Fix For: 8.0, master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754987#comment-16754987 ] Simon Willnauer commented on LUCENE-8664: - [~lucacavanna] what's the use case for this? Are you trying to put this into a map or something? Can you explain this a bit further? > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754984#comment-16754984 ] Simon Willnauer commented on LUCENE-8662: - {noformat} If we think that it's a trap, we should remove the default impl and make it abstract (in 8.0). {noformat} I agree with this. I think it can be trappy, and such an expert API shouldn't be. Let's make it abstract? > Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 10m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory (more than > 10 GB) during recovery or commit for a small index (3.5 GB). > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at 
> org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement seekExact(BytesRef) method in > FilterLeafReader.FilterTerms, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple, just override seekExact(BytesRef) method in > FilterLeafReader.FilterTerms > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
[ https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8639. - Resolution: Fixed Fix Version/s: master (9.0) 7.7 8.0 > SeqNo accounting in IW is broken if many threads start indexing while we > flush. > --- > > Key: LUCENE-8639 > URL: https://issues.apache.org/jira/browse/LUCENE-8639 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: 8.0, 7.7, master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > While this is rare in the wild we have a test failure that shows that our > seqNo accounting is broken when we carry over seqNo to a new delete queue. > We had this test-failure: > {noformat} > 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-8,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at > 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-9,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 
14, 2019 11:05:45 ? > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate > 16:06:08[junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestIndexTooManyDocs > 16:06:08[junit4] 2>1) Thread[id=20, > name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, > group=TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> at > java.lang.Thread.getStackTrace(Thread.j
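The `seqNo=7 vs maxSeqNo=6` assertion in the test failure above checks a simple invariant that can be modeled in isolation: when a flush starts, the old delete queue advertises a maximum sequence number, and no indexing thread may be handed a number beyond it. The sketch below is a stand-in (SeqNoQueue is an illustrative name, not DocumentsWriterDeleteQueue's real code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the invariant behind the failing assertion: sequence numbers
// are handed out monotonically and must never exceed the advertised maximum.
// Stand-in code, not DocumentsWriterDeleteQueue itself.
public class SeqNoQueue {
    private final AtomicLong seqNo;
    private final long maxSeqNo;

    public SeqNoQueue(long start, long maxSeqNo) {
        this.seqNo = new AtomicLong(start);
        this.maxSeqNo = maxSeqNo;
    }

    public long getNextSequenceNumber() {
        long next = seqNo.incrementAndGet();
        // The invariant that tripped in TestIndexTooManyDocs:
        assert next <= maxSeqNo : "seqNo=" + next + " vs maxSeqNo=" + maxSeqNo;
        return next;
    }
}
```

The bug was in the carry-over: if more threads grab sequence numbers during a flush than the old queue accounted for when computing `maxSeqNo` for the new queue, the new queue hands out a number past its advertised maximum and the assertion fires.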
[jira] [Commented] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
[ https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743155#comment-16743155 ] Simon Willnauer commented on LUCENE-8639: - [~mikemccand] can you take a look at the PR? > SeqNo accounting in IW is broken if many threads start indexing while we > flush. > --- > > Key: LUCENE-8639 > URL: https://issues.apache.org/jira/browse/LUCENE-8639 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > While this is rare in the wild we have a test failure that shows that our > seqNo accounting is broken when we carry over seqNo to a new delete queue. > We had this test-failure: > {noformat} > 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-8,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-9,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 
14, 2019 11:05:45 ? > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate > 16:06:08[junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestIndexTooManyDocs > 16:06:08[junit4] 2>1) Thread[id=20, > name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, > group=TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> at > java.lang.Thread.getStackTrace(Thread.java:1559) > 16:06:08[junit4] 2> at
[jira] [Created] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
Simon Willnauer created LUCENE-8639: --- Summary: SeqNo accounting in IW is broken if many threads start indexing while we flush. Key: LUCENE-8639 URL: https://issues.apache.org/jira/browse/LUCENE-8639 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer While this is rare in the wild we have a test failure that shows that our seqNo accounting is broken when we carry over seqNo to a new delete queue. We had this test-failure: {noformat} 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-8,5,TGRP-TestIndexTooManyDocs] 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6 16:06:08[junit4] 2> at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) 16:06:08[junit4] 2> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) 16:06:08[junit4] 2> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) 16:06:08[junit4] 2> at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) 16:06:08[junit4] 2> at 
java.lang.Thread.run(Thread.java:748) 16:06:08[junit4] 2> 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-9,5,TGRP-TestIndexTooManyDocs] 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6 16:06:08[junit4] 2> at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) 16:06:08[junit4] 2> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) 16:06:08[junit4] 2> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) 16:06:08[junit4] 2> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) 16:06:08[junit4] 2> at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) 16:06:08[junit4] 2> 16:06:08[junit4] 2> ??? 14, 2019 11:05:45 ? 
com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate 16:06:08[junit4] 2> WARNING: Suite execution timed out: org.apache.lucene.index.TestIndexTooManyDocs 16:06:08[junit4] 2>1) Thread[id=20, name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, group=TGRP-TestIndexTooManyDocs] 16:06:08[junit4] 2> at java.lang.Thread.getStackTrace(Thread.java:1559) 16:06:08[junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696) 16:06:08[junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693) 16:06:08[junit4] 2> at java.security.AccessController.doPrivileged(Native Method) 16:06:08[junit4] 2> at com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693) 16:06:08[junit4] 2> at
[jira] [Commented] (LUCENE-8525) throw more specific exception on data corruption
[ https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740186#comment-16740186 ] Simon Willnauer commented on LUCENE-8525: - I do agree with [~rcmuir] here. There is not much to do in terms of detecting this particular problem on DataInput and friends. One way to improve this would certainly be the wording of the javadoc. We can just clarify that detecting _CorruptIndexException_ is best effort. Another idea is to checksum the entire file before we read the commit; we can either do this on the Elasticsearch end or improve _SegmentInfos#readCommit_. Reading this file twice isn't a big deal I guess. > throw more specific exception on data corruption > > > Key: LUCENE-8525 > URL: https://issues.apache.org/jira/browse/LUCENE-8525 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Vladimir Dolzhenko >Priority: Major > > DataInput throws generic IOException if data looks odd > [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141] > there are other examples like > [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219], > > [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226] > and maybe > [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81] > That leads to some difficulties - see [elasticsearch > #34322|https://github.com/elastic/elasticsearch/issues/34322] > It would be better if it threw a more specific exception.
> As a consequence > [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281] > violates its own contract > {code:java} > /** >* @throws CorruptIndexException if the index is corrupt >* @throws IOException if there is a low-level IO error >*/ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
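The "checksum the entire file before we read the commit" idea can be sketched outside of Lucene with plain `java.util.zip.CRC32`: verify a stored checksum over the whole payload first, and only hand the bytes to the parser once they are known to be intact, so corruption surfaces as a dedicated exception rather than an arbitrary IOException from whatever read happens to fail. The footer layout and the `CorruptFileException` name below are assumptions made for this sketch, not Lucene's actual on-disk format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumBeforeRead {
    // Hypothetical stand-in for Lucene's CorruptIndexException.
    public static class CorruptFileException extends IOException {
        public CorruptFileException(String msg) { super(msg); }
    }

    // Append a CRC32 footer to the payload, as a codec would do on write.
    public static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + Long.BYTES);
        buf.put(payload).putLong(crc.getValue());
        return buf.array();
    }

    // Verify the WHOLE file against its footer before any parsing happens,
    // so corruption is reported as a dedicated exception, not a parser error.
    public static byte[] verify(byte[] raw) throws IOException {
        if (raw.length < Long.BYTES) {
            throw new CorruptFileException("file too short: " + raw.length + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        byte[] payload = new byte[raw.length - Long.BYTES];
        buf.get(payload);
        long expected = buf.getLong();
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != expected) {
            throw new CorruptFileException("checksum mismatch: expected " + expected
                + " but got " + crc.getValue());
        }
        return payload; // only now is it safe to parse the commit data
    }

    public static void main(String[] args) throws IOException {
        byte[] file = withFooter("segment metadata".getBytes());
        System.out.println(new String(verify(file))); // round-trips cleanly
        file[0] ^= 0x7F; // corrupt one payload byte
        try {
            verify(file);
        } catch (CorruptFileException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The cost is reading the file twice (once to checksum, once to parse), which, as noted above, is acceptable for a small commit file.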
[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722290#comment-16722290 ] Simon Willnauer commented on LUCENE-8609: - [~sokolov] I opened [https://github.com/mikemccand/luceneutil/pull/28/] /cc [~mikemccand] > Allow getting consistent docstats from IndexWriter > -- > > Key: LUCENE-8609 > URL: https://issues.apache.org/jira/browse/LUCENE-8609 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (8.0), 7.7 >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 50m > Remaining Estimate: 0h > > Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough > to get all stats for the current index but it's subject to concurrency > and might return numbers that are not consistent ie. some cases can > return maxDoc < numDocs which is undesirable. This change adds a > getDocStats() > method to index writer to allow fetching consistent numbers for these > stats. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8609. - Resolution: Fixed Fix Version/s: 7.7 master (8.0) thanks everybody > Allow getting consistent docstats from IndexWriter > -- > > Key: LUCENE-8609 > URL: https://issues.apache.org/jira/browse/LUCENE-8609 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (8.0), 7.7 >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 50m > Remaining Estimate: 0h > > Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough > to get all stats for the current index but it's subject to concurrency > and might return numbers that are not consistent ie. some cases can > return maxDoc < numDocs which is undesirable. This change adds a > getDocStats() > method to index writer to allow fetching consistent numbers for these > stats. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721587#comment-16721587 ] Simon Willnauer commented on LUCENE-8609: - [~mikemccand] [~jpountz] [~dnhatn] I pushed new changes to the PR > Allow getting consistent docstats from IndexWriter > -- > > Key: LUCENE-8609 > URL: https://issues.apache.org/jira/browse/LUCENE-8609 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (8.0), 7.7 >Reporter: Simon Willnauer >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough > to get all stats for the current index but it's subject to concurrency > and might return numbers that are not consistent ie. some cases can > return maxDoc < numDocs which is undesirable. This change adds a > getDocStats() > method to index writer to allow fetching consistent numbers for these > stats. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
Simon Willnauer created LUCENE-8609: --- Summary: Allow getting consistent docstats from IndexWriter Key: LUCENE-8609 URL: https://issues.apache.org/jira/browse/LUCENE-8609 Project: Lucene - Core Issue Type: Improvement Affects Versions: master (8.0), 7.7 Reporter: Simon Willnauer Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough to get all stats for the current index but it's subject to concurrency and might return numbers that are not consistent ie. some cases can return maxDoc < numDocs which is undesirable. This change adds a getDocStats() method to index writer to allow fetching consistent numbers for these stats. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720279#comment-16720279 ] Simon Willnauer commented on LUCENE-8609: - one question here is if we should deprecate the `maxDoc` / `numDocs` methods in favor of this? > Allow getting consistent docstats from IndexWriter > -- > > Key: LUCENE-8609 > URL: https://issues.apache.org/jira/browse/LUCENE-8609 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (8.0), 7.7 >Reporter: Simon Willnauer >Priority: Major > > Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough > to get all stats for the current index but it's subject to concurrency > and might return numbers that are not consistent ie. some cases can > return maxDoc < numDocs which is undesirable. This change adds a > getDocStats() > method to index writer to allow fetching consistent numbers for these > stats. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
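The concurrency problem motivating getDocStats() can be illustrated with plain Java; the `Writer`/`DocStats` classes below are simplified stand-ins for this sketch, not Lucene's implementation. If maxDoc and numDocs are read in two separate calls, a concurrent add or delete between the calls can yield an impossible pair such as maxDoc < numDocs; taking both counters under one lock and returning an immutable snapshot makes the pair internally consistent.

```java
// Simplified stand-ins illustrating why a single snapshot method is needed.
public class DocStatsDemo {
    // Immutable pair taken atomically, in the spirit of IndexWriter's doc stats.
    public static final class DocStats {
        public final int maxDoc;   // all docs, including deleted
        public final int numDocs;  // live docs only
        DocStats(int maxDoc, int numDocs) { this.maxDoc = maxDoc; this.numDocs = numDocs; }
    }

    public static final class Writer {
        private int maxDoc;
        private int numDocs;

        public synchronized void addDocument() { maxDoc++; numDocs++; }
        public synchronized void deleteDocument() { numDocs--; }

        // Reading the two counters via separate calls is subject to races
        // between the calls, even though each call is itself synchronized...
        public synchronized int maxDoc() { return maxDoc; }
        public synchronized int numDocs() { return numDocs; }

        // ...whereas one synchronized snapshot is always internally consistent.
        public synchronized DocStats getDocStats() { return new DocStats(maxDoc, numDocs); }
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        w.addDocument();
        w.addDocument();
        w.deleteDocument();
        DocStats stats = w.getDocStats();
        System.out.println("maxDoc=" + stats.maxDoc + " numDocs=" + stats.numDocs);
    }
}
```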
[jira] [Resolved] (LUCENE-8608) Extract utility class to iterate over terms docs
[ https://issues.apache.org/jira/browse/LUCENE-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8608. - Resolution: Fixed > Extract utility class to iterate over terms docs > > > Key: LUCENE-8608 > URL: https://issues.apache.org/jira/browse/LUCENE-8608 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Today we re-implement the same algorithm in various places > when we want to consume all docs for a set/list of terms. This > caused serious slowdowns for instance in the case of applying > updates fixed in LUCENE-8602. This change extracts the common > usage and shares the iteration code including logic to reuse > Terms and PostingsEnum instances as much as possible and adds > tests for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8608) Extract utility class to iterate over terms docs
Simon Willnauer created LUCENE-8608: --- Summary: Extract utility class to iterate over terms docs Key: LUCENE-8608 URL: https://issues.apache.org/jira/browse/LUCENE-8608 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Fix For: master (8.0), 7.7 Today we re-implement the same algorithm in various places when we want to consume all docs for a set/list of terms. This caused serious slowdowns for instance in the case of applying updates fixed in LUCENE-8602. This change extracts the common usage and shares the iteration code including logic to reuse Terms and PostingsEnum instances as much as possible and adds tests for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8600) DocValuesFieldUpdates should use a better sort
[ https://issues.apache.org/jira/browse/LUCENE-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718992#comment-16718992 ] Simon Willnauer commented on LUCENE-8600: - to be honest I am not very much concerned about this causing OOMs. In the worst case we would use 4 bytes per ord x the number of updates in the package; that means we need about 300k updates to consume ~1MB of RAM here. I think that is an unlikely scenario. Additionally this is transient memory, so I think we are good here. [~dweiss] +1 to the patch from my side > DocValuesFieldUpdates should use a better sort > -- > > Key: LUCENE-8600 > URL: https://issues.apache.org/jira/browse/LUCENE-8600 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-8600.patch > > > This is a follow-up to LUCENE-8598: Simon identified that swaps are a > bottleneck to applying doc-value updates, in particular due to the overhead > of packed ints. It turns out that InPlaceMergeSorter does LOTS of swaps in > order to perform in-place. Replacing with a more efficient sort should help. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
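The back-of-envelope estimate above is easy to check: at 4 bytes per ord, 300k updates come to 1.2 million bytes, roughly 1 MB of transient memory.

```java
public class OrdMemoryEstimate {
    // Transient memory for the ords array: 4 bytes (one int) per update.
    public static long bytesForUpdates(long updates) {
        return updates * Integer.BYTES;
    }

    public static void main(String[] args) {
        // 300_000 updates * 4 bytes = 1_200_000 bytes, i.e. about 1.2 MB.
        System.out.println(bytesForUpdates(300_000));
    }
}
```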
[jira] [Resolved] (LUCENE-8602) Share TermsEnum if possible while applying DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8602. - Resolution: Fixed > Share TermsEnum if possible while applying DV updates > -- > > Key: LUCENE-8602 > URL: https://issues.apache.org/jira/browse/LUCENE-8602 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Today we pull a new terms enum when we apply DV updates even though the > field stays the same which is the common case. Benchmarking this on a > larger term dictionary with a significant number of updates shows a > 2x improvement in performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8602) Share TermsEnum if possible while applying DV updates
Simon Willnauer created LUCENE-8602: --- Summary: Share TermsEnum if possible while applying DV updates Key: LUCENE-8602 URL: https://issues.apache.org/jira/browse/LUCENE-8602 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Fix For: master (8.0), 7.7 Today we pull a new terms enum when we apply DV updates even though the field stays the same which is the common case. Benchmarking this on a larger term dictionary with a significant number of updates shows a 2x improvement in performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8599) Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates
[ https://issues.apache.org/jira/browse/LUCENE-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8599. - Resolution: Fixed > Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates > --- > > Key: LUCENE-8599 > URL: https://issues.apache.org/jira/browse/LUCENE-8599 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 40m > Remaining Estimate: 0h > > Using a sparse bitset in SingleValueDocValuesFieldUpdates allows storing > which documents have an update much more efficiently and prevents the need > to sort the docs array altogether, which showed to be a significant > bottleneck > in LUCENE-8598. Using the sparse bitset yields another 10x performance > improvement > in applying updates versus the changes proposed in LUCENE-8598. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8598) Improve field updates packed values
[ https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8598. - Resolution: Fixed > Improve field updates packed values > --- > > Key: LUCENE-8598 > URL: https://issues.apache.org/jira/browse/LUCENE-8598 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > DocValuesFieldUpdats are using compact settings for packet ints that causes > dramatic slowdowns when the updates are finished and sorted. Moving to > the default > accepted overhead ratio yields up to 4x improvements in applying updates. > This change > also improves the packing of numeric values since we know the value range > in advance and > can choose a different packing scheme in such a case. > Overall this change yields a good performance improvement since 99% of > the times of applying > DV field updates are spend in the sort method which essentially makes > applying the updates > 4x faster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8599) Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates
Simon Willnauer created LUCENE-8599: --- Summary: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates Key: LUCENE-8599 URL: https://issues.apache.org/jira/browse/LUCENE-8599 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Fix For: master (8.0), 7.7 Using a sparse bitset in SingleValueDocValuesFieldUpdates allows storing which documents have an update much more efficiently and prevents the need to sort the docs array altogether, which showed to be a significant bottleneck in LUCENE-8598. Using the sparse bitset yields another 10x performance improvement in applying updates versus the changes proposed in LUCENE-8598. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
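The core trick can be shown with `java.util.BitSet` standing in for Lucene's sparse bitset (the real implementation differs): recording updated docIDs as set bits makes iteration in ascending docID order free via `nextSetBit`, so the docs array never needs to be sorted, regardless of the order in which the updates arrived.

```java
import java.util.BitSet;

public class SparseDocsDemo {
    // Record docIDs in arrival order; the bitset keeps them ordered implicitly.
    public static int[] updatedDocsInOrder(int[] docIDsInArrivalOrder, int maxDoc) {
        BitSet updated = new BitSet(maxDoc);
        for (int doc : docIDsInArrivalOrder) {
            updated.set(doc); // duplicates collapse for free as well
        }
        int[] result = new int[updated.cardinality()];
        int i = 0;
        // nextSetBit walks the set bits in ascending order -- no sort needed.
        for (int doc = updated.nextSetBit(0); doc >= 0; doc = updated.nextSetBit(doc + 1)) {
            result[i++] = doc;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arrival = {42, 7, 7, 1000, 3}; // out of order, with a duplicate
        System.out.println(java.util.Arrays.toString(updatedDocsInOrder(arrival, 1024)));
    }
}
```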
[jira] [Commented] (LUCENE-8598) Improve field updates packed values
[ https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714051#comment-16714051 ] Simon Willnauer commented on LUCENE-8598: - I ran a benchmark to update 1 values on a single segment 100 times: ||setup||patch time in ms||master time in ms|| |shared single value|10131|38430| |random values|30985|69600| > Improve field updates packed values > --- > > Key: LUCENE-8598 > URL: https://issues.apache.org/jira/browse/LUCENE-8598 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 20m > Remaining Estimate: 0h > > DocValuesFieldUpdats are using compact settings for packet ints that causes > dramatic slowdowns when the updates are finished and sorted. Moving to > the default > accepted overhead ratio yields up to 4x improvements in applying updates. > This change > also improves the packing of numeric values since we know the value range > in advance and > can choose a different packing scheme in such a case. > Overall this change yields a good performance improvement since 99% of > the times of applying > DV field updates are spend in the sort method which essentially makes > applying the updates > 4x faster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8598) Improve field updates packed values
[ https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714051#comment-16714051 ] Simon Willnauer edited comment on LUCENE-8598 at 12/9/18 6:28 PM: -- I ran a benchmark to update 1 values on a single segment 100 times: ||setup||patch time in ms||master time in ms|| |shared single value|10131|38430| |random values|30985|69600| the reason I looked into it is that I wrote the benchmark to test another change that I made and saw the sorting showing up in a profiler spending 99% in the finish method. I also tested other acceptable overhead ratios, i.e. FAST and FASTEST, but they didn't show any speedups. was (Author: simonw): I ran a benchmark to update 1 values on a single segment 100 times: ||setup||patch time in ms||master time in ms|| |shared single value|10131|38430| |random values|30985|69600| > Improve field updates packed values > --- > > Key: LUCENE-8598 > URL: https://issues.apache.org/jira/browse/LUCENE-8598 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 20m > Remaining Estimate: 0h > > DocValuesFieldUpdates uses compact settings for packed ints, which causes > dramatic slowdowns when the updates are finished and sorted. Moving to > the default > accepted overhead ratio yields up to 4x improvements in applying updates. > This change > also improves the packing of numeric values since we know the value range > in advance and > can choose a different packing scheme in such a case. > Overall this change yields a good performance improvement since 99% of > the time of applying > DV field updates is spent in the sort method, which essentially makes > applying the updates > 4x faster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8598) Improve field updates packed values
Simon Willnauer created LUCENE-8598: --- Summary: Improve field updates packed values Key: LUCENE-8598 URL: https://issues.apache.org/jira/browse/LUCENE-8598 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Fix For: master (8.0), 7.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8598) Improve field updates packed values
[ https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8598: Description: DocValuesFieldUpdats are using compact settings for packet ints that causes dramatic slowdowns when the updates are finished and sorted. Moving to the default accepted overhead ratio yields up to 4x improvements in applying updates. This change also improves the packing of numeric values since we know the value range in advance and can choose a different packing scheme in such a case. Overall this change yields a good performance improvement since 99% of the times of applying DV field updates are spend in the sort method which essentially makes applying the updates 4x faster. > Improve field updates packed values > --- > > Key: LUCENE-8598 > URL: https://issues.apache.org/jira/browse/LUCENE-8598 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > > DocValuesFieldUpdats are using compact settings for packet ints that causes > dramatic slowdowns when the updates are finished and sorted. Moving to > the default > accepted overhead ratio yields up to 4x improvements in applying updates. > This change > also improves the packing of numeric values since we know the value range > in advance and > can choose a different packing scheme in such a case. > Overall this change yields a good performance improvement since 99% of > the times of applying > DV field updates are spend in the sort method which essentially makes > applying the updates > 4x faster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8592) MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural sort is reversed
[ https://issues.apache.org/jira/browse/LUCENE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8592: Affects Version/s: master (8.0) 7.5 Priority: Blocker (was: Major) Fix Version/s: master (8.0) 7.6 > MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural > sort is reversed > -- > > Key: LUCENE-8592 > URL: https://issues.apache.org/jira/browse/LUCENE-8592 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.5, master (8.0) >Reporter: Jim Ferenczi >Priority: Blocker > Fix For: 7.6, master (8.0) > > Attachments: LUCENE-8592.patch > > > MultiSorter#getComparableProviders on an integer or long field doesn't handle > MIN_VALUE correctly when the natural order is reversed. To handle reverse > sort we use the negation of the value but there is no check for overflows so > MIN_VALUE for ints and longs are always sorted first (even if the natural > order is reversed). > This method is used by index sorting when merging already sorted segments > together. This means that a sorted index can be incorrectly sorted if it uses > a reverse sort and a missing value set to MIN_VALUE (long or int or values > inside the segment that are equals to MIN_VALUE). > This a bad bug because it affects the documents order inside segments and > only a reindex can restore the correct sort order. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8592) MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural sort is reversed
[ https://issues.apache.org/jira/browse/LUCENE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712870#comment-16712870 ] Simon Willnauer commented on LUCENE-8592: - the patch looks good to me. Yet, I am not 100% on top of this code if there are other places that need to be fixed. Still +1 to commit. > MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural > sort is reversed > -- > > Key: LUCENE-8592 > URL: https://issues.apache.org/jira/browse/LUCENE-8592 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi >Priority: Major > Attachments: LUCENE-8592.patch > > > MultiSorter#getComparableProviders on an integer or long field doesn't handle > MIN_VALUE correctly when the natural order is reversed. To handle reverse > sort we use the negation of the value but there is no check for overflows so > MIN_VALUE for ints and longs are always sorted first (even if the natural > order is reversed). > This method is used by index sorting when merging already sorted segments > together. This means that a sorted index can be incorrectly sorted if it uses > a reverse sort and a missing value set to MIN_VALUE (long or int or values > inside the segment that are equals to MIN_VALUE). > This a bad bug because it affects the documents order inside segments and > only a reindex can restore the correct sort order. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
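The overflow behind this bug is easy to demonstrate in plain Java: negating Long.MIN_VALUE (or Integer.MIN_VALUE) overflows back to itself in two's complement, so "negate the value to reverse the order" silently misplaces MIN_VALUE. Comparing with swapped arguments instead of negating avoids the problem entirely; the comparators below illustrate the failure mode and the safe alternative, they are not the attached patch.

```java
public class ReverseSortOverflow {
    // Broken: reversing by negation overflows for Long.MIN_VALUE,
    // because -Long.MIN_VALUE is still Long.MIN_VALUE.
    public static int reversedByNegation(long a, long b) {
        return Long.compare(-a, -b);
    }

    // Safe: reverse the comparison by swapping the arguments instead.
    public static int reversedBySwap(long a, long b) {
        return Long.compare(b, a);
    }

    public static void main(String[] args) {
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE); // two's complement overflow

        // In descending order, MIN_VALUE should sort AFTER any other value,
        // i.e. the comparison against 5 should be positive.
        System.out.println(reversedByNegation(Long.MIN_VALUE, 5L)); // negative: sorts MIN_VALUE first (wrong)
        System.out.println(reversedBySwap(Long.MIN_VALUE, 5L));     // positive: sorts it last (correct)
    }
}
```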
[jira] [Resolved] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
[ https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8595. - Resolution: Fixed Fix Version/s: 7.7 master (8.0) 7.6 > TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails > -- > > Key: LUCENE-8595 > URL: https://issues.apache.org/jira/browse/LUCENE-8595 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (8.0) >Reporter: Michael McCandless >Priority: Major > Fix For: 7.6, master (8.0), 7.7 > > Time Spent: 20m > Remaining Estimate: 0h > > It does reproduce ... I haven't dug in: > > {noformat} > [junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestMixedDocValuesUpdates > -Dtests.method=testTryUpdateMultiThreaded -Dtests.seed=E079543483688908 > -Dtests.badapples=true -Dtests.loc\ > ale=mt-MT -Dtests.timezone=VST -Dtests.asserts=true > -Dtests.file.encoding=US-ASCII > [junit4] FAILURE 0.69s | > TestMixedDocValuesUpdates.testTryUpdateMultiThreaded <<< > [junit4] > Throwable #1: java.lang.AssertionError: docID: 63 > [junit4] > at > __randomizedtesting.SeedInfo.seed([E079543483688908:4809171572AE9A81]:0) > [junit4] > at > org.apache.lucene.index.TestMixedDocValuesUpdates.testTryUpdateMultiThreaded(TestMixedDocValuesUpdates.java:526) > [junit4] > at java.lang.Thread.run(Thread.java:745) > [junit4] 2> NOTE: test params are: codec=Asserting(Lucene80): > {id=PostingsFormat(name=LuceneVarGapFixedInterval)}, > docValues:{value=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=13\ > 12, maxMBSortInHeap=7.5990910168370895, > sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e08c0f3), > locale=mt-MT, timezone=VST > [junit4] 2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation > 1.8.0_121 (64-bit)/cpus=8,threads=1,free=446496544,total=514850816 > [junit4] 2> NOTE: All tests run in this JVM: [TestMixedDocValuesUpdates] > [junit4] Completed [1/1 (1!)] in 0.83s, 1 test, 1 failure <<< > FAILURES!{noformat} 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
[ https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712005#comment-16712005 ] Simon Willnauer commented on LUCENE-8595: - [~jpountz] I think the patch is not enough. I attached a PR that includes tests and an additional fix. Can you take a look?
[jira] [Commented] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
[ https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711968#comment-16711968 ] Simon Willnauer commented on LUCENE-8595: - ++ to the patch. This makes sense. It would be great if we had a test for this.
[jira] [Resolved] (LUCENE-8594) DV updates are broken for updates on a new field
[ https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8594. - Resolution: Fixed Fix Version/s: master (8.0) > DV updates are broken for updates on a new field > - > > Key: LUCENE-8594 > URL: https://issues.apache.org/jira/browse/LUCENE-8594 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: master (8.0) >Reporter: Simon Willnauer >Priority: Blocker > Fix For: master (8.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > > A segment written with Lucene70Codec fails if it tries to update > a DV field that didn't exist in the index before it was upgraded to > Lucene80Codec. We bake the DV format into the FieldInfo when it's used > for the first time and therefore never go to the codec when we need to update. > Yet on a field that didn't exist before and was added during an indexing > operation, we have to consult the codec and get an exception. > This change fixes the issue and adds the relevant bwc tests.
[jira] [Updated] (LUCENE-8594) DV updates are broken for updates on a new field
[ https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8594: Affects Version/s: (was: 7.7)
[jira] [Created] (LUCENE-8594) DV updates are broken for updates on a new field
Simon Willnauer created LUCENE-8594: --- Summary: DV updates are broken for updates on a new field Key: LUCENE-8594 URL: https://issues.apache.org/jira/browse/LUCENE-8594 Project: Lucene - Core Issue Type: Improvement Affects Versions: master (8.0), 7.7 Reporter: Simon Willnauer A segment written with Lucene70Codec fails if it tries to update a DV field that didn't exist in the index before it was upgraded to Lucene80Codec. We bake the DV format into the FieldInfo when it's used for the first time and therefore never go to the codec when we need to update. Yet on a field that didn't exist before and was added during an indexing operation, we have to consult the codec and get an exception. This change fixes the issue and adds the relevant bwc tests.
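The failure mode described above can be sketched as follows. This is a hypothetical illustration only — the class, field, and method names below are invented for the sketch and are not Lucene's actual API. The idea: the per-field doc-values format is resolved and cached the first time a field is used, so updates to existing fields never consult the codec, while a field introduced for the first time by a DV update must fall through to the codec — the code path where the reported exception surfaced after an index upgrade.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch (invented names, not Lucene's real API): the per-field
// DV format is cached on first use; only a field with no cached format forces
// a lookup in the codec.
class PerFieldDvFormatCache {
    private final Map<String, String> cachedFormatPerField = new HashMap<>();
    private final Function<String, String> codecLookup; // consulted only on first use

    PerFieldDvFormatCache(Function<String, String> codecLookup) {
        this.codecLookup = codecLookup;
    }

    String formatFor(String field) {
        // Existing fields hit the cache; a field added for the first time
        // (e.g. by a DV update after an upgrade) falls through to the codec.
        return cachedFormatPerField.computeIfAbsent(field, codecLookup);
    }
}
```

Under this sketch, the fix amounts to making the codec path safe to take for fields that first appear via an update, rather than assuming every updated field already has a baked-in format.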
[jira] [Updated] (LUCENE-8594) DV updates are broken for updates on a new field
[ https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8594: Issue Type: Bug (was: Improvement)
[jira] [Resolved] (LUCENE-8593) Specialize single value numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8593. - Resolution: Fixed > Specialize single value numeric DV updates > -- > > Key: LUCENE-8593 > URL: https://issues.apache.org/jira/browse/LUCENE-8593 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (8.0), 7.7 > > Time Spent: 40m > Remaining Estimate: 0h > > The case where all values are the same on a numeric field update > is common for soft_deletes. With the new infrastructure for buffering > DV updates, we can gain an easy win by specializing the applied updates > if all values are the same.
[jira] [Created] (LUCENE-8593) Specialize single value numeric DV updates
Simon Willnauer created LUCENE-8593: --- Summary: Specialize single value numeric DV updates Key: LUCENE-8593 URL: https://issues.apache.org/jira/browse/LUCENE-8593 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Fix For: master (8.0), 7.7 The case where all values are the same on a numeric field update is common for soft_deletes. With the new infrastructure for buffering DV updates, we can gain an easy win by specializing the applied updates if all values are the same.
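The specialization described above can be sketched roughly as follows. This is a hypothetical illustration — the class and method names are invented and this is not Lucene's actual implementation. The idea: while every buffered numeric update carries the same value (the common soft_deletes case), store a single shared long instead of a per-doc value list, and materialize per-doc values only when a diverging value arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (invented names, not Lucene's real implementation):
// a buffer of numeric DV updates that keeps one shared value while all
// updates agree, and switches to per-doc storage only on divergence.
class SingleValueAwareBuffer {
    private final List<Integer> docs = new ArrayList<>();
    private final List<Long> perDocValues = new ArrayList<>(); // empty while allSame
    private long sharedValue;
    private boolean allSame = true;

    void add(int doc, long value) {
        if (docs.isEmpty()) {
            sharedValue = value;
        } else if (allSame && value != sharedValue) {
            // First diverging value: back-fill the values buffered so far.
            for (int i = 0; i < docs.size(); i++) {
                perDocValues.add(sharedValue);
            }
            allSame = false;
        }
        docs.add(doc);
        if (!allSame) {
            perDocValues.add(value);
        }
    }

    /** Value of the i-th buffered update. */
    long valueFor(int i) {
        return allSame ? sharedValue : perDocValues.get(i);
    }

    boolean isSingleValued() {
        return allSame;
    }
}
```

The win in the single-valued case is that applying the updates needs no per-doc value storage at all — only the doc ids and one long — which is exactly the shape of a soft_deletes update stream.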