[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-08-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906502#comment-16906502
 ] 

Simon Willnauer commented on LUCENE-8369:
-

+1 for option 1 above as well. Thanks [~nknize]

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.






[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-08-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902133#comment-16902133
 ] 

Simon Willnauer commented on LUCENE-8369:
-

I don't think we should sacrifice having LatLong point searching in core for 
the sake of code visibility. I think we should keep it in core, open up 
visibility to enable code reuse in the modules, and use _@lucene.internal_ to 
mark classes as internal and prevent users from complaining when the API 
changes. It's not ideal, but it's progress. Can we separate the discussion of 
getting rid of the spatial module from graduating the various shapes from 
sandbox to wherever? I think keeping a module for 2 classes doesn't make 
sense. We can move those two classes to core, or even get rid of them 
altogether; I don't think that should influence the discussion of whether 
something else should be graduated.

One other option would be to move all non-core spatial classes from sandbox to 
spatial, as long as they don't add any additional dependencies. That would be 
an intermediate step; we can still graduate them from there.
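
For illustration, marking a class with the _@lucene.internal_ javadoc tag 
mentioned above might look like this (the class name is hypothetical):

{code}
/**
 * Geo utility exposed across modules for code reuse.
 *
 * @lucene.internal This class is public only so modules can share it;
 *                  its API may change between minor releases without notice.
 */
public final class GeoRelationHelper {
  private GeoRelationHelper() {} // static utility, not meant to be instantiated
}
{code}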

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.






[jira] [Resolved] (LUCENE-8887) CLONE - Add setting for moving FST offheap/onheap

2019-06-27 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8887.
-
Resolution: Duplicate

This seems to have been opened accidentally.

> CLONE - Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8887
> URL: https://issues.apache.org/jira/browse/LUCENE-8887
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: LuYunCheng
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: master (9.0), 8.1
>
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635 adds support for loading the FST off-heap using mmap, users 
> do not have the flexibility to specify the fields for which the FST should be 
> off-heap. That flexibility would allow users to tune heap usage to their 
> workload.
> The ideal way would be to add an attribute to FieldInfo, where we have 
> put/getAttribute. FieldReader could then inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It could support special 
> keywords like ALL/NONE.
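
A rough sketch of the attribute-based approach described above; FieldInfo's 
put/getAttribute exist, but the "fst.offheap" key and the policy keywords are 
illustrative assumptions, not an agreed API:

{code}
import org.apache.lucene.index.FieldInfo;

// Hypothetical helper a FieldReader could consult when choosing between
// an on-heap and an off-heap store for the field's FST.
static boolean loadFstOffHeap(FieldInfo fi, String defaultPolicy) {
  String policy = fi.getAttribute("fst.offheap"); // per-field override, may be null
  if (policy == null) {
    policy = defaultPolicy; // e.g. the special keywords "ALL" or "NONE"
  }
  return "ALL".equals(policy) || "true".equals(policy);
}
{code}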






[jira] [Commented] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor

2019-06-25 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872690#comment-16872690
 ] 

Simon Willnauer commented on LUCENE-8865:
-

[~hypothesisx86] I didn't run any benchmarks. Maybe [~mikemccand] can provide 
info on whether there are improvements.

>  Use incoming thread for execution if IndexSearcher has an executor
> ---
>
> Key: LUCENE-8865
> URL: https://issues.apache.org/jira/browse/LUCENE-8865
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Today we don't utilize the incoming thread for a search when IndexSearcher 
> has an executor. This thread is only idling but can be used to execute a 
> search once all other collectors are dispatched.
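
A minimal sketch of the idea, not the actual IndexSearcher code; searchSlice 
is a hypothetical stand-in for the per-slice search:

{code}
// Submit all slices but the last to the executor, then search the last
// slice on the incoming (caller) thread instead of letting it idle.
TopDocs[] searchAllSlices(LeafSlice[] slices, ExecutorService executor)
    throws Exception {
  List<Future<TopDocs>> futures = new ArrayList<>();
  for (int i = 0; i < slices.length - 1; i++) {
    LeafSlice slice = slices[i];
    futures.add(executor.submit(() -> searchSlice(slice)));
  }
  TopDocs[] results = new TopDocs[slices.length];
  results[slices.length - 1] = searchSlice(slices[slices.length - 1]);
  for (int i = 0; i < futures.size(); i++) {
    results[i] = futures.get(i).get(); // gather the executor results
  }
  return results; // callers merge these, e.g. via TopDocs#merge
}
{code}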






[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-06-20 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868555#comment-16868555
 ] 

Simon Willnauer commented on LUCENE-8857:
-

A couple of comments:

 * Can you open a PR and associate it with this issue? Patches are hard to 
review without context and the ability to comment.
 * For the second case in IndexSearcher, should we also tie-break by doc?
 * Can we replace the verbose comparators with _Comparator.comparingInt(d -> 
d.shardIndex);_ and _Comparator.comparingInt(d -> d.doc);_ respectively?
 * Any chance we can select the tie-breaker based on whether one of the TopDocs 
has a shardIndex != -1, and assert that either all of them have it or none do? 
Another option would be to have only one comparator that first tie-breaks on 
shardIndex and then on doc (see the sketch below); since we don't set the 
shard index, they should all be -1, so that would be fine. WDYT?
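
A sketch of the single-comparator option from the last bullet: tie-break on 
shardIndex first, then on doc. When shardIndex is unset (all -1), the first 
key is a no-op and doc decides:

{code}
Comparator<ScoreDoc> tieBreaker =
    Comparator.<ScoreDoc>comparingInt(d -> d.shardIndex)
              .thenComparingInt(d -> d.doc);
{code}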

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, 
> LUCENE-8857.patch, LUCENE-8857.patch
>
>
> In LUCENE-8829, the idea of having lambdas passed into the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 






[jira] [Resolved] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor

2019-06-18 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8865.
-
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

>  Use incoming thread for execution if IndexSearcher has an executor
> ---
>
> Key: LUCENE-8865
> URL: https://issues.apache.org/jira/browse/LUCENE-8865
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Today we don't utilize the incoming thread for a search when IndexSearcher 
> has an executor. This thread is only idling but can be used to execute a 
> search once all other collectors are dispatched.






[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-06-18 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866575#comment-16866575
 ] 

Simon Willnauer commented on LUCENE-8857:
-

Why don't we just use a comparator and have a default one and a doc one? Like 
this:

{code}
Comparator<ScoreDoc> defaultComparator = Comparator.comparingInt(d -> d.shardIndex);
Comparator<ScoreDoc> docComparator = Comparator.comparingInt(d -> d.doc);
{code}

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch
>
>
> In LUCENE-8829, the idea of having lambdas passed into the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 






[jira] [Resolved] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used

2019-06-17 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8853.
-
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

> FileSwitchDirectory is broken if temp outputs are used
> --
>
> Key: LUCENE-8853
> URL: https://issues.apache.org/jira/browse/LUCENE-8853
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> FileSwitchDirectory basically doesn't work if temp outputs are used for files 
> that are explicitly mapped by extension. Here is a failing test:
> {code}
> 16:49:40[junit4] Suite: 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest
> 16:49:40[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=BlendedInfixSuggesterTest 
> -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
>  -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1
> 16:49:40[junit4] ERROR   0.05s J1 | 
> BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
>  <<<
> 16:49:40[junit4]> Throwable #1: 
> java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: 
> source and dest are in different directories
> 16:49:40[junit4]> at 
> __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16:49:40[junit4]> at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 16:49:40[junit4]> at 
> java.base/java.lang.Thread.run(Thread.java:834)
> {code}






[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers

2019-06-17 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866253#comment-16866253
 ] 

Simon Willnauer commented on LUCENE-8857:
-

From my perspective we should simplify this even more and remove 
_TieBreakingParameters_. TopDocs can use _Comparator<ScoreDoc>_ and default 
to the shard index if it's not supplied. That should be sufficient?
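
A sketch of that simplification; the helper name is hypothetical:

{code}
// Default to tie-breaking on the shard index when no comparator is supplied.
static Comparator<ScoreDoc> tieBreakerOrDefault(Comparator<ScoreDoc> supplied) {
  return supplied != null ? supplied : Comparator.comparingInt(d -> d.shardIndex);
}
{code}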

> Refactor TopDocs#Merge To Take In Custom Tie Breakers
> -
>
> Key: LUCENE-8857
> URL: https://issues.apache.org/jira/browse/LUCENE-8857
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8857.patch, LUCENE-8857.patch
>
>
> In LUCENE-8829, the idea of having lambdas passed into the API to allow 
> finer control over the process was discussed.
> This JIRA tracks adding a parameter to the API which allows passing in 
> lambdas to define custom tie breakers, thus allowing users to do custom 
> algorithms when required.
> CC: [~jpountz]  [~simonw] 






[jira] [Created] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor

2019-06-17 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8865:
---

 Summary:  Use incoming thread for execution if IndexSearcher has 
an executor
 Key: LUCENE-8865
 URL: https://issues.apache.org/jira/browse/LUCENE-8865
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer


Today we don't utilize the incoming thread for a search when IndexSearcher 
has an executor. This thread is only idling but can be used to execute a 
search once all other collectors are dispatched.






[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved

2019-06-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863067#comment-16863067
 ] 

Simon Willnauer commented on LUCENE-8829:
-

{quote}
Simon Willnauer That is a fun idea, although it would still need a function to 
instruct TopDocs#merge whether to set the shard indices or not.
{quote}

I am not sure we have to. Can't a user initialize it ahead of time if 
necessary? If it's necessary to have this, we can just iterate over the hits 
and set it from the outside. That should also be possible, no?
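
A sketch of initializing shardIndex "from the outside", as suggested above 
(scoreDocs and shardIndex are public fields, so a caller can do this before 
merging):

{code}
static void assignShardIndices(TopDocs[] shardHits) {
  for (int shard = 0; shard < shardHits.length; shard++) {
    for (ScoreDoc sd : shardHits[shard].scoreDocs) {
      sd.shardIndex = shard; // caller assigns ordinals ahead of the merge
    }
  }
}
{code}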

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -
>
> Key: LUCENE-8829
> URL: https://issues.apache.org/jira/browse/LUCENE-8829
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, 
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I understood that TopDocs#merge's order of 
> results is indirectly dependent on the number of collectors involved in the 
> merge. This is troubling because 1) the number of collectors involved in a 
> merge is cost-based and directly dependent on the number of slices created 
> in the parallel searcher case, and 2) the TopN hits code path will invoke 
> merge with a single Collector, so doing the same TopN query with a 
> single-threaded and a parallel searcher will produce different orders of 
> results, which breaks a desirable invariant.
>  
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on 
> the ScoreDocs while populating the priority queue used for merging. 
> ShardIndex is essentially set to the ordinal of the collector which 
> generated the hit. This means that the shardIndex is dependent on the number 
> of collectors, even for the same set of hits.
>  
> When no sort order is specified, shardIndex is used for tie-breaking when 
> scores are equal. This translates to different orders for the same hits with 
> different shardIndices.
>  
> I propose that we remove shardIndex from the default tie-breaking mechanism 
> and replace it with docID. DocID order is the de facto order expected during 
> collection, so it makes sense to use the same factor for tie-breaking when 
> scores are the same.
>  
> CC: [~ivera]






[jira] [Comment Edited] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved

2019-06-12 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848
 ] 

Simon Willnauer edited comment on LUCENE-8829 at 6/12/19 8:56 AM:
--

I'd remove the _setShardIndex_ parameter altogether and not set it


was (Author: simonw):
I'd remove the _ setShardIndex_ parameter altogether and not set it

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -
>
> Key: LUCENE-8829
> URL: https://issues.apache.org/jira/browse/LUCENE-8829
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, 
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I understood that TopDocs#merge's order of 
> results is indirectly dependent on the number of collectors involved in the 
> merge. This is troubling because 1) the number of collectors involved in a 
> merge is cost-based and directly dependent on the number of slices created 
> in the parallel searcher case, and 2) the TopN hits code path will invoke 
> merge with a single Collector, so doing the same TopN query with a 
> single-threaded and a parallel searcher will produce different orders of 
> results, which breaks a desirable invariant.
>  
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on 
> the ScoreDocs while populating the priority queue used for merging. 
> ShardIndex is essentially set to the ordinal of the collector which 
> generated the hit. This means that the shardIndex is dependent on the number 
> of collectors, even for the same set of hits.
>  
> When no sort order is specified, shardIndex is used for tie-breaking when 
> scores are equal. This translates to different orders for the same hits with 
> different shardIndices.
>  
> I propose that we remove shardIndex from the default tie-breaking mechanism 
> and replace it with docID. DocID order is the de facto order expected during 
> collection, so it makes sense to use the same factor for tie-breaking when 
> scores are the same.
>  
> CC: [~ivera]






[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved

2019-06-12 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848
 ] 

Simon Willnauer commented on LUCENE-8829:
-

I'd remove the _ setShardIndex_ parameter altogether and not set it

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -
>
> Key: LUCENE-8829
> URL: https://issues.apache.org/jira/browse/LUCENE-8829
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, 
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I understood that TopDocs#merge's order of 
> results is indirectly dependent on the number of collectors involved in the 
> merge. This is troubling because 1) the number of collectors involved in a 
> merge is cost-based and directly dependent on the number of slices created 
> in the parallel searcher case, and 2) the TopN hits code path will invoke 
> merge with a single Collector, so doing the same TopN query with a 
> single-threaded and a parallel searcher will produce different orders of 
> results, which breaks a desirable invariant.
>  
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on 
> the ScoreDocs while populating the priority queue used for merging. 
> ShardIndex is essentially set to the ordinal of the collector which 
> generated the hit. This means that the shardIndex is dependent on the number 
> of collectors, even for the same set of hits.
>  
> When no sort order is specified, shardIndex is used for tie-breaking when 
> scores are equal. This translates to different orders for the same hits with 
> different shardIndices.
>  
> I propose that we remove shardIndex from the default tie-breaking mechanism 
> and replace it with docID. DocID order is the de facto order expected during 
> collection, so it makes sense to use the same factor for tie-breaking when 
> scores are the same.
>  
> CC: [~ivera]






[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved

2019-06-12 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861821#comment-16861821
 ] 

Simon Willnauer commented on LUCENE-8829:
-

I do wonder if we can simplify this API now that we have FunctionalInterfaces. 
If we change _TopDocs#merge_ to take a _ToIntFunction<ScoreDoc>_ we should be 
able to have a default of _ScoreDoc::doc_, and users that want to use the 
shard index can use _ScoreDoc::shardIndex_; that should also simplify our 
code, I guess. Yet I haven't checked if it works across the board; it's just 
an idea.
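
What the FunctionalInterface version might look like as a sketch (using 
java.util.function.ToIntFunction); note that doc and shardIndex are fields on 
ScoreDoc, so lambdas stand in for the method references mentioned above:

{code}
ToIntFunction<ScoreDoc> byDoc = d -> d.doc;           // proposed default
ToIntFunction<ScoreDoc> byShard = d -> d.shardIndex;  // opt-in for multi-shard merges
{code}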

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -
>
> Key: LUCENE-8829
> URL: https://issues.apache.org/jira/browse/LUCENE-8829
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, 
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I understood that TopDocs#merge's order of 
> results is indirectly dependent on the number of collectors involved in the 
> merge. This is troubling because 1) the number of collectors involved in a 
> merge is cost-based and directly dependent on the number of slices created 
> in the parallel searcher case, and 2) the TopN hits code path will invoke 
> merge with a single Collector, so doing the same TopN query with a 
> single-threaded and a parallel searcher will produce different orders of 
> results, which breaks a desirable invariant.
>  
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on 
> the ScoreDocs while populating the priority queue used for merging. 
> ShardIndex is essentially set to the ordinal of the collector which 
> generated the hit. This means that the shardIndex is dependent on the number 
> of collectors, even for the same set of hits.
>  
> When no sort order is specified, shardIndex is used for tie-breaking when 
> scores are equal. This translates to different orders for the same hits with 
> different shardIndices.
>  
> I propose that we remove shardIndex from the default tie-breaking mechanism 
> and replace it with docID. DocID order is the de facto order expected during 
> collection, so it makes sense to use the same factor for tie-breaking when 
> scores are the same.
>  
> CC: [~ivera]






[jira] [Commented] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used

2019-06-11 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861785#comment-16861785
 ] 

Simon Willnauer commented on LUCENE-8853:
-

I attached a PR but I am not really happy with it; yet it's my best bet. I am 
wondering if we should start a discussion about removing FileSwitchDirectory. 
It's hard to get right and there are many situations where it can break. I do 
wonder what its use case is other than opening a file with NIO vs. MMAP, as 
Elasticsearch does. If that's the main purpose, we can build a better version 
of it. /cc [~rcmuir]
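
The use case in question, sketched with the existing API; the extensions are 
illustrative, and both sub-directories point at the same path, which is the 
common setup:

{code}
Set<String> mmapExtensions = new HashSet<>(Arrays.asList("dvd", "tim"));
Directory dir = new FileSwitchDirectory(
    mmapExtensions,
    new MMapDirectory(path),   // files with the listed extensions
    new NIOFSDirectory(path),  // everything else
    true);                     // doClose: close both directories on close()
{code}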

> FileSwitchDirectory is broken if temp outputs are used
> --
>
> Key: LUCENE-8853
> URL: https://issues.apache.org/jira/browse/LUCENE-8853
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> FileSwitchDirectory basically doesn't work if temp outputs are used for files 
> that are explicitly mapped by extension. Here is a failing test:
> {code}
> 16:49:40[junit4] Suite: 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest
> 16:49:40[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=BlendedInfixSuggesterTest 
> -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
>  -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1
> 16:49:40[junit4] ERROR   0.05s J1 | 
> BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
>  <<<
> 16:49:40[junit4]> Throwable #1: 
> java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: 
> source and dest are in different directories
> 16:49:40[junit4]> at 
> __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64)
> 16:49:40[junit4]> at 
> org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444)
> 16:49:40[junit4]> at 
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125)
> 16:49:40[junit4]> at 
> org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 16:49:40[junit4]> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16:49:40[junit4]> at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 16:49:40[junit4]> at 
> java.base/java.lang.Thread.run(Thread.java:834)
> {code}




[jira] [Created] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used

2019-06-11 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8853:
---

 Summary: FileSwitchDirectory is broken if temp outputs are used
 Key: LUCENE-8853
 URL: https://issues.apache.org/jira/browse/LUCENE-8853
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Simon Willnauer


FileSwitchDirectory basically doesn't work if temp outputs are used for files 
that are explicitly mapped by extension. Here is a failing test:

{code}
16:49:40[junit4] Suite: 
org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest
16:49:40[junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=BlendedInfixSuggesterTest 
-Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
 -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
16:49:40[junit4] ERROR   0.05s J1 | 
BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch
 <<<
16:49:40[junit4]> Throwable #1: 
java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: source 
and dest are in different directories
16:49:40[junit4]>   at 
__randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0)
16:49:40[junit4]>   at 
org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201)
16:49:40[junit4]>   at 
org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231)
16:49:40[junit4]>   at 
org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56)
16:49:40[junit4]>   at 
org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64)
16:49:40[junit4]>   at 
org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89)
16:49:40[junit4]>   at 
org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56)
16:49:40[junit4]>   at 
org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152)
16:49:40[junit4]>   at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468)
16:49:40[junit4]>   at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555)
16:49:40[junit4]>   at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722)
16:49:40[junit4]>   at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199)
16:49:40[junit4]>   at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444)
16:49:40[junit4]>   at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409)
16:49:40[junit4]>   at 
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345)
16:49:40[junit4]>   at 
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315)
16:49:40[junit4]>   at 
org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125)
16:49:40[junit4]>   at 
org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79)
16:49:40[junit4]>   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:49:40[junit4]>   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:49:40[junit4]>   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:49:40[junit4]>   at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:49:40[junit4]>   at 
java.base/java.lang.Thread.run(Thread.java:834)
{code}








[jira] [Resolved] (LUCENE-8835) Respect file extension when listing files from FileSwitchDirectory

2019-06-11 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8835.
-
   Resolution: Fixed
 Assignee: Simon Willnauer
Fix Version/s: 8.2
   master (9.0)

> Respect file extension when listing files from FileSwitchDirectory
> --
>
> Key: LUCENE-8835
> URL: https://issues.apache.org/jira/browse/LUCENE-8835
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> FileSwitchDirectory splits file actions between 2 directories based on file 
> extensions. The extensions are respected on write operations like delete or 
> create but ignored when we list the contents of the directories. Until now we 
> only deduplicated the contents in Directory#listAll, which can cause 
> inconsistencies and hard-to-debug errors due to double deletions in 
> IndexWriter if a file is pending delete in one of the directories but still 
> shows up in the directory listing from the other directory. This case can 
> happen if both directories point to the same underlying FS directory, which 
> is a common use case to split between mmap and niofs. 






[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs

2019-06-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858441#comment-16858441
 ] 

Simon Willnauer commented on LUCENE-8833:
-

I do like the idea of #warm, but the footprint is much bigger since it's a 
public API. For my specific use case I'd subclass mmap anyway, and it would 
make it easier that way. FileSwitchDirectory is quite heavy and isn't really 
built for what I want to do. I basically would need an IndexInput factory 
that I can plug into a directory, that can alternate between NIOFS and mmap 
etc. and conditionally preload the mmap. Either way, I can work with both; I 
just think this change is the minimum viable change. Let me know if you are 
OK moving forward.
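
A sketch of the kind of subclass this would enable. The exact shape of the 
protected overload comes from the attached patch, which isn't shown here, so 
the three-argument openInput below is an assumed signature:

{code}
public class SelectivePreloadDirectory extends MMapDirectory {
  public SelectivePreloadDirectory(Path path) throws IOException {
    super(path);
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    boolean preload = name.endsWith(".tim"); // e.g. preload term dictionaries only
    return openInput(name, context, preload); // hypothetical protected overload
  }
}
{code}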

> Allow subclasses of MMapDirectory to preload individual IndexInputs
> --
>
> Key: LUCENE-8833
> URL: https://issues.apache.org/jira/browse/LUCENE-8833
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it's useful for subclasses to select the preload flag on a 
> per-IndexInput basis rather than all or nothing. Here is a patch that adds 
> an overloaded protected openInput method. 






[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs

2019-06-06 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857525#comment-16857525
 ] 

Simon Willnauer commented on LUCENE-8833:
-

> What would the IOContext provide to base the preload decision on? Just 
> curious.

Sure, the one I had in mind as an example is merge. I am not sure if it makes 
a big difference; I was just wondering if there are other signals than the 
file extension. 
I opened LUCENE-8835 to fix the file listing issue FileSwitchDirectory has.

> Allow subclasses of MMapDirectory to preload individual IndexInputs
> --
>
> Key: LUCENE-8833
> URL: https://issues.apache.org/jira/browse/LUCENE-8833
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it's useful for subclasses to select the preload flag on a 
> per-IndexInput basis rather than all or nothing. Here is a patch that adds 
> an overloaded protected openInput method. 






[jira] [Created] (LUCENE-8835) Respect file extension when listing files from FileSwitchDirectory

2019-06-06 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8835:
---

 Summary: Respect file extension when listing files from 
FileSwitchDirectory
 Key: LUCENE-8835
 URL: https://issues.apache.org/jira/browse/LUCENE-8835
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Simon Willnauer


FileSwitchDirectory splits file actions between 2 directories based on file 
extensions. The extensions are respected on write operations like delete or 
create but ignored when we list the contents of the directories. Until now we 
only deduplicated the contents in Directory#listAll, which can cause 
inconsistencies and hard-to-debug errors due to double deletions in IndexWriter 
if a file is pending delete in one of the directories but still shows up in the 
directory listing from the other directory. This case can happen if both 
directories point to the same underlying FS directory, which is a common use 
case to split between mmap and niofs. 
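
A sketch of the direction described above: have each sub-directory report only 
the files it owns (by extension) instead of only deduplicating the union. 
FileSwitchDirectory.getExtension is the existing helper; the method itself is 
illustrative:

{code}
static String[] listAllRespectingExtensions(Directory primary,
                                            Set<String> primaryExts,
                                            Directory secondary) throws IOException {
  Set<String> files = new TreeSet<>();
  for (String f : primary.listAll()) {
    if (primaryExts.contains(FileSwitchDirectory.getExtension(f))) {
      files.add(f); // primary only reports extensions it owns
    }
  }
  for (String f : secondary.listAll()) {
    if (!primaryExts.contains(FileSwitchDirectory.getExtension(f))) {
      files.add(f); // secondary reports the rest
    }
  }
  return files.toArray(new String[0]);
}
{code}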






[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs

2019-06-05 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856781#comment-16856781
 ] 

Simon Willnauer commented on LUCENE-8833:
-

You are correct, that's what Elasticsearch does. Yet FileSwitchDirectory had 
many issues in the past and still has (I am working on one issue related to 
[this|https://github.com/elastic/elasticsearch/pull/37140] and will open 
another issue soon). Especially with the push of pending deletes down to 
FSDirectory, things became more tricky for FileSwitchDirectory. That said, I 
think these issues should be fixed and I will work on them; this was more of 
a trigger to look closer. I also want to be able to decide whether to preload 
based on the IOContext down the road, which FileSwitchDirectory would not be 
capable of in this context. I hope this makes sense?

> Allow subclasses of MMapDirectory to preload individual IndexInputs
> --
>
> Key: LUCENE-8833
> URL: https://issues.apache.org/jira/browse/LUCENE-8833
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it's useful for subclasses to select the preload flag on a 
> per-IndexInput basis rather than all or nothing. Here is a patch that adds 
> an overloaded protected openInput method. 






[jira] [Created] (LUCENE-8833) Allow subclasses of MMapDirectory to preload individual IndexInputs

2019-06-05 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8833:
---

 Summary: Allow subclasses of MMapDirectory to preload individual 
IndexInputs
 Key: LUCENE-8833
 URL: https://issues.apache.org/jira/browse/LUCENE-8833
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer


I think it's useful for subclasses to select the preload flag on a 
per-IndexInput basis rather than all or nothing. Here is a patch that adds an 
overloaded protected openInput method. 






[jira] [Commented] (LUCENE-8809) Refresh and rollback concurrently can leave segment states unclosed

2019-06-04 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856364#comment-16856364
 ] 

Simon Willnauer commented on LUCENE-8809:
-

[~dnhatn] can we close this issue?

> Refresh and rollback concurrently can leave segment states unclosed
> ---
>
> Key: LUCENE-8809
> URL: https://issues.apache.org/jira/browse/LUCENE-8809
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.7, 8.1, 8.2
>Reporter: Nhat Nguyen
>Assignee: Nhat Nguyen
>Priority: Major
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A [failed test|https://github.com/elastic/elasticsearch/issues/30290] from 
> Elasticsearch shows that refresh and rollback running concurrently can leave 
> segment states unclosed, which leaks the refCount of some SegmentReaders.






[jira] [Resolved] (LUCENE-8813) testIndexTooManyDocs fails

2019-05-31 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8813.
-
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

> testIndexTooManyDocs fails
> --
>
> Key: LUCENE-8813
> URL: https://issues.apache.org/jira/browse/LUCENE-8813
> Project: Lucene - Core
>  Issue Type: Test
>  Components: core/index
>Reporter: Nhat Nguyen
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> testIndexTooManyDocs fails on [Elastic 
> CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console].
>  This failure does not reproduce locally for me.
> {noformat}
> [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
>[junit4]   2> KTN 23, 2019 4:09:37 PM 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-612,5,TGRP-TestIndexTooManyDocs]
>[junit4]   2> java.lang.AssertionError: only modifications from the 
> current flushing queue are permitted while doing a full flush
>[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([1F16B1DA7056AA52]:0)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.assertTicketQueueModification(DocumentsWriter.java:683)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:187)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:411)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:514)
>[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
>[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
>[junit4]   2> at java.base/java.lang.Thread.run(Thread.java:834)
>[junit4]   2> 
>[junit4]   2> KTN 23, 2019 6:09:36 PM 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
>[junit4]   2> WARNING: Suite execution timed out: 
> org.apache.lucene.index.TestIndexTooManyDocs
>[junit4]   2>1) Thread[id=669, 
> name=SUITE-TestIndexTooManyDocs-seed#[1F16B1DA7056AA52], state=RUNNABLE, 
> group=TGRP-TestIndexTooManyDocs]
>[junit4]   2> at 
> java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693)
>[junit4]   2> at 
> java.base/java.security.AccessController.doPrivileged(Native Method)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.getThreadsWithTraces(ThreadLeakControl.java:709)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.formatThreadStacksFull(ThreadLeakControl.java:689)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.access$1000(ThreadLeakControl.java:65)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:415)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:708)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:629)
>[junit4]   2>2) Thread[id=671, name=Thread-606, state=BLOCKED, 
> group=TGRP-TestIndexTooManyDocs]
>[junit4]   2> at 
> app//org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4945)
>[junit4]   2> at 
> app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
>[junit4]   2> at 
> app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
>[junit4]   2> at 
> app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262)
>[junit4]   2> at 
> app//org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165)
>[junit4]   2> at 
> app//org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTo

[jira] [Commented] (LUCENE-8813) testIndexTooManyDocs fails

2019-05-28 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849506#comment-16849506
 ] 

Simon Willnauer commented on LUCENE-8813:
-

I looked at this and I think the issue here is that we are executing 2 flushes 
very quickly one after another, while at the same time a single thread has 
already released its DWPT before the first flush but has not tried to apply 
deletes before the second flush is done. In this case the assertion doesn't 
hold anymore. The window is super small, and that is likely why we never 
tripped this before. I don't think we have a correctness issue here, but I 
will still try to improve the way we assert/apply deletes. 

> testIndexTooManyDocs fails
> --
>
> Key: LUCENE-8813
> URL: https://issues.apache.org/jira/browse/LUCENE-8813
> Project: Lucene - Core
>  Issue Type: Test
>  Components: core/index
>Reporter: Nhat Nguyen
>Priority: Major
>
> testIndexTooManyDocs fails on [Elastic 
> CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console].
>  This failure does not reproduce locally for me.
> {noformat}
> [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
>[junit4]   2> KTN 23, 2019 4:09:37 PM 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-612,5,TGRP-TestIndexTooManyDocs]
>[junit4]   2> java.lang.AssertionError: only modifications from the 
> current flushing queue are permitted while doing a full flush
>[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([1F16B1DA7056AA52]:0)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.assertTicketQueueModification(DocumentsWriter.java:683)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:187)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:411)
>[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:514)
>[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
>[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
>[junit4]   2> at java.base/java.lang.Thread.run(Thread.java:834)
>[junit4]   2> 
>[junit4]   2> KTN 23, 2019 6:09:36 PM 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
>[junit4]   2> WARNING: Suite execution timed out: 
> org.apache.lucene.index.TestIndexTooManyDocs
>[junit4]   2>1) Thread[id=669, 
> name=SUITE-TestIndexTooManyDocs-seed#[1F16B1DA7056AA52], state=RUNNABLE, 
> group=TGRP-TestIndexTooManyDocs]
>[junit4]   2> at 
> java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693)
>[junit4]   2> at 
> java.base/java.security.AccessController.doPrivileged(Native Method)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.getThreadsWithTraces(ThreadLeakControl.java:709)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.formatThreadStacksFull(ThreadLeakControl.java:689)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.access$1000(ThreadLeakControl.java:65)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:415)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:708)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138)
>[junit4]   2> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:629)
>[junit4]   2>2) Thread[id=671, name=Thread-606, state=BLOCKED, 
> group=TGRP-TestIndexTooManyDocs]
>[junit4]   2> at 
> app//org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4945)
>[junit4]   2> at 
> app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
>[junit4]   2> at 
> app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfCha

[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843726#comment-16843726
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] Can we, instead of asserting the order, just sort the slice in the 
LeafSlice ctor? This should prevent any issues down the road, and it's cheap 
enough IMO.
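
What that might look like as a sketch, assuming the ctor receives the 
LeafReaderContext array (here called leaves); docBase gives a stable, 
index-order sort key:

{code}
Arrays.sort(leaves, Comparator.comparingInt(l -> l.docBase));
{code}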

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Assigned] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-8757:
---

Assignee: Simon Willnauer

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837615#comment-16837615
 ] 

Simon Willnauer commented on LUCENE-8757:
-

LGTM. I will try to commit this in the coming days.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837003#comment-16837003
 ] 

Simon Willnauer commented on LUCENE-8757:
-

{quote}
I think there is an important justification for the 2nd criteria (number of 
segments in each work unit / slice), which is if you have an index with some 
large segments, and then with a long tail of small segments (easily happens if 
your machine has substantially CPU concurrency and you use multiple threads), 
since there is a fixed cost for visiting each segment, if you put too many 
small segments into one work unit, those fixed costs multiply and that one work 
unit can become too slow even though it's not actually going to visit too many 
documents.

I think we should keep it?
{quote}

Fair enough, let's add it back.


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835894#comment-16835894
 ] 

Simon Willnauer commented on LUCENE-8785:
-

{quote} Please feel free to commit this to the release branch. In case of a 
re-spin, I'll pick this change up. {quote}

[~ichattopadhyaya] done. Thanks.

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835481#comment-16835481
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Thanks for the additional iteration. Now that we have simplified this, can we 
remove the sorting? I don't necessarily see how the sort makes things simpler. If 
we see a segment > threshold we can just add it as its own group? I thought you 
did that already, hence my comment about the assertion. WDYT?

I also want to suggest beefing up the testing a bit with a randomized version, 
like this:
{code}
diff --git 
a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java 
b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
index 7c63a817adb..76ccca64ee7 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
@@ -1933,6 +1933,14 @@ public abstract class LuceneTestCase extends Assert {
 ret = random.nextBoolean()
 ? new AssertingIndexSearcher(random, r, ex)
 : new AssertingIndexSearcher(random, r.getContext(), ex);
+  } else if (random.nextBoolean()) {
+int maxDocPerSlice = 1 + random.nextInt(10);
+ret = new IndexSearcher(r, ex) {
+  @Override
+  protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+return slices(leaves, maxDocPerSlice);
+  }
+};
   } else {
 ret = random.nextBoolean()
 ? new IndexSearcher(r, ex)
{code}



> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch

2019-05-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835473#comment-16835473
 ] 

Simon Willnauer commented on LUCENE-7840:
-

LGTM

> BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 
> 1 MUST/FILTER clause and 0==minShouldMatch
> ---
>
> Key: LUCENE-7840
> URL: https://issues.apache.org/jira/browse/LUCENE-7840
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-7840.patch, LUCENE-7840.patch, LUCENE-7840.patch
>
>
> I haven't thought this through completely, let alone write up a patch / test 
> case, but IIUC...
> We should be able to optimize  {{ BooleanQuery rewriteNoScoring() }} so that 
> (after converting MUST clauses to FILTER clauses) we can check for the common 
> case of {{0==getMinimumNumberShouldMatch()}} and throw away any SHOULD 
> clauses as long as there is at least one FILTER clause.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-08 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8785.
-
Resolution: Fixed

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-08 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8785:

Fix Version/s: (was: 8.0.1)
   (was: 8.1)
   (was: 7.7.1)
   8.2
   7.7.2
   8.1.1

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834778#comment-16834778
 ] 

Simon Willnauer commented on LUCENE-7840:
-

I think there are some style issues in this patch, like here where _else_ should 
be on the previous line:

{code:java}
+  }
+}
+else {
+  newQuery.add(clause);
+}
{code}

The other question is whether we should use a switch instead of if / else (see 
the sketch below). Otherwise it's looking fine.
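
Schematically, the switch variant could look like this (a sketch, not the 
attached patch; it assumes we already verified that minShouldMatch == 0 and that 
at least one MUST/FILTER clause exists):

{code:java}
// Hypothetical shape of rewriteNoScoring()'s clause loop with a switch:
for (BooleanClause clause : clauses()) {
  switch (clause.getOccur()) {
    case MUST:
      // scores are ignored, so MUST can safely become FILTER
      newQuery.add(clause.getQuery(), BooleanClause.Occur.FILTER);
      break;
    case SHOULD:
      // droppable: a required clause exists and minShouldMatch == 0
      break;
    default:
      newQuery.add(clause);
      break;
  }
}
{code}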




> BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 
> 1 MUST/FILTER clause and 0==minShouldMatch
> ---
>
> Key: LUCENE-7840
> URL: https://issues.apache.org/jira/browse/LUCENE-7840
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-7840.patch, LUCENE-7840.patch
>
>
> I haven't thought this through completely, let alone write up a patch / test 
> case, but IIUC...
> We should be able to optimize  {{ BooleanQuery rewriteNoScoring() }} so that 
> (after converting MUST clauses to FILTER clauses) we can check for the common 
> case of {{0==getMinimumNumberShouldMatch()}} and throw away any SHOULD 
> clauses as long as there is at least one FILTER clause.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834767#comment-16834767
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] I think the assertion in this part doesn't hold:

{code}
+for (LeafReaderContext ctx : sortedLeaves) {
+  if (ctx.reader().maxDoc() > maxDocsPerSlice) {
+assert group == null;
+List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
{code}

if the previous segment was smallish then _group_ is non-null? I think you 
should test these cases, maybe add a random test and randomize the order of the 
segments?

This:
{code}
+List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
+
+singleSegmentSlice.add(ctx);
+groupedLeaves.add(singleSegmentSlice);
{code}
can and should be replaced by:

{code}
groupedLeaves.add(Collections.singletonList(ctx));
{code}


otherwise it looks good.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834525#comment-16834525
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] actually I thought about these defaults again and I am starting to 
think it's an ok default. The reason for this is that we try to prevent having 
dedicated threads for smallish segments, so we group them together. I still do 
wonder if we need to have 2 parameters? Wouldn't it be enough to just say that 
we group things together until we have at least 250k docs per thread to be 
searched (see the sketch below)? Is it really necessary to have another 
parameter that limits the number of segments per slice? I think a single 
parameter would be great and simpler. WDYT?
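
As an illustration, a minimal sketch of the single-parameter variant (the method 
name and shape are assumptions, not an attached patch):

{code:java}
// Group leaves into slices using only a docs-per-slice threshold.
static List<List<LeafReaderContext>> group(List<LeafReaderContext> leaves,
                                           int minDocsPerSlice) {
  List<List<LeafReaderContext>> groups = new ArrayList<>();
  List<LeafReaderContext> current = new ArrayList<>();
  int docsInCurrent = 0;
  for (LeafReaderContext ctx : leaves) {
    current.add(ctx);
    docsInCurrent += ctx.reader().maxDoc();
    if (docsInCurrent >= minDocsPerSlice) { // slice is big enough, start a new one
      groups.add(current);
      current = new ArrayList<>();
      docsInCurrent = 0;
    }
  }
  if (current.isEmpty() == false) {
    groups.add(current); // leftover small segments share the last slice
  }
  return groups;
}
{code}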

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-07 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8785:

Fix Version/s: 7.7.1
   master (9.0)
   8.1
   8.0.1

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 7.7.1, 8.0.1, 8.1, master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-07 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-8785:
---

Assignee: Simon Willnauer

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834467#comment-16834467
 ] 

Simon Willnauer commented on LUCENE-8785:
-

{quote}
If there is another thread coming in after we locked the existent threadstates 
we just issue a new one.

Yuck 
{quote}

I looked at the code again and we actually lock the threadstates for this 
purpose; I implemented this in LUCENE-8639. The issue here is in fact a race 
condition since we request the number of active threadstates before we lock new 
ones. It's a classic one-line fix. I referenced a PR for this. [~mikemccand] 
would you take a look?
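
Schematically, the race is a classic check-then-act; the names below are 
hypothetical illustrations, not the actual IndexWriter internals:

{code:java}
// Illustrative only -- not the real code.
int count = pool.getActiveThreadStateCount(); // 1) read how many states exist
// <-- another indexing thread obtains a brand-new ThreadState right here
lockThreadStates(count);                      // 2) lock only the ones we counted
// the late ThreadState escapes the lock and keeps indexing into cleared state
{code}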

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1683#comment-1683
 ] 

Simon Willnauer commented on LUCENE-8757:
-

> Would it make sense to push this patch, and then let users consume it and 
> provide feedback while we iterate on the more sophisticated version? We could 
> even have both of the methods available as options to users, potentially

I don't think we should push this if we already know we want to do something 
different. That said, I am not convinced the numbers are good defaults. At the 
same time I don't have any numbers here; do you have anything to back these 
defaults up?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-03 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832343#comment-16832343
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Thanks [~atris], can you bring back the javadocs for 
{code:java}
protected LeafSlice[] slices(List<LeafReaderContext> leaves){code}

Please don't reassign an argument, like here:


{code:java}
leaves = new ArrayList<>(leaves);
{code}

The rest of the patch looks OK to me, yet I am not so sure about the defaults. I 
do wonder if we should look at this from a different perspective: rather than 
using hard numbers, can we try to evenly balance the total number of documents 
across N threads and make N the variable (see the sketch below)? [~mikemccand] WDYT?
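
For illustration, a greedy sketch of that idea (an assumption, not an attached 
patch): assign each leaf, largest first, to the currently smallest slice.

{code:java}
// Balance total doc counts across n slices via largest-first greedy assignment.
static List<List<LeafReaderContext>> balance(List<LeafReaderContext> leaves, int n) {
  List<LeafReaderContext> sorted = new ArrayList<>(leaves);
  sorted.sort(Collections.reverseOrder(
      Comparator.comparingInt((LeafReaderContext l) -> l.reader().maxDoc())));
  List<List<LeafReaderContext>> slices = new ArrayList<>();
  int[] docCounts = new int[n];
  for (int i = 0; i < n; i++) {
    slices.add(new ArrayList<>());
  }
  for (LeafReaderContext ctx : sorted) {
    int smallest = 0; // pick the slice with the fewest docs so far
    for (int i = 1; i < n; i++) {
      if (docCounts[i] < docCounts[smallest]) {
        smallest = i;
      }
    }
    slices.get(smallest).add(ctx);
    docCounts[smallest] += ctx.reader().maxDoc();
  }
  return slices;
}
{code}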


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-03 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832336#comment-16832336
 ] 

Simon Willnauer commented on LUCENE-8785:
-

{quote} I realize neither ES nor Solr expose deleteAll but I don't think that's 
a valid argument to remove it from Lucene.  
{quote}
Huh, I don't think that's a valid argument either; I just re-read my comments - 
sorry if you felt I was alluding to ES or Solr here. My argument is that if you 
want to do that, you should construct a new IndexWriter instead of calling 
deleteAll(). Given this comment in the javadocs:
{noformat}
 Essentially a call to {@link #deleteAll()} is equivalent to creating a new 
{@link IndexWriter} with {@link OpenMode#CREATE} 
{noformat}
I want to understand why, in such a rather edgy case, a user can't do exactly 
this. There is no race and no confusion; it's very simple from a semantics 
perspective. Currently there are 2 ways and one is confusing. I think we should 
move towards removing the second way.
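
Spelled out, the recommended path is simply this (a minimal sketch; the 
directory and analyzer setup are assumed):

{code:java}
// A fresh writer with OpenMode.CREATE drops docs and schema alike, which is
// exactly what deleteAll() promises in its javadocs.
IndexWriterConfig cfg = new IndexWriterConfig(analyzer)
    .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter writer = new IndexWriter(dir, cfg);
{code}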

 
{quote}And for some reason the index is reset once per week, but the devs want 
to allow searching of the old index while the new index is (slowly) built up. 
But if something goes badly wrong, they need to be able to rollback (the 
deleteAll and all subsequently added docs) to the last commit and try again 
later. If instead it succeeds, then a refresh/commit will switch to the new 
index atomically. 
{quote}
 Well, there are tons of ways to do that, no? I mean, you can have 2 directories? 
Yes, it causes some engineering effort, but the semantics would be cleaner even 
for the app that does what you explain.

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Priority: Minor
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-02 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831635#comment-16831635
 ] 

Simon Willnauer commented on LUCENE-8785:
-

> But at the point we call clear() haven't we already blocked all indexing 
> threads?

No, it might look like we do that but we don't. We block and lock all threads 
up to that point in time. If there is another thread coming in after we 
locked the existing threadstates, we just issue a new one.

> I also dislike deleteAll() and you're right a user could deleteByQuery using 
> MatchAllDocsQuery; can we make that close-ish as efficient as deleteAll() is 
> today?

I think we can just do what deleteAll() does today, except not dropping the 
schema on the floor?

> Though indeed that would preserve the schema, while deleteAll() let's you 
> delete docs, delete schema, all under transaction (the change is not visible 
> until commit). 

I want to understand the use case for this. I can see how somebody would want to 
drop all docs, but basically dropping all IW state on the floor is difficult in 
my eyes.



> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Priority: Minor
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure

2019-05-02 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831612#comment-16831612
 ] 

Simon Willnauer commented on LUCENE-8785:
-

[~mikemccand] I think this is caused by the fact that we simply call _clear()_ 
during _IW#deleteAll()_. If this happens concurrently with a document being 
indexed, this assertion can trip. I personally always disliked the complexity of 
_IW#deleteAll_ and from my perspective we should remove this method entirely 
and ask users to open a new IW if they want to drop all the information 
including the _schema_. We can still fast-path a _MatchAllDocsQuery_ through 
something like this as we do today (which is a problem IMO since it drops all 
field map info, which it shouldn't?). IMO if you want a fresh index, start from 
scratch; but to delete all docs, run deleteByQuery and keep the schema.
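
A minimal sketch of that schema-preserving alternative (the writer setup here is 
assumed):

{code:java}
// Delete every document but keep the field infos, instead of deleteAll():
try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
  writer.deleteDocuments(new MatchAllDocsQuery());
  writer.commit(); // the deletes only become visible on commit/refresh
}
{code}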

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
> -
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
>Reporter: Michael McCandless
>Priority: Minor
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 
> cores), and hit this random yet spooky failure:
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock 
> -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.as\
> serts=true -Dtests.file.encoding=US-ASCII
>    [junit4] ERROR   0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock 
> <<<
>    [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, 
> group=TGRP-TestIndexWriterDelete]
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
>    [junit4]    > Caused by: java.lang.RuntimeException: 
> java.lang.IllegalArgumentException: field number 0 is already mapped to field 
> name "null", not "content"
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
>    [junit4]    > Caused by: java.lang.IllegalArgumentException: field number 
> 0 is already mapped to field name "null", not "content"
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
>    [junit4]    >        at 
> org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
>    [junit4]    >        at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
>    [junit4]    >        at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
>    [junit4]    >        at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
>    [junit4]    >        at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
>    [junit4]    >        at 
> org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle 
> thread safety issue in this code ... this is a hairy part of Lucene ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose

2019-05-02 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831604#comment-16831604
 ] 

Simon Willnauer commented on LUCENE-8776:
-

[~venkat11] I do understand your frustration. Believe me, we don't take changes 
like this lightly. One person's bug is another person's feature, and as we grow 
and mature, strong guarantees are essential for a vast majority of users, for 
future developments, for faster iterations and more performant code. There might 
not be a tradeoff from your perspective; from the maintainers' perspective there 
is. Now we can debate if a major version bump is _enough_ time to migrate or 
not; our policy is that we can make BWC and behavioral changes like this in a 
major release. In fact we don't do it in minors, to give you the time you need 
and to ease upgrades to minors. We will and have built features on top of this 
guarantee, and in order to manage expectations I am pretty sure we won't go back 
and allow negative offsets. I think your best option, whether you like it or 
not, is to work towards a fix for your issue with either the tools you have now 
or by improving Lucene, for instance with the suggestion from [~mgibney] 
regarding indexing more information. 

Please don't get mad at me, I am just trying to manage expectations. 

> Start offset going backwards has a legitimate purpose
> -
>
> Key: LUCENE-8776
> URL: https://issues.apache.org/jira/browse/LUCENE-8776
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.6
>Reporter: Ram Venkat
>Priority: Major
>
> Here is the use case where startOffset can go backwards:
> Say there is a line "Organic light-emitting-diode glows", and I want to run 
> span queries and highlight them properly. 
> During index time, light-emitting-diode is split into three words, which 
> allows me to search for 'light', 'emitting' and 'diode' individually. The 
> three words occupy adjacent positions in the index, as 'light' adjacent to 
> 'emitting' and 'light' at a distance of two words from 'diode' need to match 
> this word. So, the order of words after splitting are: Organic, light, 
> emitting, diode, glows. 
> But, I also want to search for 'organic' being adjacent to 
> 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. 
> The way I solved this was to also generate 'light-emitting-diode' at two 
> positions: (a) In the same position as 'light' and (b) in the same position 
> as 'glows', like below:
> ||organic||light||emitting||diode||glows||
> | |light-emitting-diode| |light-emitting-diode| |
> |0|1|2|3|4|
> The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets 
> are obviously the same. This works beautifully in Lucene 5.x in both 
> searching and highlighting with span queries. 
> But when I try this in Lucene 7.6, it hits the condition "Offsets must not go 
> backwards" at DefaultIndexingChain:818. This IllegalArgumentException is 
> being thrown without any comments on why this check is needed. As I explained 
> above, startOffset going backwards is perfectly valid, to deal with word 
> splitting and span operations on these specialized use cases. On the other 
> hand, it is not clear what value is added by this check and which highlighter 
> code is affected by offsets going backwards. This same check is done at 
> BaseTokenStreamTestCase:245. 
> I see others talk about how this check found bugs in WordDelimiter etc. but 
> it also prevents legitimate use cases. Can this check be removed?  
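
To make the described layout concrete, here is a minimal TokenStream sketch 
(the class name and hard-coded token table are illustrative, derived from the 
table in the description above, not from any patch). Indexing it into a field 
with offsets enabled trips the offsets check in DefaultIndexingChain on 7.x, 
because the second "light-emitting-diode" token rewinds startOffset from 23 
back to 8:

{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

final class CompoundTokenStream extends TokenStream {
  private final CharTermAttribute term = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posInc =
      addAttribute(PositionIncrementAttribute.class);
  private final OffsetAttribute offset = addAttribute(OffsetAttribute.class);
  // term, posInc, startOffset, endOffset for "Organic light-emitting-diode glows"
  private final Object[][] tokens = {
      {"organic", 1, 0, 7},
      {"light", 1, 8, 13},
      {"light-emitting-diode", 0, 8, 28},  // same position as "light"
      {"emitting", 1, 14, 22},
      {"diode", 1, 23, 28},
      {"light-emitting-diode", 0, 8, 28},  // same position as "diode": startOffset goes 23 -> 8
      {"glows", 1, 29, 34},
  };
  private int i = 0;

  @Override
  public boolean incrementToken() throws IOException {
    if (i == tokens.length) {
      return false;
    }
    clearAttributes();
    term.setEmpty().append((String) tokens[i][0]);
    posInc.setPositionIncrement((Integer) tokens[i][1]);
    offset.setOffset((Integer) tokens[i][2], (Integer) tokens[i][3]);
    i++;
    return true;
  }
}
{code}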



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-02 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831591#comment-16831591
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Hey Atri,

thanks for putting up this patch, here is some additional feedback:
 - can we stick with a protected non-static method on IndexSearcher, so that 
subclasses are able to override your impl? I think it's ok to have a static 
method like this:
{code:java}
 public static LeafSlice[] slices(List<LeafReaderContext> leaves, int 
maxDocsPerSlice, int maxSegPerSlice){code}
that you can call from the protected method with your defaults.
 - you might want to change your sort to something like this: 
{code:java}
Collections.sort(leaves, Collections.reverseOrder(Comparator.comparingInt(l -> 
l.reader().maxDoc())));{code}

 - I think the _Leaves_ class is unnecessary, we can just use 
_List<LeafReaderContext>_ instead? (see the sketch below)
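
To make that concrete, a rough sketch of how such a static helper could group 
segments (greedy grouping over a descending sort; the IndexSearcher.LeafSlice 
varargs constructor is an assumption on my part, and this is an illustration 
rather than the final patch):

{code:java}
public static LeafSlice[] slices(List<LeafReaderContext> leaves,
                                 int maxDocsPerSlice, int maxSegPerSlice) {
  // biggest segments first: each large segment gets its own slice before
  // small segments are packed together
  List<LeafReaderContext> sorted = new ArrayList<>(leaves);
  sorted.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
  List<LeafSlice> slices = new ArrayList<>();
  List<LeafReaderContext> group = new ArrayList<>();
  long docsInGroup = 0;
  for (LeafReaderContext ctx : sorted) {
    group.add(ctx);
    docsInGroup += ctx.reader().maxDoc();
    // close the current slice once either budget is exhausted
    if (group.size() >= maxSegPerSlice || docsInGroup >= maxDocsPerSlice) {
      slices.add(new LeafSlice(group.toArray(new LeafReaderContext[0])));
      group = new ArrayList<>();
      docsInGroup = 0;
    }
  }
  if (group.isEmpty() == false) {
    slices.add(new LeafSlice(group.toArray(new LeafReaderContext[0])));
  }
  return slices.toArray(new LeafSlice[0]);
}
{code}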

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-04-15 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8671.
-
   Resolution: Fixed
 Assignee: Simon Willnauer
Fix Version/s: master (9.0)
   8.1

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Time Spent: 5h
>  Remaining Estimate: 19h
>
> While LUCENE-8635 adds support for loading FSTs off-heap using mmap, users do 
> not have the flexibility to specify fields for which the FST needs to be 
> off-heap. Adding this would allow users to tune heap usage as per their 
> workload.
> The ideal way would be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException

2019-04-10 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8754.
-
   Resolution: Fixed
Fix Version/s: master (9.0)
   8.1

> SegmentInfo#toString can cause ConcurrentModificationException
> --
>
> Key: LUCENE-8754
> URL: https://issues.apache.org/jira/browse/LUCENE-8754
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 8.1, master (9.0)
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> A recent change increased the likelihood of this issue showing up, but it 
> could already happen before, since we have been using the attributes map in 
> the StoredFieldsFormat for quite some time. I found this issue due to a test 
> failure on our CI:
> {noformat}
> 13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting
> 13:11:56[junit4]   2> apr 05, 2019 8:11:53 AM 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
> 13:11:56[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-507,5,TGRP-TestIndexSorting]
> 13:11:56[junit4]   2> java.util.ConcurrentModificationException
> 13:11:56[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0)
> 13:11:56[junit4]   2> at 
> java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
> 13:11:56[junit4]   2> at 
> java.util.HashMap$EntryIterator.next(HashMap.java:1476)
> 13:11:56[junit4]   2> at 
> java.util.HashMap$EntryIterator.next(HashMap.java:1474)
> 13:11:56[junit4]   2> at 
> java.util.AbstractMap.toString(AbstractMap.java:554)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364)
> 13:11:56[junit4]   2> at java.lang.String.valueOf(String.java:2994)
> 13:11:56[junit4]   2> at 
> java.lang.StringBuilder.append(StringBuilder.java:131)
> 13:11:56[junit4]   2> at 
> java.util.AbstractMap.toString(AbstractMap.java:557)
> 13:11:56[junit4]   2> at 
> java.util.Collections$UnmodifiableMap.toString(Collections.java:1493)
> 13:11:56[junit4]   2> at java.lang.String.valueOf(String.java:2994)
> 13:11:56[junit4]   2> at 
> java.lang.StringBuilder.append(StringBuilder.java:131)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939)
> 13:11:56[junit4]   2> at 
> org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851)
> 13:11:56[junit4]   2> at java.lang.Thread.run(Thread.java:748)
> 13:11:56[junit4]   2> 
> 13:11:56[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates 
> -Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest
> {noformat}
> The issue is that we update the attributes map during the merge process (we 
> do the same for diagnostics, but that is not causing the issue since the 
> diagnostics map is never modified) yet access it in the merge policy when 
> looking at running merges: there we call toString on SegmentCommitInfo, which 
> happens without any synchronization. This is technically unsafe publication, 
> but IW is a mess along those lines and a real fix would require significant 
> changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException

2019-04-07 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8754:
---

 Summary: SegmentInfo#toString can cause 
ConcurrentModificationException
 Key: LUCENE-8754
 URL: https://issues.apache.org/jira/browse/LUCENE-8754
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer


A recent change increased the likelihood of this issue showing up, but it could 
already happen before, since we have been using the attributes map in the 
StoredFieldsFormat for quite some time. I found this issue due to a test 
failure on our CI:


{noformat}
13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting
13:11:56[junit4]   2> apr 05, 2019 8:11:53 AM 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
13:11:56[junit4]   2> WARNING: Uncaught exception in thread: 
Thread[Thread-507,5,TGRP-TestIndexSorting]
13:11:56[junit4]   2> java.util.ConcurrentModificationException
13:11:56[junit4]   2>   at 
__randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0)
13:11:56[junit4]   2>   at 
java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
13:11:56[junit4]   2>   at 
java.util.HashMap$EntryIterator.next(HashMap.java:1476)
13:11:56[junit4]   2>   at 
java.util.HashMap$EntryIterator.next(HashMap.java:1474)
13:11:56[junit4]   2>   at 
java.util.AbstractMap.toString(AbstractMap.java:554)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364)
13:11:56[junit4]   2>   at java.lang.String.valueOf(String.java:2994)
13:11:56[junit4]   2>   at 
java.lang.StringBuilder.append(StringBuilder.java:131)
13:11:56[junit4]   2>   at 
java.util.AbstractMap.toString(AbstractMap.java:557)
13:11:56[junit4]   2>   at 
java.util.Collections$UnmodifiableMap.toString(Collections.java:1493)
13:11:56[junit4]   2>   at java.lang.String.valueOf(String.java:2994)
13:11:56[junit4]   2>   at 
java.lang.StringBuilder.append(StringBuilder.java:131)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939)
13:11:56[junit4]   2>   at 
org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851)
13:11:56[junit4]   2>   at java.lang.Thread.run(Thread.java:748)
13:11:56[junit4]   2> 
13:11:56[junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates 
-Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest
{noformat}

The issue is that we update the attributes map during the merge process (we do 
the same for diagnostics, but that is not causing the issue since the 
diagnostics map is never modified) yet access it in the merge policy when 
looking at running merges: there we call toString on SegmentCommitInfo, which 
happens without any synchronization. This is technically unsafe publication, 
but IW is a mess along those lines and a real fix would require significant 
changes.
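
A minimal sketch of the general technique for making such a toString safe 
(assuming mutation already happens under the instance monitor; this shows the 
idea, not necessarily the committed fix):

{code:java}
// sketch: guard the attributes map with the instance monitor on both the
// write path and the read/format path, and format a defensive copy
private final Map<String, String> attributes = new HashMap<>();

public synchronized String putAttribute(String key, String value) {
  return attributes.put(key, value);
}

@Override
public synchronized String toString() {
  // copying under the lock avoids iterating the live map while a merge
  // thread is mutating it; TreeMap also gives stable, sorted output
  return new TreeMap<>(attributes).toString();
}
{code}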





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8735) FileAlreadyExistsException after opening old commit

2019-03-26 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801820#comment-16801820
 ] 

Simon Willnauer commented on LUCENE-8735:
-

thanks, Henning

> FileAlreadyExistsException after opening old commit
> ---
>
> Key: LUCENE-8735
> URL: https://issues.apache.org/jira/browse/LUCENE-8735
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Affects Versions: 8.0
>Reporter: Henning Andersen
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: 7.7.1, 7.7.2, 8.0.1, 8.1, master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> FilterDirectory.getPendingDeletions() does not delegate calls. This in turn 
> means that IndexFileDeleter does not consider those as relevant files.
> When opening an IndexWriter for an older commit, an attempt is made to delete 
> the excess files. If an IndexReader exists using one of the newer commits, 
> the excess files may fail to delete (at least on Windows, or when using the 
> mocking WindowsFS).
> If the IndexWriter is then closed and reopened, the information on the 
> pending deletes is gone if a FilterDirectory derivative is used. At the same 
> time, the pending deletes are filtered out of listAll. This leads to a risk 
> of hitting an existing file name, causing a FileAlreadyExistsException.
> This issue likely only exists on Windows.
> Will create pull request with fix.
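
The missing delegation amounts to something along these lines in 
FilterDirectory (a sketch of the described fix, not the verbatim patch; the 
method is spelled getPendingDeletions() in the Directory API):

{code:java}
@Override
public Set<String> getPendingDeletions() throws IOException {
  // without this override, the base Directory implementation hides the
  // wrapped directory's pending deletes from IndexFileDeleter
  return in.getPendingDeletions();
}
{code}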



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8735) FileAlreadyExistsException after opening old commit

2019-03-26 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8735.
-
   Resolution: Fixed
 Assignee: Simon Willnauer
Fix Version/s: 7.7.1
   8.1
   8.0.1
   7.7.2

> FileAlreadyExistsException after opening old commit
> ---
>
> Key: LUCENE-8735
> URL: https://issues.apache.org/jira/browse/LUCENE-8735
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Affects Versions: 8.0
>Reporter: Henning Andersen
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: 7.7.2, 8.0.1, 8.1, master (9.0), 7.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> FilterDirectory.getPendingDeletions() does not delegate calls. This in turn 
> means that IndexFileDeleter does not consider those as relevant files.
> When opening an IndexWriter for an older commit, an attempt is made to delete 
> the excess files. If an IndexReader exists using one of the newer commits, 
> the excess files may fail to delete (at least on Windows, or when using the 
> mocking WindowsFS).
> If the IndexWriter is then closed and reopened, the information on the 
> pending deletes is gone if a FilterDirectory derivative is used. At the same 
> time, the pending deletes are filtered out of listAll. This leads to a risk 
> of hitting an existing file name, causing a FileAlreadyExistsException.
> This issue likely only exists on Windows.
> Will create pull request with fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8700) Enable concurrent flushing when no indexing is in progress

2019-03-13 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8700.
-
Resolution: Invalid

We settled on the PR that IndexWriter#flushNextBuffer is sufficient for this 
use case. I opened a new PR for the test improvements here: 
https://github.com/apache/lucene-solr/pull/607
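
The caller-side pattern that makes flushNextBuffer sufficient looks roughly 
like this (the idle check is a hypothetical application-level concern, not 
Lucene API):

{code:java}
// a spare thread can drain one in-memory buffer to disk without committing;
// IOException propagates to the caller
if (indexingThreadsIdle()) {  // hypothetical application-side check
  writer.flushNextBuffer();
}
{code}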

> Enable concurrent flushing when no indexing is in progress
> --
>
> Key: LUCENE-8700
> URL: https://issues.apache.org/jira/browse/LUCENE-8700
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As discussed on mailing list, this is for adding a IndexWriter.yield() method 
> that callers can use to enable concurrent flushing. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)

2019-03-12 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790895#comment-16790895
 ] 

Simon Willnauer commented on LUCENE-8692:
-

> rollback gives you a way to close IndexWriter without doing a commit, which 
> seems useful.  If you removed that, what would users do instead?

can't we extend close to close without a commit? I mean we can keep rollback 
but be more strict about exceptions during commit and friends?

> IndexWriter.getTragicException() may not reflect all corrupting exceptions 
> (notably: NoSuchFileException)
> -
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, 
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's 
> {{corruptFiles}} to introduce corruption into the "leader" node's index and 
> then assert that this Solr node gives up its leadership of the shard and 
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - see 
> SOLR-13237) due to the leader not giving up its leadership even after the 
> corruption causes an update/commit to fail. Solr's leadership code makes this 
> decision after encountering an exception from the IndexWriter based on 
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> 
> While investigating this, I created an isolated Lucene-Core equivalent test 
> that demonstrates the same basic situation:
>  * Gradually cause corruption on an index until (otherwise) valid execution 
> of IW.add() + IW.commit() calls throws an exception to the IW client.
>  * assert that if an exception is thrown to the IW client, 
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly – in every situation 
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the 
> randomly introduced corruption was to delete some file).
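
For reference, the contract the test exercises boils down to this client-side 
pattern (a sketch):

{code:java}
try {
  writer.addDocument(doc);
  writer.commit();
} catch (IOException e) {
  // expectation under test: if the failure corrupted the index, the
  // writer must have recorded it as tragic
  if (writer.getTragicException() == null) {
    throw new AssertionError("IW threw but reports no tragic exception", e);
  }
}
{code}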



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)

2019-03-11 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789320#comment-16789320
 ] 

Simon Willnauer commented on LUCENE-8692:
-

{quote}
It definitely seems like there should be something we can/should do to better 
recognize situations like this as "unrecoverable" and be more strict in dealing 
with low level exceptions during things like commit – but I'm out definitely 
out of my depth in understanding/suggesting what that might look like.
{quote}

I agree with you here. I personally question the purpose of rollback, since in 
all the cases I have seen a missing rollback would simply mean data loss. If 
somebody continues after a failed commit / prepareCommit / reopen they will end 
up with inconsistency and/or data loss. I can't think of a reason why you 
would want to do it. I am curious what [~mikemccand] [~jpountz] [~rcmuir] 
think about that. 
If we deprecate and remove rollback() we can be more aggressive when it gets to 
tragic events, and prevent users from continuing after such an exception by 
closing the writer automatically.



> IndexWriter.getTragicException() may not reflect all corrupting exceptions 
> (notably: NoSuchFileException)
> -
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, 
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's 
> {{corruptFiles}} to introduce corruption into the "leader" node's index and 
> then assert that this Solr node gives up its leadership of the shard and 
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - see 
> SOLR-13237) due to the leader not giving up its leadership even after the 
> corruption causes an update/commit to fail. Solr's leadership code makes this 
> decision after encountering an exception from the IndexWriter based on 
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> 
> While investigating this, I created an isolated Lucene-Core equivalent test 
> that demonstrates the same basic situation:
>  * Gradually cause corruption on an index until (otherwise) valid execution 
> of IW.add() + IW.commit() calls throws an exception to the IW client.
>  * assert that if an exception is thrown to the IW client, 
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly – in every situation 
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the 
> randomly introduced corruption was to delete some file).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-03-06 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785703#comment-16785703
 ] 

Simon Willnauer commented on LUCENE-8671:
-

I don't think we should add a setter to FieldInfo. This is a code-private thing 
and should be treated that way. It looks like we need a way to pass more info 
down when we open new SegmentReaders. I wonder if we can accept a simple 
Map<String, String> on 

{noformat}
public static DirectoryReader open(final IndexWriter writer, boolean 
applyAllDeletes, boolean writeAllDeletes) throws IOException
{noformat}

We can then pass it down to the relevant parts and make it part of 
SegmentReadState? This map could also be passed via IndexWriterConfig for the 
NRT case. That way we can pass stuff per DirectoryReader open, which is what 
we want I guess. 
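
To illustrate the consumption side of that idea (the key name, the value 
grammar taken from the ALL/NONE suggestion in the description, and the map 
plumbing are all assumptions, not existing API):

{code:java}
// hypothetical: FieldReader decides per field whether the FST goes off-heap,
// based on a map handed down via the reader open call / SegmentReadState
String spec = readerAttributes.getOrDefault("fst.offheap.fields", "NONE");
final boolean offHeap;
if ("ALL".equals(spec)) {
  offHeap = true;
} else if ("NONE".equals(spec)) {
  offHeap = false;
} else {
  offHeap = Arrays.asList(spec.split(",")).contains(fieldInfo.name);
}
{code}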


> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)

2019-03-06 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785625#comment-16785625
 ] 

Simon Willnauer commented on LUCENE-8692:
-

{quote}
For now I've updated the patch to take the simplest possible approach to 
checking for MergeAbortedException
{quote}


+1

{quote}
Well, to flip your question around: is there an example of a Throwable you can 
think of bubbling up out of IndexWriter.startCommit() that should NOT be 
considered fatal?
{quote}
I think we need to be careful here. From my perspective there are 3 types of 
exceptions here:
 * unrecoverable exceptions, aka. VirtualMachineErrors
 * exceptions that happen during indexing and are not recoverable (these are 
handled in DocumentsWriter)
 * exceptions that cause data loss or inconsistencies (we didn't handle those 
as fatal yet, at least not consistently, since we only catch 
VirtualMachineError).

Those are in particular:

 * getReader()
 * deleteAll()
 * addIndexes()
 * flushNextBuffer()
 * prepareCommitInternal() 
 * doFlush()
 * startCommit()

Those methods might cause documents to go missing etc., but we did not treat 
them as fatal or tragic events since a user could always call rollback() to go 
back to the last known safe-point / previous commit. Now we can debate if we 
want to change this, and we can; in fact I am all for making it even more 
strict, especially since it's inconsistent with what we do if addDocument 
fails with an aborting exception. 
If we do that we need to see if rollback still has a purpose and maybe remove 
it?

Now, speaking of maybeMerge, I don't see why we need to close the index writer 
with a tragic event; there is no data loss nor an inconsistency. From that 
logic I don't think we need to handle these exceptions in such a drastic way?

{quote}
I don't use github for lucene development – I track all contributions as 
patches in the official issue tracker for the project as recommended by our 
official guidelines : )  ... but i'll go ahead and create a jira/LUCENE-8692 
branch if that will help you review.
{quote}

Bummer, I am not sure branches help. Working like it's still 1999 is a pain; we 
should fix our guidelines.



> IndexWriter.getTragicException() may not reflect all corrupting exceptions 
> (notably: NoSuchFileException)
> -
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, 
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's 
> {{corruptFiles}} to introduce corruption into the "leader" node's index and 
> then assert that this Solr node gives up its leadership of the shard and 
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - 
> see SOLR-13237) due to the leader not giving up its leadership even after the 
> corruption causes an update/commit to fail.  Solr's leadership code makes 
> this decision after encountering an exception from the IndexWriter based on 
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> 
> While investigating this, I created an isolated Lucene-Core equivalent test 
> that demonstrates the same basic situation:
> * Gradually cause corruption on an index until (otherwise) valid execution 
> of IW.add() + IW.commit() calls throws an exception to the IW client.
> * assert that if an exception is thrown to the IW client, 
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly -- in every situation 
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the 
> randomly introduced corruption was to delete some file).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)

2019-03-05 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784438#comment-16784438
 ] 

Simon Willnauer commented on LUCENE-8692:
-



{noformat}
I think there is an issue with the patch with MergeAbortedExeption indeed given 
that registerMerge might throw such an exception. Maybe we should move this try 
block to registerMerge instead where we know which OneMerge is being registered 
(and is also where the exception is thrown when estimating the size of the 
merge).
{noformat}

+1

{code:java}
-} catch (VirtualMachineError tragedy) {
+} catch (Throwable tragedy) {
   tragicEvent(tragedy, "startCommit");
{code}

I am not sure why we need to treat every exception as fatal in this case?

I also wonder if we could move this to a PR on GitHub; iterations would be 
simpler and comments too. I can't tell which patch is relevant and which one 
isn't.

> IndexWriter.getTragicException() may not reflect all corrupting exceptions 
> (notably: NoSuchFileException)
> -
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch, 
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's 
> {{corruptFiles}} to introduce corruption into the "leader" node's index and 
> then assert that this Solr node gives up its leadership of the shard and 
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - 
> see SOLR-13237) due to the leader not giving up its leadership even after the 
> corruption causes an update/commit to fail.  Solr's leadership code makes 
> this decision after encountering an exception from the IndexWriter based on 
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> 
> While investigating this, I created an isolated Lucene-Core equivalent test 
> that demonstrates the same basic situation:
> * Gradually cause corruption on an index until (otherwise) valid execution 
> of IW.add() + IW.commit() calls throws an exception to the IW client.
> * assert that if an exception is thrown to the IW client, 
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly -- in every situation 
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the 
> randomly introduced corruption was to delete some file).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2019-02-20 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773011#comment-16773011
 ] 

Simon Willnauer commented on LUCENE-3041:
-

[~romseygeek] any chance you can open a PR for this? Patches are so hard to 
review and comment on. 

> Support Query Visting / Walking
> ---
>
> Key: LUCENE-3041
> URL: https://issues.apache.org/jira/browse/LUCENE-3041
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Chris Male
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.9, 6.0
>
> Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, 
> LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch
>
>
> Out of the discussion in LUCENE-2868, it could be useful to add a generic 
> Query Visitor / Walker that could be used for more advanced rewriting, 
> optimizations or anything that requires state to be stored as each Query is 
> visited.
> We could keep the interface very simple:
> {code}
> public interface QueryVisitor {
>   Query visit(Query query);
> }
> {code}
> and then use a reflection based visitor like Earwin suggested, which would 
> allow implementators to provide visit methods for just Querys that they are 
> interested in.
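
For illustration, a toy implementation against the proposed interface 
(hypothetical usage, not committed API): it counts TermQuery instances while 
leaving the query tree unchanged.

{code:java}
public class TermQueryCounter implements QueryVisitor {
  public int count;

  @Override
  public Query visit(Query query) {
    if (query instanceof TermQuery) {
      count++;  // gather state as each query in the tree is visited
    }
    return query;  // walk only, no rewriting
  }
}
{code}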



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods

2019-02-18 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771013#comment-16771013
 ] 

Simon Willnauer commented on LUCENE-8292:
-

[~dsmiley] I coordinated this with [~romseygeek] given that we had to respin 
for https://issues.apache.org/jira/browse/SOLR-13126 anyhow. 

> Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
> --
>
> Key: LUCENE-8292
> URL: https://issues.apache.org/jira/browse/LUCENE-8292
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.2.1
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: trunk, 8.0, 8.x, master (9.0)
>
> Attachments: 
> 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, 
> LUCENE-8292.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many 
> methods.
> It misses some seekExact() methods, thus it is not possible to the delegate 
> to override these methods to have specific behavior (unlike the TermsEnum API 
> which allows that).
> The fix is straightforward: simply override these seekExact() methods and 
> delegate.
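
The delegation in question amounts to the following overrides in 
FilterTermsEnum, where {{in}} is the wrapped enum (a sketch of the 
straightforward fix the description calls for, not the verbatim patch):

{code:java}
@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);  // don't fall back to the seekCeil-based default
}

@Override
public void seekExact(BytesRef term, TermState state) throws IOException {
  in.seekExact(term, state);
}

@Override
public TermState termState() throws IOException {
  return in.termState();
}
{code}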



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods

2019-02-15 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8292.
-
   Resolution: Fixed
Fix Version/s: master (9.0)
   8.x
   8.0

> Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
> --
>
> Key: LUCENE-8292
> URL: https://issues.apache.org/jira/browse/LUCENE-8292
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.2.1
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: trunk, 8.0, 8.x, master (9.0)
>
> Attachments: 
> 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, 
> LUCENE-8292.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many 
> methods.
> It misses some seekExact() methods, thus it is not possible to the delegate 
> to override these methods to have specific behavior (unlike the TermsEnum API 
> which allows that).
> The fix is straightforward: simply override these seekExact() methods and 
> delegate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods

2019-02-15 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769324#comment-16769324
 ] 

Simon Willnauer commented on LUCENE-8292:
-

I opened a PR here https://github.com/apache/lucene-solr/pull/574

> Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
> --
>
> Key: LUCENE-8292
> URL: https://issues.apache.org/jira/browse/LUCENE-8292
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.2.1
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: trunk
>
> Attachments: 
> 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, 
> LUCENE-8292.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many 
> methods.
> It misses some seekExact() methods, thus it is not possible to the delegate 
> to override these methods to have specific behavior (unlike the TermsEnum API 
> which allows that).
> The fix is straightforward: simply override these seekExact() methods and 
> delegate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods

2019-02-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767061#comment-16767061
 ] 

Simon Willnauer commented on LUCENE-8292:
-

I do see both points here. [~dsmiley] I hate how trappy this is, and [~jpountz] 
I completely agree with you. My suggestion here would be to make all methods on 
TermsEnum abstract and add a BaseTermsEnum class that provides the default 
impls. FilterTermsEnum then subclasses TermsEnum directly and does the right 
thing. Other classes that don't need to override stuff like seekExact(BytesRef) 
and seekExact(BytesRef, TermState) / TermState termState() can simply subclass 
BaseTermsEnum, and we don't have to duplicate code all over the place. I don't 
think we need to do this in other places where we have the same pattern, but in 
this case the traps are significant and we can fix it with a simple class 
in-between? (see the sketch below)
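
A sketch of that split (the idea, not a committed patch): TermsEnum keeps every 
method abstract so delegating enums cannot silently inherit a slow default, 
while BaseTermsEnum carries the convenience impls for everyone else.

{code:java}
public abstract class BaseTermsEnum extends TermsEnum {

  @Override
  public boolean seekExact(BytesRef text) throws IOException {
    // the old TermsEnum default moves here, where it is opted into explicitly
    return seekCeil(text) == SeekStatus.FOUND;
  }

  @Override
  public void seekExact(BytesRef term, TermState state) throws IOException {
    if (seekExact(term) == false) {
      throw new IllegalArgumentException("term=" + term + " does not exist");
    }
  }
}
{code}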



> Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
> --
>
> Key: LUCENE-8292
> URL: https://issues.apache.org/jira/browse/LUCENE-8292
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 7.2.1
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: trunk
>
> Attachments: 
> 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, 
> LUCENE-8292.patch
>
>
> FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many 
> methods.
> It misses some seekExact() methods, thus it is not possible to the delegate 
> to override these methods to have specific behavior (unlike the TermsEnum API 
> which allows that).
> The fix is straightforward: simply override these seekExact() methods and 
> delegate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum

2019-02-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763762#comment-16763762
 ] 

Simon Willnauer commented on LUCENE-8662:
-

[~tomasflobbe] yes I think this should go into 8.0 - feel free to pull it in, I 
will do it next week once I am back at the keyboard.

> Change TermsEnum.seekExact(BytesRef) to abstract + delegate 
> seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
> ---
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
>Reporter: jefferyyuan
>Priority: Major
>  Labels: query
> Fix For: 8.0, 7.7
>
> Attachments: output of test program.txt
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Recently in our production, we found that Solr uses a lot of memory (more 
> than 10 GB) during recovery or commit for a small index (3.5 GB).
>  The stack trace is:
>  
> {code:java}
> Thread 0x4d4b115c0 
>   at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) 
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V 
> (SegmentTermsEnumFrame.java:157) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnumFrame.java:786) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnumFrame.java:538) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnum.java:757) 
>   at 
> org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (FilterLeafReader.java:185) 
>   at 
> org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z
>  (TermsEnum.java:74) 
>   at 
> org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J
>  (SolrIndexSearcher.java:823) 
>   at 
> org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (VersionInfo.java:204) 
>   at 
> org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (UpdateLog.java:786) 
>   at 
> org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (VersionInfo.java:194) 
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z
>  (DistributedUpdateProcessor.java:1051)  
> {code}
> We reproduced the problem locally with the following code using Lucene code.
> {code:java}
> public static void main(String[] args) throws IOException {
>   FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>   try (IndexReader reader = new   
> ExitableDirectoryReader(DirectoryReader.open(index),
> new QueryTimeoutImpl(1000 * 60 * 5))) {
> String id = "the-id";
> BytesRef text = new BytesRef(id);
> for (LeafReaderContext lf : reader.leaves()) {
>   TermsEnum te = lf.reader().terms("id").iterator();
>   System.out.println(te.seekExact(text));
> }
>   }
> }
> {code}
>  
> I added System.out.println("ord: " + ord); in 
> codecs.blocktree.SegmentTermsEnum.getFrame(int).
> Please check the attached output of test program.txt. 
>  
> We found out the root cause:
> we didn't implement the seekExact(BytesRef) method in 
> FilterLeafReader.FilterTermsEnum, so it uses the base class 
> TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in 
> this case.
> {code:java}
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple, just override the seekExact(BytesRef) method in 
> FilterLeafReader.FilterTermsEnum:
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   return in.seekExact(text);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8664) Add equals/hashcode to TotalHits

2019-01-30 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8664.
-
   Resolution: Fixed
Fix Version/s: master (9.0)
   8.0

> Add equals/hashcode to TotalHits
> 
>
> Key: LUCENE-8664
> URL: https://issues.apache.org/jira/browse/LUCENE-8664
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
> Fix For: 8.0, master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it would be convenient to add equals/hashcode methods to the 
> TotalHits class. I opened a PR here: 
> [https://github.com/apache/lucene-solr/pull/552] .
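
The change is the obvious value semantics over the class's two public fields; 
a sketch of what the PR presumably adds:

{code:java}
@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  TotalHits that = (TotalHits) o;
  return value == that.value && relation == that.relation;
}

@Override
public int hashCode() {
  return 31 * Long.hashCode(value) + relation.hashCode();
}
{code}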



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits

2019-01-30 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756032#comment-16756032
 ] 

Simon Willnauer commented on LUCENE-8664:
-

pushed - thanks [~lucacavanna]

> Add equals/hashcode to TotalHits
> 
>
> Key: LUCENE-8664
> URL: https://issues.apache.org/jira/browse/LUCENE-8664
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
> Fix For: 8.0, master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it would be convenient to add equals/hashcode methods to the 
> TotalHits class. I opened a PR here: 
> [https://github.com/apache/lucene-solr/pull/552] .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits

2019-01-29 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754987#comment-16754987
 ] 

Simon Willnauer commented on LUCENE-8664:
-

[~lucacavanna] what's the use case for this? Why are you trying to put this 
into a map or something? Can you explain this a bit further?

> Add equals/hashcode to TotalHits
> 
>
> Key: LUCENE-8664
> URL: https://issues.apache.org/jira/browse/LUCENE-8664
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
>
> I think it would be convenient to add equals/hashcode methods to the 
> TotalHits class. I opened a PR here: 
> [https://github.com/apache/lucene-solr/pull/552] .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum

2019-01-29 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754984#comment-16754984
 ] 

Simon Willnauer commented on LUCENE-8662:
-


{noformat}
If we think that it's a trap, we should remove the default impl and make it 
abstract (in 8.0).
{noformat}
I agree with this. I think it can be trappy, and such an expert API shouldn't 
be. Let's make it abstract?

> Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
> 
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
>Reporter: jefferyyuan
>Priority: Major
>  Labels: query
> Fix For: 8.0, 7.7
>
> Attachments: output of test program.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Recently in our production, we found that Solr uses a lot of memory (more 
> than 10 GB) during recovery or commit for a small index (3.5 GB).
>  The stack trace is:
>  
> {code:java}
> Thread 0x4d4b115c0 
>   at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) 
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V 
> (SegmentTermsEnumFrame.java:157) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnumFrame.java:786) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnumFrame.java:538) 
>   at 
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (SegmentTermsEnum.java:757) 
>   at 
> org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus;
>  (FilterLeafReader.java:185) 
>   at 
> org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z
>  (TermsEnum.java:74) 
>   at 
> org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J
>  (SolrIndexSearcher.java:823) 
>   at 
> org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (VersionInfo.java:204) 
>   at 
> org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (UpdateLog.java:786) 
>   at 
> org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long;
>  (VersionInfo.java:194) 
>   at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z
>  (DistributedUpdateProcessor.java:1051)  
> {code}
> We reproduced the problem locally with the following code using Lucene code.
> {code:java}
> public static void main(String[] args) throws IOException {
>   FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>   try (IndexReader reader = new   
> ExitableDirectoryReader(DirectoryReader.open(index),
> new QueryTimeoutImpl(1000 * 60 * 5))) {
> String id = "the-id";
> BytesRef text = new BytesRef(id);
> for (LeafReaderContext lf : reader.leaves()) {
>   TermsEnum te = lf.reader().terms("id").iterator();
>   System.out.println(te.seekExact(text));
> }
>   }
> }
> {code}
>  
> I added System.out.println("ord: " + ord); in 
> codecs.blocktree.SegmentTermsEnum.getFrame(int).
> Please check the attached output of test program.txt. 
>  
> We found out the root cause:
> we didn't implement the seekExact(BytesRef) method in 
> FilterLeafReader.FilterTermsEnum, so it uses the base class 
> TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in 
> this case.
> {code:java}
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple, just override the seekExact(BytesRef) method in 
> FilterLeafReader.FilterTermsEnum:
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   return in.seekExact(text);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.

2019-01-16 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8639.
-
   Resolution: Fixed
Fix Version/s: master (9.0)
   7.7
   8.0

> SeqNo accounting in IW is broken if many threads start indexing while we 
> flush.
> ---
>
> Key: LUCENE-8639
> URL: https://issues.apache.org/jira/browse/LUCENE-8639
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 8.0, 7.7, master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While this is rare in the wild, we have a test failure that shows that our 
> seqNo accounting is broken when we carry over seqNos to a new delete queue. 
> We had this test failure:
> {noformat}
> 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
> 16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
> 16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-8,5,TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6
> 16:06:08[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08[junit4]   2> at java.lang.Thread.run(Thread.java:748)
> 16:06:08[junit4]   2> 
> 16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
> 16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-9,5,TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6
> 16:06:08[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08[junit4]   2> at java.lang.Thread.run(Thread.java:748)
> 16:06:08[junit4]   2> 
> 16:06:08[junit4]   2> ??? 14, 2019 11:05:45 ? 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
> 16:06:08[junit4]   2> WARNING: Suite execution timed out: 
> org.apache.lucene.index.TestIndexTooManyDocs
> 16:06:08[junit4]   2>1) Thread[id=20, 
> name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, 
> group=TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> at 
> java.lang.Thread.getStackTrace(Thread.j

[jira] [Commented] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.

2019-01-15 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743155#comment-16743155
 ] 

Simon Willnauer commented on LUCENE-8639:
-

[~mikemccand] can you take a look at the PR?

> SeqNo accounting in IW is broken if many threads start indexing while we 
> flush.
> ---
>
> Key: LUCENE-8639
> URL: https://issues.apache.org/jira/browse/LUCENE-8639
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While this is rare in the wild, we have a test failure that shows that our 
> seqNo accounting is broken when we carry over seqNos to a new delete queue. 
> We had this test failure:
> {noformat}
> 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
> 16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
> 16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-8,5,TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6
> 16:06:08[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08[junit4]   2> at java.lang.Thread.run(Thread.java:748)
> 16:06:08[junit4]   2> 
> 16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
> 16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-9,5,TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6
> 16:06:08[junit4]   2> at 
> __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08[junit4]   2> at 
> org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08[junit4]   2> at java.lang.Thread.run(Thread.java:748)
> 16:06:08[junit4]   2> 
> 16:06:08[junit4]   2> ??? 14, 2019 11:05:45 ? 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
> 16:06:08[junit4]   2> WARNING: Suite execution timed out: 
> org.apache.lucene.index.TestIndexTooManyDocs
> 16:06:08[junit4]   2>1) Thread[id=20, 
> name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, 
> group=TGRP-TestIndexTooManyDocs]
> 16:06:08[junit4]   2> at 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> 16:06:08[junit4]   2> at 

[jira] [Created] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.

2019-01-15 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8639:
---

 Summary: SeqNo accounting in IW is broken if many threads start 
indexing while we flush.
 Key: LUCENE-8639
 URL: https://issues.apache.org/jira/browse/LUCENE-8639
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer


While this is rare in the wild, we have a test failure that shows that our seqNo 
accounting is broken when we carry over seqNos to a new delete queue. We had 
this test failure:

{noformat}
6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
Thread[Thread-8,5,TGRP-TestIndexTooManyDocs]
16:06:08[junit4]   2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6
16:06:08[junit4]   2>   at 
__randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
16:06:08[junit4]   2>   at java.lang.Thread.run(Thread.java:748)
16:06:08[junit4]   2> 
16:06:08[junit4]   2> ??? 14, 2019 9:05:46 ? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
16:06:08[junit4]   2> WARNING: Uncaught exception in thread: 
Thread[Thread-9,5,TGRP-TestIndexTooManyDocs]
16:06:08[junit4]   2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6
16:06:08[junit4]   2>   at 
__randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
16:06:08[junit4]   2>   at 
org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
16:06:08[junit4]   2>   at java.lang.Thread.run(Thread.java:748)
16:06:08[junit4]   2> 
16:06:08[junit4]   2> ??? 14, 2019 11:05:45 ? 
com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
16:06:08[junit4]   2> WARNING: Suite execution timed out: 
org.apache.lucene.index.TestIndexTooManyDocs
16:06:08[junit4]   2>1) Thread[id=20, 
name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, 
group=TGRP-TestIndexTooManyDocs]
16:06:08[junit4]   2> at 
java.lang.Thread.getStackTrace(Thread.java:1559)
16:06:08[junit4]   2> at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696)
16:06:08[junit4]   2> at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693)
16:06:08[junit4]   2> at 
java.security.AccessController.doPrivileged(Native Method)
16:06:08[junit4]   2> at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693)
16:06:08[junit4]   2> at 

[jira] [Commented] (LUCENE-8525) throw more specific exception on data corruption

2019-01-11 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740186#comment-16740186
 ] 

Simon Willnauer commented on LUCENE-8525:
-

I do agree with [~rcmuir] here. There is not much to do in terms of detecting 
this particular problem on DataInput and friends. One way to improve this would 
certainly be the wording of the javadoc: we can clarify that detecting 
_CorruptIndexException_ is best effort. 
Another idea is to checksum the entire file before we read the commit; we can 
either do this on the Elasticsearch end or improve _SegmentInfos#readCommit_. 
Reading this file twice isn't a big deal, I guess.
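
For illustration, a minimal sketch of that second idea (the class and method here 
are hypothetical; _CodecUtil#checksumEntireFile_ is the existing helper it leans on):

{code:java}
import java.io.IOException;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public final class CommitChecksum {
  /** Reads the whole commit file once and verifies its footer checksum,
   *  throwing CorruptIndexException on a mismatch. */
  public static void verify(Directory dir, String segmentsFileName) throws IOException {
    try (IndexInput in = dir.openInput(segmentsFileName, IOContext.READONCE)) {
      CodecUtil.checksumEntireFile(in);
    }
  }
}
{code}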

> throw more specific exception on data corruption
> 
>
> Key: LUCENE-8525
> URL: https://issues.apache.org/jira/browse/LUCENE-8525
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vladimir Dolzhenko
>Priority: Major
>
> DataInput throws a generic IOException if the data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like 
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>  
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
>  and maybe 
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch 
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence 
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
>  violates its own contract
> {code:java}
> /**
>* @throws CorruptIndexException if the index is corrupt
>* @throws IOException if there is a low-level IO error
>*/
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter

2018-12-15 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722290#comment-16722290
 ] 

Simon Willnauer commented on LUCENE-8609:
-

[~sokolov] I opened [https://github.com/mikemccand/luceneutil/pull/28/] /cc 
[~mikemccand]

> Allow getting consistent docstats from IndexWriter
> --
>
> Key: LUCENE-8609
> URL: https://issues.apache.org/jira/browse/LUCENE-8609
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (8.0), 7.7
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
> to get all stats for the current index, but it's subject to concurrency
> and might return numbers that are not consistent, i.e. some cases can
> return maxDoc < numDocs, which is undesirable. This change adds a 
> getDocStats()
> method to IndexWriter to allow fetching consistent numbers for these 
> stats.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8609) Allow getting consistent docstats from IndexWriter

2018-12-14 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8609.
-
   Resolution: Fixed
Fix Version/s: 7.7
   master (8.0)

thanks everybody

> Allow getting consistent docstats from IndexWriter
> --
>
> Key: LUCENE-8609
> URL: https://issues.apache.org/jira/browse/LUCENE-8609
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (8.0), 7.7
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
> to get all stats for the current index, but it's subject to concurrency
> and might return numbers that are not consistent, i.e. some cases can
> return maxDoc < numDocs, which is undesirable. This change adds a 
> getDocStats()
> method to IndexWriter to allow fetching consistent numbers for these 
> stats.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter

2018-12-14 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721587#comment-16721587
 ] 

Simon Willnauer commented on LUCENE-8609:
-

[~mikemccand] [~jpountz] [~dnhatn] I pushed new changes to the PR 

> Allow getting consistent docstats from IndexWriter
> --
>
> Key: LUCENE-8609
> URL: https://issues.apache.org/jira/browse/LUCENE-8609
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (8.0), 7.7
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
> to get all stats for the current index, but it's subject to concurrency
> and might return numbers that are not consistent, i.e. some cases can
> return maxDoc < numDocs, which is undesirable. This change adds a 
> getDocStats()
> method to IndexWriter to allow fetching consistent numbers for these 
> stats.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8609) Allow getting consistent docstats from IndexWriter

2018-12-13 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8609:
---

 Summary: Allow getting consistent docstats from IndexWriter
 Key: LUCENE-8609
 URL: https://issues.apache.org/jira/browse/LUCENE-8609
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: master (8.0), 7.7
Reporter: Simon Willnauer


 Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
to get all stats for the current index, but it's subject to concurrency
and might return numbers that are not consistent, i.e. some cases can
return maxDoc < numDocs, which is undesirable. This change adds a getDocStats()
method to IndexWriter to allow fetching consistent numbers for these stats.
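
For illustration, a minimal usage sketch of the proposed API, assuming an open 
IndexWriter named _writer_ and the DocStats shape described here:

{code:java}
import org.apache.lucene.index.IndexWriter;

final class DocStatsDemo {
  static void printStats(IndexWriter writer) {
    // One consistent snapshot instead of two racy calls:
    IndexWriter.DocStats stats = writer.getDocStats();
    // Because both numbers are taken together, numDocs can never exceed
    // maxDoc here, unlike separate writer.maxDoc()/writer.numDocs() calls.
    System.out.println("maxDoc=" + stats.maxDoc + ", numDocs=" + stats.numDocs);
  }
}
{code}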




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter

2018-12-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720279#comment-16720279
 ] 

Simon Willnauer commented on LUCENE-8609:
-

One question here: should we deprecate the `maxDoc` / `numDocs` methods in 
favor of this?

> Allow getting consistent docstats from IndexWriter
> --
>
> Key: LUCENE-8609
> URL: https://issues.apache.org/jira/browse/LUCENE-8609
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (8.0), 7.7
>Reporter: Simon Willnauer
>Priority: Major
>
>  Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
> to get all stats for the current index, but it's subject to concurrency
> and might return numbers that are not consistent, i.e. some cases can
> return maxDoc < numDocs, which is undesirable. This change adds a 
> getDocStats()
> method to IndexWriter to allow fetching consistent numbers for these 
> stats.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8608) Extract utility class to iterate over terms docs

2018-12-13 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8608.
-
Resolution: Fixed

> Extract utility class to iterate over terms docs
> 
>
> Key: LUCENE-8608
> URL: https://issues.apache.org/jira/browse/LUCENE-8608
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Today we re-implement the same algorithm in various places
> when we want to consume all docs for a set/list of terms. This
> caused serious slowdowns, for instance in the case of applying
> updates fixed in LUCENE-8602. This change extracts the common
> usage and shares the iteration code, including logic to reuse
> Terms and PostingsEnum instances as much as possible, and adds
> tests for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8608) Extract utility class to iterate over terms docs

2018-12-12 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8608:
---

 Summary: Extract utility class to iterate over terms docs
 Key: LUCENE-8608
 URL: https://issues.apache.org/jira/browse/LUCENE-8608
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer
 Fix For: master (8.0), 7.7


Today we re-implement the same algorithm in various places
when we want to consume all docs for a set/list of terms. This
caused serious slowdowns, for instance in the case of applying
updates fixed in LUCENE-8602. This change extracts the common
usage and shares the iteration code, including logic to reuse
Terms and PostingsEnum instances as much as possible, and adds
tests for it.
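
For illustration, a minimal sketch (class and method names made up) of the shared 
iteration pattern this describes: one TermsEnum pulled per field and one 
PostingsEnum reused across all terms:

{code:java}
import java.io.IOException;

import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

final class TermsDocsVisitor {
  /** Visits all docs of the given (sorted) terms, reusing the enums across terms. */
  static void visit(LeafReader reader, String field, Iterable<BytesRef> sortedTerms)
      throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) {
      return; // field has no postings in this segment
    }
    TermsEnum termsEnum = terms.iterator(); // pulled once, reused for every term
    PostingsEnum postings = null;           // reused across terms via the reuse argument
    for (BytesRef term : sortedTerms) {
      if (termsEnum.seekExact(term)) {
        postings = termsEnum.postings(postings, PostingsEnum.NONE);
        for (int doc = postings.nextDoc();
             doc != DocIdSetIterator.NO_MORE_DOCS;
             doc = postings.nextDoc()) {
          // consume doc here
        }
      }
    }
  }
}
{code}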



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8600) DocValuesFieldUpdates should use a better sort

2018-12-12 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718992#comment-16718992
 ] 

Simon Willnauer commented on LUCENE-8600:
-

To be honest, I am not very concerned about this causing OOMs. In the worst 
case we would use 4 bytes per ord times the number of updates in the package, which 
means we need about 300k updates to consume ~1MB of RAM here. I think that is 
an unlikely scenario. Additionally, this is transient memory, so I think we are 
good here [~dweiss] 

+1 to the patch from my side
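
For illustration, a small self-contained demo of the swap cost described in this 
issue: InPlaceMergeSorter sorts purely through compare/swap callbacks, so counting 
the swaps on a reversed array shows how much work parallel doc/value arrays (and 
their packed-ints accessors) have to repeat:

{code:java}
import org.apache.lucene.util.InPlaceMergeSorter;

public final class SwapCountDemo {
  public static void main(String[] args) {
    final int n = 1 << 16;
    final int[] docs = new int[n];
    for (int i = 0; i < n; i++) {
      docs[i] = n - i; // worst case: reverse order
    }
    final long[] swaps = {0};
    new InPlaceMergeSorter() {
      @Override
      protected int compare(int i, int j) {
        return Integer.compare(docs[i], docs[j]);
      }

      @Override
      protected void swap(int i, int j) {
        swaps[0]++; // every swap touches the array; parallel arrays multiply this cost
        int tmp = docs[i];
        docs[i] = docs[j];
        docs[j] = tmp;
      }
    }.sort(0, n);
    System.out.println("swaps for n=" + n + ": " + swaps[0]);
  }
}
{code}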

> DocValuesFieldUpdates should use a better sort
> --
>
> Key: LUCENE-8600
> URL: https://issues.apache.org/jira/browse/LUCENE-8600
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8600.patch
>
>
> This is a follow-up to LUCENE-8598: Simon identified that swaps are a 
> bottleneck to applying doc-value updates, in particular due to the overhead 
> of packed ints. It turns out that InPlaceMergeSorter does LOTS of swaps in 
> order to perform in-place. Replacing with a more efficient sort should help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8602) Share TermsEnum if possible while applying DV updates

2018-12-11 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8602.
-
Resolution: Fixed

>  Share TermsEnum if possible while applying DV updates
> --
>
> Key: LUCENE-8602
> URL: https://issues.apache.org/jira/browse/LUCENE-8602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  Today we pull a new terms enum when we apply DV updates even though the
> field stays the same, which is the common case. Benchmarking this on a
> larger term dictionary with a significant number of updates shows a
> 2x improvement in performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8602) Share TermsEnum if possible while applying DV updates

2018-12-11 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8602:
---

 Summary:  Share TermsEnum if possible while applying DV updates
 Key: LUCENE-8602
 URL: https://issues.apache.org/jira/browse/LUCENE-8602
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer
 Fix For: master (8.0), 7.7


 Today we pull a new terms enum when we apply DV updates even though the
field stays the same, which is the common case. Benchmarking this on a
larger term dictionary with a significant number of updates shows a
2x improvement in performance.
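
For illustration, a minimal sketch (hypothetical class) of the reuse this change 
describes: keep the TermsEnum as long as consecutive updates target the same field, 
and only pull a new one on a field change:

{code:java}
import java.io.IOException;

import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

final class TermsEnumCache {
  private String currentField;
  private TermsEnum termsEnum;

  /** Returns a TermsEnum for the field, reusing the previous one if the field repeats. */
  TermsEnum get(LeafReader reader, String field) throws IOException {
    if (!field.equals(currentField)) { // only re-pull on a field change (the uncommon case)
      Terms terms = reader.terms(field);
      termsEnum = terms == null ? null : terms.iterator();
      currentField = field;
    }
    return termsEnum;
  }
}
{code}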



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8599) Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates

2018-12-10 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8599.
-
Resolution: Fixed

> Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates
> ---
>
> Key: LUCENE-8599
> URL: https://issues.apache.org/jira/browse/LUCENE-8599
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Using a sparse bitset in SingleValueDocValuesFieldUpdates allows storing
> which documents have an update much more efficiently and removes the need
> to sort the docs array altogether, which proved to be a significant 
> bottleneck
> in LUCENE-8598. Using the sparse bitset yields another 10x performance 
> improvement
> in applying updates versus the changes proposed in LUCENE-8598.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8598) Improve field updates packed values

2018-12-10 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8598.
-
Resolution: Fixed

> Improve field updates packed values
> ---
>
> Key: LUCENE-8598
> URL: https://issues.apache.org/jira/browse/LUCENE-8598
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>  DocValuesFieldUpdates are using compact settings for packed ints, which causes 
> dramatic slowdowns when the updates are finished and sorted. Moving to 
> the default
> accepted overhead ratio yields up to 4x improvements in applying updates. 
> This change
> also improves the packing of numeric values since we know the value range 
> in advance and
> can choose a different packing scheme in such a case.
> Overall this change yields a good performance improvement: since 99% of 
> the time of applying
> DV field updates is spent in the sort method, it essentially makes 
> applying the updates
> 4x faster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8599) Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates

2018-12-10 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8599:
---

 Summary: Use sparse bitset to store docs in 
SingleValueDocValuesFieldUpdates
 Key: LUCENE-8599
 URL: https://issues.apache.org/jira/browse/LUCENE-8599
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer
 Fix For: master (8.0), 7.7


Using a sparse bitset in SingleValueDocValuesFieldUpdates allows storing
which documents have an update much more efficiently and removes the need
to sort the docs array altogether, which proved to be a significant bottleneck
in LUCENE-8598. Using the sparse bitset yields another 10x performance 
improvement
in applying updates versus the changes proposed in LUCENE-8598.
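
For illustration, a minimal sketch of the idea (not the actual patch): marking 
updated docs in a SparseFixedBitSet hands them back in doc-id order, so no explicit 
sort of a docs array is needed:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.SparseFixedBitSet;

public final class SparseUpdateDocs {
  public static void main(String[] args) throws IOException {
    SparseFixedBitSet updated = new SparseFixedBitSet(1_000_000); // maxDoc
    updated.set(42);
    updated.set(7);       // out-of-order inserts are fine
    updated.set(99_999);
    // Iteration is naturally sorted by doc id; no sort step required.
    BitSetIterator it = new BitSetIterator(updated, /* cost */ 3);
    for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      System.out.println("updated doc " + doc);
    }
  }
}
{code}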




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8598) Improve field updates packed values

2018-12-09 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714051#comment-16714051
 ] 

Simon Willnauer commented on LUCENE-8598:
-

I ran a benchmark to update 1 values on a single segment 100 times:

||setup||patch time in ms||master time in ms||
|shared single value|10131|38430|
|random values|30985|69600|


> Improve field updates packed values
> ---
>
> Key: LUCENE-8598
> URL: https://issues.apache.org/jira/browse/LUCENE-8598
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  DocValuesFieldUpdates are using compact settings for packed ints, which causes 
> dramatic slowdowns when the updates are finished and sorted. Moving to 
> the default
> accepted overhead ratio yields up to 4x improvements in applying updates. 
> This change
> also improves the packing of numeric values since we know the value range 
> in advance and
> can choose a different packing scheme in such a case.
> Overall this change yields a good performance improvement: since 99% of 
> the time of applying
> DV field updates is spent in the sort method, it essentially makes 
> applying the updates
> 4x faster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8598) Improve field updates packed values

2018-12-09 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714051#comment-16714051
 ] 

Simon Willnauer edited comment on LUCENE-8598 at 12/9/18 6:28 PM:
--

I ran a benchmark to update 1 values on a single segment 100 times:

||setup||patch time in ms||master time in ms||
|shared single value|10131|38430|
|random values|30985|69600|

The reason I looked into it is that I wrote the benchmark to test another 
change I made and saw the sorting show up in a profiler, spending 99% in 
the finish method. I also tested the other acceptable overhead ratios, i.e. 
FAST and FASTEST, but they didn't show any speedups. 



was (Author: simonw):
I ran a benchmark to update 1 values on a single segment 100 times:

||setup||patch time in ms||master time in ms||
|shared single value|10131|38430|
|random values|30985|69600|


> Improve field updates packed values
> ---
>
> Key: LUCENE-8598
> URL: https://issues.apache.org/jira/browse/LUCENE-8598
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  DocValuesFieldUpdates are using compact settings for packed ints, which causes 
> dramatic slowdowns when the updates are finished and sorted. Moving to 
> the default
> accepted overhead ratio yields up to 4x improvements in applying updates. 
> This change
> also improves the packing of numeric values since we know the value range 
> in advance and
> can choose a different packing scheme in such a case.
> Overall this change yields a good performance improvement: since 99% of 
> the time of applying
> DV field updates is spent in the sort method, it essentially makes 
> applying the updates
> 4x faster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8598) Improve field updates packed values

2018-12-09 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8598:
---

 Summary: Improve field updates packed values
 Key: LUCENE-8598
 URL: https://issues.apache.org/jira/browse/LUCENE-8598
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer
 Fix For: master (8.0), 7.7






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8598) Improve field updates packed values

2018-12-09 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8598:

Description: 
 DocValuesFieldUpdates are using compact settings for packed ints, which causes 
dramatic slowdowns when the updates are finished and sorted. Moving to the 
default
accepted overhead ratio yields up to 4x improvements in applying updates. 
This change
also improves the packing of numeric values since we know the value range 
in advance and
can choose a different packing scheme in such a case.
Overall this change yields a good performance improvement: since 99% of the 
time of applying
DV field updates is spent in the sort method, it essentially makes 
applying the updates
4x faster.

> Improve field updates packed values
> ---
>
> Key: LUCENE-8598
> URL: https://issues.apache.org/jira/browse/LUCENE-8598
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>
>  DocValuesFieldUpdates are using compact settings for packed ints, which causes 
> dramatic slowdowns when the updates are finished and sorted. Moving to 
> the default
> accepted overhead ratio yields up to 4x improvements in applying updates. 
> This change
> also improves the packing of numeric values since we know the value range 
> in advance and
> can choose a different packing scheme in such a case.
> Overall this change yields a good performance improvement: since 99% of 
> the time of applying
> DV field updates is spent in the sort method, it essentially makes 
> applying the updates
> 4x faster.
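
For illustration, a small self-contained demo of the trade-off: the same values 
stored with PackedInts.COMPACT versus the DEFAULT acceptable overhead ratio, which 
may pad bitsPerValue to a faster-to-decode width at a small memory cost:

{code:java}
import org.apache.lucene.util.packed.PackedInts;

public final class OverheadRatioDemo {
  public static void main(String[] args) {
    int valueCount = 100_000;
    int bitsPerValue = 17; // e.g. values that need 17 bits
    PackedInts.Mutable compact =
        PackedInts.getMutable(valueCount, bitsPerValue, PackedInts.COMPACT);
    PackedInts.Mutable relaxed =
        PackedInts.getMutable(valueCount, bitsPerValue, PackedInts.DEFAULT);
    System.out.println("COMPACT bitsPerValue = " + compact.getBitsPerValue()); // exactly 17
    System.out.println("DEFAULT bitsPerValue = " + relaxed.getBitsPerValue()); // padded, e.g. 21
  }
}
{code}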



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8592) MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural sort is reversed

2018-12-07 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8592:

Affects Version/s: master (8.0)
   7.5
 Priority: Blocker  (was: Major)
Fix Version/s: master (8.0)
   7.6

> MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural 
> sort is reversed
> --
>
> Key: LUCENE-8592
> URL: https://issues.apache.org/jira/browse/LUCENE-8592
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.5, master (8.0)
>Reporter: Jim Ferenczi
>Priority: Blocker
> Fix For: 7.6, master (8.0)
>
> Attachments: LUCENE-8592.patch
>
>
> MultiSorter#getComparableProviders on an integer or long field doesn't handle 
> MIN_VALUE correctly when the natural order is reversed. To handle reverse 
> sort we use the negation of the value, but there is no check for overflow, so 
> MIN_VALUE for ints and longs is always sorted first (even if the natural 
> order is reversed). 
> This method is used by index sorting when merging already sorted segments 
> together. This means that a sorted index can be incorrectly sorted if it uses 
> a reverse sort and a missing value set to MIN_VALUE (long or int, or values 
> inside the segment that are equal to MIN_VALUE).
> This is a bad bug because it affects the document order inside segments, and 
> only a reindex can restore the correct sort order. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8592) MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural sort is reversed

2018-12-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712870#comment-16712870
 ] 

Simon Willnauer commented on LUCENE-8592:
-

The patch looks good to me. Yet, I am not 100% on top of this code, so I can't 
say whether there are other places that need to be fixed. Still +1 to commit.
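
For illustration, a tiny demo of the overflow described here: negating 
Long.MIN_VALUE wraps around to itself, so a reverse sort based on negated values 
still puts MIN_VALUE first:

{code:java}
public final class NegationOverflow {
  public static void main(String[] args) {
    long v = Long.MIN_VALUE;
    System.out.println(-v == Long.MIN_VALUE); // true: the negation overflows
    // A safe reverse comparison swaps the operands instead of negating values:
    System.out.println(Long.compare(0L, v));  // natural order:  positive (0 > MIN_VALUE)
    System.out.println(Long.compare(v, 0L));  // reversed order: negative, no overflow
  }
}
{code}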

> MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural 
> sort is reversed
> --
>
> Key: LUCENE-8592
> URL: https://issues.apache.org/jira/browse/LUCENE-8592
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8592.patch
>
>
> MultiSorter#getComparableProviders on an integer or long field doesn't handle 
> MIN_VALUE correctly when the natural order is reversed. To handle reverse 
> sort we use the negation of the value, but there is no check for overflow, so 
> MIN_VALUE for ints and longs is always sorted first (even if the natural 
> order is reversed). 
> This method is used by index sorting when merging already sorted segments 
> together. This means that a sorted index can be incorrectly sorted if it uses 
> a reverse sort and a missing value set to MIN_VALUE (long or int, or values 
> inside the segment that are equal to MIN_VALUE).
> This is a bad bug because it affects the document order inside segments, and 
> only a reindex can restore the correct sort order. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails

2018-12-06 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8595.
-
   Resolution: Fixed
Fix Version/s: 7.7
   master (8.0)
   7.6

> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
> --
>
> Key: LUCENE-8595
> URL: https://issues.apache.org/jira/browse/LUCENE-8595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (8.0)
>Reporter: Michael McCandless
>Priority: Major
> Fix For: 7.6, master (8.0), 7.7
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It does reproduce ... I haven't dug in:
>  
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestMixedDocValuesUpdates 
> -Dtests.method=testTryUpdateMultiThreaded -Dtests.seed=E079543483688908 
> -Dtests.badapples=true -Dtests.loc\
> ale=mt-MT -Dtests.timezone=VST -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>    [junit4] FAILURE 0.69s | 
> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: docID: 63
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([E079543483688908:4809171572AE9A81]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestMixedDocValuesUpdates.testTryUpdateMultiThreaded(TestMixedDocValuesUpdates.java:526)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
> {id=PostingsFormat(name=LuceneVarGapFixedInterval)}, 
> docValues:{value=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=13\
> 12, maxMBSortInHeap=7.5990910168370895, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e08c0f3),
>  locale=mt-MT, timezone=VST
>    [junit4]   2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation 
> 1.8.0_121 (64-bit)/cpus=8,threads=1,free=446496544,total=514850816
>    [junit4]   2> NOTE: All tests run in this JVM: [TestMixedDocValuesUpdates]
>    [junit4] Completed [1/1 (1!)] in 0.83s, 1 test, 1 failure <<< 
> FAILURES!{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails

2018-12-06 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712005#comment-16712005
 ] 

Simon Willnauer commented on LUCENE-8595:
-

[~jpountz] I think the patch is not enough. I attached a PR including tests and 
an additional fix. Can you take a look?

> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
> --
>
> Key: LUCENE-8595
> URL: https://issues.apache.org/jira/browse/LUCENE-8595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (8.0)
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It does reproduce ... I haven't dug in:
>  
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestMixedDocValuesUpdates 
> -Dtests.method=testTryUpdateMultiThreaded -Dtests.seed=E079543483688908 
> -Dtests.badapples=true -Dtests.loc\
> ale=mt-MT -Dtests.timezone=VST -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>    [junit4] FAILURE 0.69s | 
> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: docID: 63
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([E079543483688908:4809171572AE9A81]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestMixedDocValuesUpdates.testTryUpdateMultiThreaded(TestMixedDocValuesUpdates.java:526)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
> {id=PostingsFormat(name=LuceneVarGapFixedInterval)}, 
> docValues:{value=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=13\
> 12, maxMBSortInHeap=7.5990910168370895, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e08c0f3),
>  locale=mt-MT, timezone=VST
>    [junit4]   2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation 
> 1.8.0_121 (64-bit)/cpus=8,threads=1,free=446496544,total=514850816
>    [junit4]   2> NOTE: All tests run in this JVM: [TestMixedDocValuesUpdates]
>    [junit4] Completed [1/1 (1!)] in 0.83s, 1 test, 1 failure <<< 
> FAILURES!{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8595) TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails

2018-12-06 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711968#comment-16711968
 ] 

Simon Willnauer commented on LUCENE-8595:
-

++ to the patch. This makes sense. It would be great if we had a test for this.

> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
> --
>
> Key: LUCENE-8595
> URL: https://issues.apache.org/jira/browse/LUCENE-8595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: master (8.0)
>Reporter: Michael McCandless
>Priority: Major
>
> It does reproduce ... I haven't dug in:
>  
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestMixedDocValuesUpdates 
> -Dtests.method=testTryUpdateMultiThreaded -Dtests.seed=E079543483688908 
> -Dtests.badapples=true -Dtests.loc\
> ale=mt-MT -Dtests.timezone=VST -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>    [junit4] FAILURE 0.69s | 
> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: docID: 63
>    [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([E079543483688908:4809171572AE9A81]:0)
>    [junit4]    >        at 
> org.apache.lucene.index.TestMixedDocValuesUpdates.testTryUpdateMultiThreaded(TestMixedDocValuesUpdates.java:526)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
> {id=PostingsFormat(name=LuceneVarGapFixedInterval)}, 
> docValues:{value=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=13\
> 12, maxMBSortInHeap=7.5990910168370895, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e08c0f3),
>  locale=mt-MT, timezone=VST
>    [junit4]   2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation 
> 1.8.0_121 (64-bit)/cpus=8,threads=1,free=446496544,total=514850816
>    [junit4]   2> NOTE: All tests run in this JVM: [TestMixedDocValuesUpdates]
>    [junit4] Completed [1/1 (1!)] in 0.83s, 1 test, 1 failure <<< 
> FAILURES!{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8594) DV update are broken for updates on new field

2018-12-06 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8594.
-
   Resolution: Fixed
Fix Version/s: master (8.0)

> DV update are broken for updates on new field
> -
>
> Key: LUCENE-8594
> URL: https://issues.apache.org/jira/browse/LUCENE-8594
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: master (8.0)
>Reporter: Simon Willnauer
>Priority: Blocker
> Fix For: master (8.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> A segment written with Lucene70Codec fails if it tries to update
> a DV field that didn't exist in the index before it was upgraded to
> Lucene80Codec. We bake the DV format into the FieldInfo when it's used
> the first time and therefore never go to the codec if we need to update.
> Yet on a field that didn't exist before and was added during an indexing
> operation, we have to consult the codec and get an exception.
> This change fixes this issue and adds the relevant bwc tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8594) DV update are broken for updates on new field

2018-12-06 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8594:

Affects Version/s: (was: 7.7)

> DV update are broken for updates on new field
> -
>
> Key: LUCENE-8594
> URL: https://issues.apache.org/jira/browse/LUCENE-8594
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: master (8.0)
>Reporter: Simon Willnauer
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A segment written with Lucene70Codec fails if it tries to update
> a DV field that didn't exist in the index before it was upgraded to
> Lucene80Codec. We bake the DV format into the FieldInfo when it's used
> the first time and therefore never go to the codec if we need to update.
> Yet on a field that didn't exist before and was added during an indexing
> operation, we have to consult the codec and get an exception.
> This change fixes this issue and adds the relevant bwc tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8594) DV update are broken for updates on new field

2018-12-06 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8594:
---

 Summary: DV update are broken for updates on new field
 Key: LUCENE-8594
 URL: https://issues.apache.org/jira/browse/LUCENE-8594
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: master (8.0), 7.7
Reporter: Simon Willnauer


A segment written with Lucene70Codec fails if it tries to update
a DV field that didn't exist in the index before it was upgraded to
Lucene80Codec. We bake the DV format into the FieldInfo when it's used
the first time and therefore never go to the codec if we need to update.
Yet on a field that didn't exist before and was added during an indexing
operation, we have to consult the codec and get an exception.
This change fixes this issue and adds the relevant bwc tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8594) DV update are broken for updates on new field

2018-12-06 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8594:

Issue Type: Bug  (was: Improvement)

> DV update are broken for updates on new field
> -
>
> Key: LUCENE-8594
> URL: https://issues.apache.org/jira/browse/LUCENE-8594
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: master (8.0), 7.7
>Reporter: Simon Willnauer
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A segment written with Lucene70Codec fails if it tries to update
> a DV field that didn't exist in the index before it was upgraded to
> Lucene80Codec. We bake the DV format into the FieldInfo when it's used
> the first time and therefore never go to the codec if we need to update.
> Yet on a field that didn't exist before and was added during an indexing
> operation, we have to consult the codec and get an exception.
> This change fixes this issue and adds the relevant bwc tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8593) Specialize single value numeric DV updates

2018-12-06 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8593.
-
Resolution: Fixed

> Specialize single value numeric DV updates
> --
>
> Key: LUCENE-8593
> URL: https://issues.apache.org/jira/browse/LUCENE-8593
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (8.0), 7.7
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  The case when all values are the same on a numeric field update
> is common for soft_deletes. With the new infrastructure for buffering
> DV updates, we can gain an easy win by specializing the applied updates
> if all values are the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8593) Specialize single value numeric DV updates

2018-12-05 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8593:
---

 Summary: Specialize single value numeric DV updates
 Key: LUCENE-8593
 URL: https://issues.apache.org/jira/browse/LUCENE-8593
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer
 Fix For: master (8.0), 7.7


 The case when all values are the same on a numeric field update
is common for soft_deletes. With the new infrastructure for buffering
DV updates, we can gain an easy win by specializing the applied updates
if all values are the same.
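
For illustration, a minimal sketch (hypothetical class) of such a specialization: 
track whether every buffered update carries the same value; if so, only the doc 
ids need to be stored and the shared value can be returned directly:

{code:java}
final class SingleValueTracker {
  private boolean first = true;
  private boolean allSame = true;
  private long sharedValue;

  void add(long value) {
    if (first) {
      sharedValue = value;
      first = false;
    } else if (value != sharedValue) {
      allSame = false; // mixed values: fall back to per-document storage
    }
  }

  /** True if at least one value was added and all of them were identical. */
  boolean isSingleValued() {
    return !first && allSame;
  }

  long sharedValue() {
    assert isSingleValued();
    return sharedValue;
  }
}
{code}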



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


