[jira] [Comment Edited] (LUCENE-8688) Forced merges merge more than necessary

2019-03-03 Thread Armin Braun (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783034#comment-16783034
 ] 

Armin Braun edited comment on LUCENE-8688 at 3/4/19 6:56 AM:
-

Thanks [~jpountz] you're right, I accidentally changed behaviour for the case 
of an already running merge. I fixed that now I think by putting in that guard 
for this breakout.
It's a little different than it was before in that not all merges will be of 
the max-merge count before the final merge due to the size check, obviously.

Unfortunately I wasn't quite able to find the new utility methods to simulate 
an ongoing merge, could you point me at what method(s) you had in mind? :)
In and of itself the logic seemed to work fine (I simulated the case by not 
invoking `IndexWriter#forceMergeDeletes` before calling the force merge and 
using a smaller buffered docs count to get larger (in number of segments) 
merges, but obviously a proper test would be great :)


was (Author: original-brownbear):
Thanks [~jpountz] you're right, I accidentally changed behaviourfor the case of 
an already running merge. I fixed that now I think by putting in that guard for 
this breakout.
It's a little different than it was before in that not all merges will be of 
the max-merge count before the final merge due to the size check, obviously.

Unfortunately I wasn't quite able to find the new utility methods to simulate 
an ongoing merge, could you point me at what method(s) you had in mind? :)
In and of itself the logic seemed to work fine (I simulated the case by not 
invoking `IndexWriter#forceMergeDeletes` before calling the force merge and 
using a smaller buffered docs count to get larger (in number of segments) 
merges, but obviously a proper test would be great :)

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8688) Forced merges merge more than necessary

2019-02-24 Thread Armin Braun (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776543#comment-16776543
 ] 

Armin Braun edited comment on LUCENE-8688 at 2/25/19 6:20 AM:
--

Gave this a shot in the attached patch:
 * Basically brought back the old logic (pre LUCENE-7976) of simply collecting 
as many of the smallest segments as possible ("possible" now including the max 
segment size check).
 ** Made the tradeoff of merging the smallest remaining segments  to get to the 
requested segment count
 ** Technically speaking one could do better than the above trade-off (in some 
cases) by using a smarter bin-packing algorithm but the above comment described 
merging large segments close to bin-size with tiny segments as wasteful so I 
didn't try that
 * Added a new rough test that checks that we arrive at the exact max segment 
count and don't exceed max segment size significantly (copied first few lines 
from the above test and should probably clean this up a little if the approach 
is ok)

 ** It's much stricter than the existing test for this size-wise

[^LUCENE-8688.patch]


was (Author: original-brownbear):
Gave this a shot in the attached patch:
 * Basically brought back the old logic (pre LUCENE-7976) of simply collecting 
as many of the smallest segments as possible ("possible" now including the max 
segment size check).
 ** Made the tradeoff of merging the smallest remaining segments  to get to the 
requested segment count
 ** Technically speaking one could do better than the above trade-off (in some 
cases) by using a smarter bin-packing algorithm but the above comment described 
merging large segments close to bin-size with tiny segments as wasteful so I 
didn't try that
 * Added a new rough test that checks that we arrive at the exact max segment 
count and don't exceed max segment size significantly 
 ** It's much stricter than the existing test for this size-wise

[^LUCENE-8688.patch]

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org