[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904776#comment-15904776 ]

ASF subversion and git services commented on LUCENE-7700:
---------------------------------------------------------

Commit 8c5ea32bb9f2d9d8af98162e1e19c9559c8c602d in lucene-solr's branch refs/heads/branch_6x from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8c5ea32 ]

LUCENE-7700: Move throughput control and merge aborting out of IndexWriter's core.

> Move throughput control and merge aborting out of IndexWriter's core?
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-7700
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7700
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.x, master (7.0), 6.5
>
>         Attachments: LUCENE-7700.patch, LUCENE-7700.patch, LUCENE-7700.patch, LUCENE-7700.patch
>
>
> Here is a bit of a background:
> - I wanted to implement a custom merging strategy that would have a custom i/o flow control (global),
> - currently, the CMS is tightly coupled with a few classes -- MergeRateLimiter, OneMerge, IndexWriter.
> Looking at the code it seems to me that everything with respect to I/O
> control could be nicely pulled out into classes that explicitly control the
> merging process, that is only MergePolicy and MergeScheduler. By default, one
> could even run without any additional I/O accounting overhead (which is
> currently in there, even if one doesn't use the CMS's throughput control).
> Such refactoring would also give a chance to nicely move things where they
> belong -- job aborting into OneMerge (currently in RateLimiter), rate limiter
> lifecycle bound to OneMerge (MergeScheduler could then use per-merge or
> global accounting, as it pleases).
> Just a thought and some initial refactorings for discussion.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
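The issue description proposes moving job aborting into the per-merge object so that a paused merge can be woken up and cancelled without going through the rate limiter. A minimal sketch of that idea follows. MergeProgressSketch and all of its methods are hypothetical names chosen for illustration, not Lucene's actual classes or API; it only shows one way an abort flag and a pause could live together.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-merge state holder (illustration only, not Lucene code).
final class MergeProgressSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition wakeup = lock.newCondition();
  private volatile boolean aborted;
  private long pausedNanos; // guarded by lock

  // Marks the merge as aborted and wakes up any thread currently pausing.
  void abort() {
    aborted = true;
    lock.lock();
    try {
      wakeup.signalAll();
    } finally {
      lock.unlock();
    }
  }

  boolean isAborted() {
    return aborted;
  }

  // Pauses for up to the given time; returns early once the merge is aborted.
  void pauseNanos(long nanos) {
    long start = System.nanoTime();
    lock.lock();
    try {
      long remaining = nanos;
      while (!aborted && remaining > 0) {
        remaining = wakeup.awaitNanos(remaining);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    } finally {
      pausedNanos += System.nanoTime() - start;
      lock.unlock();
    }
  }

  // Total time spent inside pauseNanos so far.
  long totalPausedNanos() {
    lock.lock();
    try {
      return pausedNanos;
    } finally {
      lock.unlock();
    }
  }
}
```

In this sketch a scheduler would call pauseNanos from the merge thread when it wants to throttle, and abort from any other thread; the throttling policy itself stays outside the object.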
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904774#comment-15904774 ]

ASF subversion and git services commented on LUCENE-7700:
---------------------------------------------------------

Commit 9540bc37583dfd4e995b893154039fcf031dc3c3 in lucene-solr's branch refs/heads/master from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9540bc3 ]

LUCENE-7700: Move throughput control and merge aborting out of IndexWriter's core.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903178#comment-15903178 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

I think it's fine to backport to 6.x.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902900#comment-15902900 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

Thanks for reviewing, Mike. I'm guessing this should go to 7.x only, right? Or can we backport it to 6.x as well, risking some minor API incompatibilities (these are internal APIs anyway, but who knows).
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902896#comment-15902896 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

Thanks [~dweiss] the new patch looks great! +1 to push, and thank you for cleaning this up!
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901248#comment-15901248 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

Ok, that was a trivial regression:

{code}
--- a/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
+++ b/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
@@ -177,7 +177,7 @@ public abstract class MergePolicy {
     }

     final void setMergeThread(Thread owner) {
-      assert owner == null;
+      assert this.owner == null;
       this.owner = owner;
     }
   }
{code}
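The regression comes down to a method parameter shadowing a field of the same name: the unqualified owner in the assertion refers to the argument, so the check fails whenever a non-null thread is passed in. A small sketch of both variants follows. OwnerGuardSketch and setMergeThreadBuggy are made-up names for illustration, and the checks throw IllegalStateException instead of using assert so that the behavior does not depend on the JVM running with -ea.

```java
// Hypothetical miniature of the owner check (illustration only, not Lucene code).
final class OwnerGuardSketch {
  private Thread owner;

  // Buggy variant: "owner" is the parameter, so any non-null argument is rejected.
  void setMergeThreadBuggy(Thread owner) {
    if (owner != null) {
      throw new IllegalStateException("owner already set");
    }
    this.owner = owner;
  }

  // Fixed variant: "this.owner" is the field, so only a second call is rejected.
  void setMergeThread(Thread owner) {
    if (this.owner != null) {
      throw new IllegalStateException("owner already set");
    }
    this.owner = owner;
  }

  Thread owner() {
    return owner;
  }
}
```

With the buggy variant the very first call with a real thread fails, which matches the assertion errors reported in the thread; with the fixed variant the first call succeeds and only a repeated call is flagged.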
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901244#comment-15901244 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

I screwed up something in the latest patch because I'm getting assertion errors, will fix.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896183#comment-15896183 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

bq. But the CFS thing – I don't think I changed anything there;

Aha! You are correct! I was mis-reading the patch; I didn't realize the CFS change was just for {{addIndexes}}, but you're right. Building CFS for a merged segment is in fact going through the wrapped Directory, so it can be throttled ... good.

I agree we should not change {{addIndexes}} behavior here.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894758#comment-15894758 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

I'm on short holidays, Mike. Will reply on Tuesday in full. But the CFS thing -- I don't think I changed anything there; that part in addIndexes was never really throttled properly... I think it should just run as fast as possible. Either this, or we should make it comply with the MergeScheduler's throttling strategy (although this would require creating a OneMerge, which is odd).
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894216#comment-15894216 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

Nice job fixing a few ancient typos :)

Looks like javadocs for the private {{MergeRateLimiter.maybePause}} method are stale?

Why are we creating {{MergeRateLimiter}} on init of MergeThread and then again in {{CMS.wrapForMerge}}? Seems like we could cast the current thread to {{MergeThread}} and get its already-created instance?

Why not {{updateIOThrottle}} in the main CMS thread, not the merge thread? Else, I think we have an added delay, from when a backlog'd merge shows up, to when the already running merge threads kick up their IO throttle?

Maybe add a comment to {{OneMergeProgress.owner}} and {{.setMergeThread}} that it's only used for catching misuse?

Can we rename {{OneMergeProgress.pauseTimes}} -> {{pauseTimesNanos}} or NS?

You can just remove the //private final Directory mergeDirectory from IW.

Hmm it looks like CFS building is still unthrottled?
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884313#comment-15884313 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

No rush. I've corrected a few javadocs (github branch) and the tests and precommit pass for me.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883563#comment-15883563 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

Thanks [~dawid.weiss]; I'll have a look...
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875907#comment-15875907 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

Mike, even in current master the merge in addIndexes isn't going through mergeDirectory, but through the original directory, suppressing any bandwidth control?

{code}
// TODO: somehow we should fix this merge so it's
// abortable so that IW.close(false) is able to stop it
TrackingDirectoryWrapper trackingDir = new TrackingDirectoryWrapper(directory);
{code}

So it'd make sense to fix both of these, right?
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875888#comment-15875888 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

bq. One difference with your patch is we would now wrap the Directory for merge on every merge, instead of once up front, but that's fine (the cost is tiny vs. cost of the merge)

I admit I do have a very specific scenario at hand and you're infinitely more experienced with merging, so if this is a problem we can always change it! The "get-directory-wrapped-for-merging" part is a bit clumsy, but I didn't figure out how to do it better.

bq. And it's nice that we can remove IW's ThreadLocal tracking the rate limiters.

I think so too.

bq. I do think it's important that the IO throttling applies when building the CFS file? For a large merge, this is a big burst of IO in the end

That part I didn't look too closely at, I agree. It should definitely be consistent with the rest of the throughput-control code, but there's no OneMerge instance there to work with... I'll take another look, maybe I'll come up with something.
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875883#comment-15875883 ]

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

Thank you [~dawid.weiss] for giving this some attention ... this intertwining is horribly messy today! Your patch is a nice step forward.

One difference with your patch is we would now wrap the {{Directory}} for merge on every merge, instead of once up front, but that's fine (the cost is tiny vs. cost of the merge), and, possibly powerful, since each merge can now decide what to do about IO throttling / Directory wrapping.

And it's nice that we can remove IW's ThreadLocal tracking the rate limiters.

bq. // TODO: no throughput control after changes; should we comply with merge scheduler/ policy here?

I do think it's important that the IO throttling applies when building the CFS file? For a large merge, this is a big burst of IO in the end ... too bad we can't use an API like Linux's {{splice}} to efficiently copy bytes (though we'd likely still want throttling there too anyway...).
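To make the per-merge wrapping idea concrete, here is a rough sketch of how a scheduler could decide, merge by merge, whether writes go through a throttling wrapper or straight to the underlying output. Everything in it is an assumption made for illustration: ByteSinkSketch stands in for a Directory's output, MergeThrottleSketch for a rate limiter, and MergeWrapSketch.wrapForMerge only borrows the name of the CMS method mentioned in this thread. To stay deterministic it records the pause a writer would owe instead of actually sleeping.

```java
// Hypothetical stand-in for a Directory output (illustration only, not Lucene code).
interface ByteSinkSketch {
  void write(byte[] buf, int off, int len);
}

// Hypothetical per-merge throttle: converts written bytes into owed pause time.
final class MergeThrottleSketch {
  private final double mbPerSec;
  private long owedNanos;

  MergeThrottleSketch(double mbPerSec) {
    this.mbPerSec = mbPerSec;
  }

  // Time that writing the given number of bytes should take at the target rate.
  static long nanosFor(long bytes, double mbPerSec) {
    return (long) (1_000_000_000L * (bytes / (1024.0 * 1024.0)) / mbPerSec);
  }

  void account(long bytes) {
    owedNanos += nanosFor(bytes, mbPerSec);
  }

  long owedNanos() {
    return owedNanos;
  }
}

final class MergeWrapSketch {
  private MergeWrapSketch() {}

  // Called once per merge: no throttle means no wrapper and no accounting overhead.
  static ByteSinkSketch wrapForMerge(ByteSinkSketch raw, MergeThrottleSketch throttleOrNull) {
    if (throttleOrNull == null) {
      return raw;
    }
    return (buf, off, len) -> {
      throttleOrNull.account(len); // a real limiter would pause here when ahead of the rate
      raw.write(buf, off, len);
    };
  }
}
```

Passing the same throttle instance to several merges would give global accounting, while creating one per merge would give per-merge accounting; the sketch leaves that choice to the caller.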
[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
[ https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875838#comment-15875838 ]

Dawid Weiss commented on LUCENE-7700:
-------------------------------------

I created a fork with an up-to-date patch here:
https://github.com/dweiss/lucene-solr/tree/LUCENE-7700

Overview:
https://github.com/apache/lucene-solr/compare/master...dweiss:LUCENE-7700?expand=1