[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904776#comment-15904776
 ] 

ASF subversion and git services commented on LUCENE-7700:
-

Commit 8c5ea32bb9f2d9d8af98162e1e19c9559c8c602d in lucene-solr's branch 
refs/heads/branch_6x from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8c5ea32 ]

LUCENE-7700: Move throughput control and merge aborting out of IndexWriter's 
core.


> Move throughput control and merge aborting out of IndexWriter's core?
> -
>
> Key: LUCENE-7700
> URL: https://issues.apache.org/jira/browse/LUCENE-7700
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 6.x, master (7.0), 6.5
>
> Attachments: LUCENE-7700.patch, LUCENE-7700.patch, LUCENE-7700.patch, 
> LUCENE-7700.patch
>
>
> Here is a bit of background:
> - I wanted to implement a custom merging strategy that would have custom 
> (global) I/O flow control,
> - currently, the CMS is tightly coupled with a few classes -- 
> MergeRateLimiter, OneMerge, IndexWriter.
> Looking at the code, it seems to me that everything related to I/O 
> control could be pulled out into the classes that explicitly control the 
> merging process, that is, only MergePolicy and MergeScheduler. By default, one 
> could even run without any additional I/O accounting overhead (which is 
> currently in there, even if one doesn't use the CMS's throughput control).
> Such refactoring would also give a chance to move things where they 
> belong -- job aborting into OneMerge (currently in RateLimiter), rate limiter 
> lifecycle bound to OneMerge (MergeScheduler could then use per-merge or 
> global accounting, as it pleases).
> Just a thought and some initial refactorings for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904774#comment-15904774
 ] 

ASF subversion and git services commented on LUCENE-7700:
-

Commit 9540bc37583dfd4e995b893154039fcf031dc3c3 in lucene-solr's branch 
refs/heads/master from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9540bc3 ]

LUCENE-7700: Move throughput control and merge aborting out of IndexWriter's 
core.





[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903178#comment-15903178
 ] 

Michael McCandless commented on LUCENE-7700:


I think it's fine to backport to 6.x.




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902900#comment-15902900
 ] 

Dawid Weiss commented on LUCENE-7700:
-

Thanks for reviewing, Mike. I'm guessing this should go to 7.x only, right? Or 
can we backport it to 6.x as well, risking some minor API incompatibilities 
(these are internal APIs anyway, but who knows)?




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902896#comment-15902896
 ] 

Michael McCandless commented on LUCENE-7700:


Thanks [~dweiss], the new patch looks great!  +1 to push, and thank you for 
cleaning this up!




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-08 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901248#comment-15901248
 ] 

Dawid Weiss commented on LUCENE-7700:
-

Ok, that was a trivial regression:
{code}
--- a/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
+++ b/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
@@ -177,7 +177,7 @@ public abstract class MergePolicy {
     }
 
     final void setMergeThread(Thread owner) {
-      assert owner == null;
+      assert this.owner == null;
       this.owner = owner;
     }
   }
{code}
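
To illustrate why this small change matters, here is a minimal, self-contained sketch. The class {{OwnerGuardSketch}} and its explicit {{IllegalStateException}} are hypothetical stand-ins, not Lucene code: the actual patch uses a plain {{assert}}, which only fires when the JVM runs with assertions enabled. Inside the setter, the bare name {{owner}} resolves to the parameter, so the regressed check tested the incoming argument rather than the field.

```java
// Hypothetical stand-in for the progress object touched by the patch; not Lucene code.
final class OwnerGuardSketch {
  private Thread owner;

  // Mirrors the fixed version: the guard inspects the field, not the parameter,
  // so it only rejects a second attempt to set the owner.
  void setMergeThread(Thread owner) {
    if (this.owner != null) {
      throw new IllegalStateException("merge thread already set");
    }
    this.owner = owner;
  }

  // Mirrors the regressed version: `owner` here is the parameter, so the guard
  // rejects every non-null argument and never looks at the field.
  void setMergeThreadShadowed(Thread owner) {
    if (owner != null) {
      throw new IllegalStateException("parameter was non-null");
    }
    this.owner = owner;
  }

  Thread getOwner() {
    return owner;
  }
}
```

With the shadowed guard, the very first call with a real thread fails, which would be consistent with the assertion errors reported in the previous comment.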




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-08 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901244#comment-15901244
 ] 

Dawid Weiss commented on LUCENE-7700:
-

I broke something in the latest patch: I'm getting assertion 
errors. Will fix.




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896183#comment-15896183
 ] 

Michael McCandless commented on LUCENE-7700:


bq. But the CFS thing -- I don't think I changed anything there;

Aha!  You are correct!  I was misreading the patch; I didn't realize the CFS 
change was just for {{addIndexes}}.  Building CFS for a 
merged segment is in fact going through the wrapped Directory, so it can be 
throttled ... good.  I agree we should not change {{addIndexes}} behavior here.




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894758#comment-15894758
 ] 

Dawid Weiss commented on LUCENE-7700:
-

I'm on a short holiday, Mike. Will reply on Tuesday in full. But the CFS thing 
-- I don't think I changed anything there; that part in addIndexes was never 
really throttled properly... I think it should just run as fast as possible. 
Either that, or we should make it comply with the MergeScheduler's throttling 
strategy (although this would require creating a OneMerge, which is odd).




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-03-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894216#comment-15894216
 ] 

Michael McCandless commented on LUCENE-7700:


Nice job fixing a few ancient typos :)

Looks like javadocs for the private {{MergeRateLimiter.maybePause}} method are 
stale?

Why are we creating {{MergeRateLimiter}} on init of MergeThread and then again 
in {{CMS.wrapForMerge}}?  Seems like we could cast the current thread to 
{{MergeThread}} and get its already-created instance?

Why not call {{updateIOThrottle}} in the main CMS thread rather than the merge thread?  
Otherwise, I think we have an added delay between when a backlogged merge shows up and 
when the already running merge threads kick up their IO throttle?

Maybe add a comment to {{OneMergeProgress.owner}} and {{.setMergeThread}} that 
it's only used for catching misuse?

Can we rename {{OneMergeProgress.pauseTimes}} -> {{pauseTimesNanos}} or NS?

You can just remove the commented-out {{//private final Directory mergeDirectory}} from IW.

Hmm, it looks like CFS building is still unthrottled?





[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-25 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884313#comment-15884313
 ] 

Dawid Weiss commented on LUCENE-7700:
-

No rush. I've corrected a few javadocs (github branch) and the tests and 
precommit pass for me.




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883563#comment-15883563
 ] 

Michael McCandless commented on LUCENE-7700:


Thanks [~dawid.weiss]; I'll have a look...




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875907#comment-15875907
 ] 

Dawid Weiss commented on LUCENE-7700:
-

Mike, even in current master the merge in addIndexes isn't going through 
mergeDirectory, but through the original directory, suppressing any bandwidth 
control?
{code}
  // TODO: somehow we should fix this merge so it's
  // abortable so that IW.close(false) is able to stop it
  TrackingDirectoryWrapper trackingDir = new TrackingDirectoryWrapper(directory);
{code}
So it'd make sense to fix both of these, right?
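
The point about bypassing the wrapper can be sketched in isolation. This is only an illustration under stated assumptions: {{ByteSinkSketch}}, {{RawSinkSketch}} and {{AccountingSinkSketch}} are hypothetical names, not Lucene classes, and the accounting wrapper merely counts bytes where a real throttle would pause. Writes that go to the underlying sink directly are never seen by the wrapper, so no bandwidth control can apply to them.

```java
// Hypothetical minimal write target; stands in for a Directory-like abstraction.
interface ByteSinkSketch {
  void write(byte[] data);
}

// Plain sink that only records how many bytes it has stored.
final class RawSinkSketch implements ByteSinkSketch {
  long stored;

  @Override
  public void write(byte[] data) {
    stored += data.length;
  }
}

// Wrapper that accounts for bytes before delegating; a throttle would hook in here.
final class AccountingSinkSketch implements ByteSinkSketch {
  long accounted;
  private final ByteSinkSketch delegate;

  AccountingSinkSketch(ByteSinkSketch delegate) {
    this.delegate = delegate;
  }

  @Override
  public void write(byte[] data) {
    accounted += data.length; // visible to the accounting/throttling layer
    delegate.write(data);
  }
}
```

Writing 100 bytes through the wrapper and 50 bytes straight to the raw sink stores 150 bytes but accounts for only 100; the direct write is the analogue of the addIndexes merge using the original directory.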




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875888#comment-15875888
 ] 

Dawid Weiss commented on LUCENE-7700:
-

bq. One difference with your patch is we would now wrap the Directory for merge 
on every merge, instead of once up front, but that's fine (the cost is tiny vs. 
cost of the merge)

I admit I do have a very specific scenario at hand and you're infinitely more 
experienced with merging, so if this is a problem we can always change it! The 
"get-directory-wrapped-for-merging" part is a bit clumsy, but I didn't figure 
out how to do it better.

bq. And it's nice that we can remove IW's ThreadLocal tracking the rate 
limiters.

I think so too.

bq. I do think it's important that the IO throttling applies when building 
the CFS file? For a large merge, this is a big burst of IO in the end

I didn't look too closely at that part, I agree. It should definitely be 
consistent with the rest of the throughput-control code, but there's no 
OneMerge instance there to work with... I'll take another look, maybe I'll come 
up with something.




[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875883#comment-15875883
 ] 

Michael McCandless commented on LUCENE-7700:


Thank you [~dawid.weiss] for giving this some attention ... this intertwining 
is horribly messy today!  Your patch is a nice step forward.

One difference with your patch is we would now wrap the {{Directory}} for merge 
on every merge, instead of once up front, but that's fine (the cost is tiny vs. 
cost of the merge), and, possibly powerful, since each merge can now decide 
what to do about IO throttling / Directory wrapping.

And it's nice that we can remove IW's ThreadLocal tracking the rate limiters.

bq. // TODO: no throughput control after changes; should we comply with merge 
scheduler/ policy here?

I do think it's important that the IO throttling applies when building the 
CFS file?  For a large merge, this is a big burst of IO in the end ... too bad 
we can't use an API like Linux's {{splice}} to efficiently copy bytes (though 
we'd likely still want throttling there too anyway...).
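
As a rough illustration of what throttling a burst of writes involves, here is a minimal sketch of the pause arithmetic. It is an assumption-laden toy, not the MergeRateLimiter implementation: the class name {{RatePauseSketch}} and both method names are hypothetical, and it only computes how long a writer would need to sleep so that a given number of bytes does not exceed a target MB/sec rate.

```java
// Hypothetical helper; shows only the arithmetic behind rate-based pausing.
final class RatePauseSketch {
  private final double mbPerSec;

  RatePauseSketch(double mbPerSec) {
    if (!(mbPerSec > 0.0)) {
      throw new IllegalArgumentException("mbPerSec must be positive");
    }
    this.mbPerSec = mbPerSec;
  }

  // Nanoseconds that writing `bytes` should take at the configured rate.
  long targetNanos(long bytes) {
    double seconds = (bytes / (1024.0 * 1024.0)) / mbPerSec;
    return (long) (seconds * 1_000_000_000L);
  }

  // How long the writer should still pause, given the time already spent.
  // Returns 0 when the writer is at or below the target rate.
  long pauseNanos(long bytes, long elapsedNanos) {
    return Math.max(0L, targetNanos(bytes) - elapsedNanos);
  }
}
```

For example, at 10 MB/sec a 5 MB write has a 0.5 second budget; if it completed in 0.2 seconds, the sketch asks for a further 0.3 second pause. Building a large CFS file without passing through such a check would run at full disk speed.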





[jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?

2017-02-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875838#comment-15875838
 ] 

Dawid Weiss commented on LUCENE-7700:
-

I created a fork with an up-to-date patch here:
https://github.com/dweiss/lucene-solr/tree/LUCENE-7700

Overview:
https://github.com/apache/lucene-solr/compare/master...dweiss:LUCENE-7700?expand=1
