[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506172#comment-16506172 ]

ASF subversion and git services commented on SOLR-12338:

Commit d1dbef5e4d1a1b2bfac75a59496f86d6edbbc16f in lucene-solr's branch refs/heads/branch_7_4 from [~ctargett]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d1dbef5 ]

SOLR-12338: State default value more directly

> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>             Fix For: 7.4, master (8.0)
>
>         Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch,
> SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different id are independent, therefore it is safe to
> replay them in parallel. This will significantly reduce recovering time of
> replicas in high load indexing environment.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506166#comment-16506166 ] ASF subversion and git services commented on SOLR-12338: Commit eb7bb2d90654ec15d25ba947e287bf7d96e07900 in lucene-solr's branch refs/heads/master from [~ctargett] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb7bb2d ] SOLR-12338: State default value more directly
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506171#comment-16506171 ] ASF subversion and git services commented on SOLR-12338: Commit 13cad54a3efb179fdb4da7528d3448b03989c75e in lucene-solr's branch refs/heads/branch_7x from [~ctargett] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=13cad54 ] SOLR-12338: State default value more directly
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494652#comment-16494652 ] ASF subversion and git services commented on SOLR-12338: Commit 04e1b19743e330ce66d199c4dc40bbf394be9ed7 in lucene-solr's branch refs/heads/branch_7x from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04e1b19 ] SOLR-12338: Replay buffering tlog in parallel
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494650#comment-16494650 ] ASF subversion and git services commented on SOLR-12338: Commit 6084da559c5466551af68c114b7310356c989dec in lucene-solr's branch refs/heads/master from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6084da5 ] SOLR-12338: Replay buffering tlog in parallel
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494631#comment-16494631 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Thanks [~dsmiley]!
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494618#comment-16494618 ]

David Smiley commented on SOLR-12338:
-------------------------------------

My intention with changing the loop is to reduce duplication of the putIfAbsent line. But it's not a big deal as it's one line and not long.

Overall, looks good now. Only one small nitpick:
{quote}@param lockId of the \{@code command}, if null then a random hash will be generated
{quote}
The "random hash" part is no longer accurate.

+1 commit at will.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494515#comment-16494515 ] Cao Manh Dat commented on SOLR-12338: - Attached a new patch for this ticket.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494504#comment-16494504 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

{quote}But then I wonder if we can move the sizeSemaphore.release to before the countDown? The principle at play here is to release locks in the reverse order that they were acquired. That's how it's normally done to, I think, prevent deadlock cases, though I'm not sure it's possible here as-coded.
{quote}
The order of unlocking here does not matter. To call {{remove}}, a thread must hold both locks, so this cannot cause a deadlock.
{quote}ah; I think I can see why we acquire the size semaphore after getting the striped lock. we don't want to use up a permit that might lock on an ID first.
{quote}
Correct: multiple threads waiting on the same lockId could otherwise eat up the permits of the size semaphore.

Thanks [~dsmiley], the switch to CountDownLatch and the added javadocs are good. But I don't see any improvement from using the new loop?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494172#comment-16494172 ]

David Smiley commented on SOLR-12338:
-------------------------------------

ah; I think I can see why we acquire the size semaphore after getting the striped lock. we don't want to use up a permit that might lock on an ID first.

But then I wonder if we can move the sizeSemaphore.release to before the countDown? The principle at play here is to release locks in the reverse order that they were acquired. That's how it's normally done to, I think, prevent deadlock cases, though I'm not sure it's possible here as-coded.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494162#comment-16494162 ]

David Smiley commented on SOLR-12338:
-------------------------------------

Please consider this alternative utility class:
{code:java}
/** A set of locks by a key {@code T}, kind of like Google Striped but the keys are sparse/lazy. */
private static class SparseStripedLock<T> {
  private final Semaphore sizeSemaphore;
  private ConcurrentHashMap<T, CountDownLatch> map = new ConcurrentHashMap<>();

  SparseStripedLock(int maxSize) {
    this.sizeSemaphore = new Semaphore(maxSize);
  }

  void add(T t) throws InterruptedException {
    if (t != null) {
      CountDownLatch myLock = new CountDownLatch(1);
      while (true) {
        CountDownLatch existingLock = map.putIfAbsent(t, myLock); // returns null if no existing
        if (existingLock == null) {
          break; // myLock was successfully inserted (and is pre-locked)
        }
        existingLock.await(); // wait for existing lock/permit to become available (see remove() below)
        // we will most likely exit in next loop, though if contended then possibly not
      }
    }
    // won the lock
    sizeSemaphore.acquire(); //nocommit do at start of add()?
  }

  void remove(T t) {
    if (t != null) {
      map.remove(t).countDown(); // remove and signal to any "await"-ers
    }
    sizeSemaphore.release();
  }
}
{code}
Notice the comments, the loop, the new name, use of CountDownLatch, and the one nocommit/question.
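Editor's sketch: the following shows how a keyed lock of this shape could give per-id ordering while still replaying different ids in parallel. It is a minimal illustration under stated assumptions, not the code from the attached patches. The names {{KeyedLock}}, {{PerKeyOrderedExecutor}} and {{PerKeyOrderedExecutorDemo}} are hypothetical, and the null-key branch of the class above is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Compact keyed lock (hypothetical name) so that the sketch compiles on its own. */
class KeyedLock<T> {
  private final Semaphore sizeSemaphore;
  private final ConcurrentHashMap<T, CountDownLatch> map = new ConcurrentHashMap<>();

  KeyedLock(int maxSize) {
    sizeSemaphore = new Semaphore(maxSize);
  }

  void add(T key) throws InterruptedException {
    CountDownLatch mine = new CountDownLatch(1);
    CountDownLatch existing;
    while ((existing = map.putIfAbsent(key, mine)) != null) {
      existing.await(); // another task holds this key; wait for its remove()
    }
    sizeSemaphore.acquire(); // bound the number of keys held at once
  }

  void remove(T key) {
    map.remove(key).countDown(); // wake every thread waiting on this key
    sizeSemaphore.release();
  }
}

/** Hypothetical executor: tasks sharing a key run one at a time, in submission order. */
class PerKeyOrderedExecutor {
  private final ExecutorService delegate;
  private final KeyedLock<String> locks;

  PerKeyOrderedExecutor(int threads) {
    delegate = Executors.newFixedThreadPool(threads);
    locks = new KeyedLock<>(threads);
  }

  void execute(String key, Runnable task) throws InterruptedException {
    locks.add(key); // acquired in the submitting thread
    delegate.execute(() -> {
      try {
        task.run();
      } finally {
        locks.remove(key); // released in a pool thread
      }
    });
  }

  void shutdownAndWait() {
    delegate.shutdown();
    try {
      delegate.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

class PerKeyOrderedExecutorDemo {
  /** Replays 100 "updates" for one id, interleaved with updates for unrelated ids. */
  static List<Integer> replay() {
    List<Integer> seenForDocA = Collections.synchronizedList(new ArrayList<>());
    PerKeyOrderedExecutor executor = new PerKeyOrderedExecutor(4);
    try {
      for (int i = 0; i < 100; i++) {
        final int version = i;
        executor.execute("docA", () -> seenForDocA.add(version));
        executor.execute("doc" + i, () -> { }); // unrelated ids may run concurrently
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      executor.shutdownAndWait();
    }
    return seenForDocA;
  }
}
```

Because the submitting thread blocks in {{add}} until the previous task for the same key has called {{remove}}, the updates for "docA" are applied in submission order even though four pool threads are running.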
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493180#comment-16493180 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Attached the latest patch for this ticket, including some cleanup and precommit fixes. If there are no objections, I will commit the patch soon.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492749#comment-16492749 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Attached a patch based on [~dsmiley]'s review.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492538#comment-16492538 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare in practice, I wonder if we can do something simpler? You may get lucky with map.compute* methods on ConcurrentHashMap which execute the lambda atomically. Though I don't know if it's bad to block if we try to acquire a lock within there. I see remove() removes the value of the Map but perhaps if the value were a mechanism that tracked that there's a producer pending, then we should not remove the value from the lock? If we did this, then maybe that would simplify add()? I'm not sure.
{quote}
After giving this more thought, changing the remove method to the following seems to solve the problem:
{code}
public void remove(T t) {
  // There can be many threads waiting for this lock
  map.remove(t).release(Integer.MAX_VALUE);
  sizeLock.release();
}
{code}
In short, the idea of SetBlockingQueue.add(T t) is:
# all participants try to call {{map.putIfAbsent(t, myLock)}},
# only one wins; the other participants have to wait for the winner's lock,
# when the winner is removed from the set, it also releases and removes its lock,
# back to 1.
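Editor's sketch: the four steps above can be written out roughly as follows. This is a hedged reconstruction from the description in this comment, not the class from the patch; the names {{KeyedSetLock}} and {{KeyedSetLockDemo}} are hypothetical, while {{map}}, {{sizeLock}} and the {{release(Integer.MAX_VALUE)}} call follow the snippet quoted above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical reconstruction of the add()/remove() protocol described above. */
class KeyedSetLock<T> {
  private final ConcurrentHashMap<T, Semaphore> map = new ConcurrentHashMap<>();
  private final Semaphore sizeLock;

  KeyedSetLock(int maxSize) {
    sizeLock = new Semaphore(maxSize);
  }

  void add(T t) throws InterruptedException {
    Semaphore myLock = new Semaphore(0); // no permits: anyone who acquires it must wait
    Semaphore winner;
    // Step 1: every participant races on putIfAbsent.
    while ((winner = map.putIfAbsent(t, myLock)) != null) {
      // Step 2: losers wait on the winner's lock.
      winner.acquire();
      // Step 4: back to step 1 and race again.
    }
    sizeLock.acquire();
  }

  void remove(T t) {
    // Step 3: the winner removes its lock and releases every waiter at once.
    map.remove(t).release(Integer.MAX_VALUE);
    sizeLock.release();
  }
}

class KeyedSetLockDemo {
  /** Returns true if a second add() of a held key completes only after remove(). */
  static boolean secondAddWaitsForRemove() {
    try {
      KeyedSetLock<String> lock = new KeyedSetLock<>(4);
      lock.add("doc1");
      CountDownLatch entered = new CountDownLatch(1);
      Thread other = new Thread(() -> {
        try {
          lock.add("doc1");
          entered.countDown();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      other.start();
      boolean enteredEarly = entered.await(200, TimeUnit.MILLISECONDS);
      lock.remove("doc1"); // lets the waiting thread win the next race
      boolean enteredLate = entered.await(5, TimeUnit.SECONDS);
      other.join();
      lock.remove("doc1"); // release the key taken by the second thread
      return !enteredEarly && enteredLate;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Releasing {{Integer.MAX_VALUE}} permits is what lets an arbitrary number of waiters proceed; each of them then retries {{putIfAbsent}} and at most one wins the key.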
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492410#comment-16492410 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Thanks a lot for your review [~dsmiley], I was too busy recently.
{quote}
- I think the "hash" variable should not be called this to avoid confusion as there is no hashing. Maybe just "id" or "lockId"
- Do we still need the Random stuff?
- Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better "SetLock" as it does not hold anything (Queues hold stuff)
- Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits" as it does not extend Lock?
- Can the "closed" state be removed from SetBlockingQueue altogether? It's not clear it actually needs to be "closed". It seems wrong; other concurrent mechanisms don't have this notion (no Queue, Lock, or Semaphore does, etc.) FWIW I stripped this from the class and the test passed.
{quote}
+1
{quote}
Perhaps it's better to acquire() the size permit first in add() instead of last to prevent lots of producing threads inserting keys into a map only to eventually wait. Although it might add annoying try-finally to add() to ensure we put the permit back if there's an exception after (e.g. interrupt). Heck; maybe that's an issue no matter what the sequence is.
{quote}
I don't think we should do that. {{sizeLock}} is kind of like the maximum number of threads; if we have reached that number, it seems better to let producers wait before trying to enqueue more tasks.
{quote}
Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock impl)? It seems like the most direct concept we want; Semaphore is more than a Lock as it tracks permits that we don't need here?
{quote}
We can't. Lock or ReentrantLock only allows us to lock and unlock in the same thread. In the OrderedExecutor, we lock first and then unlock in a thread of the delegate executor.
{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare in practice, I wonder if we can do something simpler? You may get lucky with map.compute* methods on ConcurrentHashMap which execute the lambda atomically. Though I don't know if it's bad to block if we try to acquire a lock within there. I see remove() removes the value of the Map but perhaps if the value were a mechanism that tracked that there's a producer pending, then we should not remove the value from the lock? If we did this, then maybe that would simplify add()? I'm not sure.
{quote}
I will think more about this.
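Editor's sketch: the point about thread ownership can be checked with the JDK classes directly. A {{ReentrantLock}} must be unlocked by the thread that locked it, whereas a {{Semaphore}} permit may be released by any thread, which is the behaviour needed when the lock is taken by the submitting thread and given back by a pool thread. The class name {{CrossThreadRelease}} and its methods are illustrative only.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class CrossThreadRelease {
  /** Locks in the calling thread, then attempts to unlock from a different thread. */
  static String unlockReentrantLockFromOtherThread() {
    ReentrantLock lock = new ReentrantLock();
    lock.lock();
    String[] outcome = new String[1];
    Thread worker = new Thread(() -> {
      try {
        lock.unlock(); // this thread is not the owner
        outcome[0] = "unlocked";
      } catch (IllegalMonitorStateException e) {
        outcome[0] = "IllegalMonitorStateException";
      }
    });
    worker.start();
    join(worker);
    lock.unlock(); // the owning thread can still unlock normally
    return outcome[0];
  }

  /** Acquires a permit in the calling thread and releases it from a different thread. */
  static String releaseSemaphoreFromOtherThread() {
    Semaphore permit = new Semaphore(1);
    permit.acquireUninterruptibly();
    Thread worker = new Thread(permit::release); // semaphores have no owner
    worker.start();
    join(worker);
    return permit.availablePermits() == 1 ? "released" : "still held";
  }

  private static void join(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```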
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480089#comment-16480089 ]

Yonik Seeley commented on SOLR-12338:
-------------------------------------

{quote}This is a very costly/risky logic to handle reordered updates
{quote}
Indeed. As an aside, my vote for the long term continues to be: "don't reorder updates between leader and replica" :)
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480078#comment-16480078 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

[~ysee...@gmail.com] The need to order things comes from how we currently handle reordered in-place updates. Currently, if a replica receives an in-place update u2 that points to an in-place update u1 which has not arrived yet, the replica will fetch the full document from the leader. This is a very costly/risky logic to handle reordered updates (i.e., what if there is no leader to ask for the full document?). Luckily for us, reordering is not a common case right now, but if we replay updates in a parallel, unordered way, the above case can happen much more frequently. Therefore, in my opinion, it should be avoided.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479675#comment-16479675 ]

Yonik Seeley commented on SOLR-12338:
-------------------------------------

I haven't been following this issue, but the need to order things caught my eye, primarily because we have a bunch of logic already that handles reordered updates. I guess the issue is that buffered updates may not have a version (if they haven't been through a leader?). If that's the case, perhaps an easier path would be to assign a version and then let the existing reorder logic do its thing. I don't have the full picture here, so it's just some input to consider.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479197#comment-16479197 ]

David Smiley commented on SOLR-12338:
-------------------------------------

Maybe you can propose {{SetBlockingQueue}} (or whatever name we settle on) to Guava? Even if it's not accepted ultimately, there might be some great feedback and/or pointers to something similar that proves useful, as this stuff is hard so the more eyes the better.

I like that you've avoided hash collisions altogether by not doing hashes! Use of ConcurrentHashMap makes sense to me for such an approach. However it appears we have some complexity to deal with since keys need to be added and removed on demand, safely, which seems to be quite tricky.

* I think the "hash" variable should not be called this to avoid confusion as there is no hashing. Maybe just "id" or "lockId"
* Do we still need the Random stuff?
* Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better "SetLock" as it does not hold anything (Queues hold stuff)
* Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits" as it does not extend Lock?
* Can the "closed" state be removed from SetBlockingQueue altogether? It's not clear it actually needs to be "closed". It seems wrong; other concurrent mechanisms don't have this notion (no Queue, Lock, or Semaphore does, etc.) FWIW I stripped this from the class and the test passed.
* Perhaps it's better to acquire() the size permit first in add() instead of last to prevent lots of producing threads inserting keys into a map only to eventually wait. Although it might add annoying try-finally to add() to ensure we put the permit back if there's an exception after (e.g. interrupt). Heck; maybe that's an issue no matter what the sequence is.
* Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock impl)? It seems like the most direct concept we want; Semaphore is more than a Lock as it tracks permits that we don't need here?
* The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare in practice, I wonder if we can do something simpler? You may get lucky with map.compute\* methods on ConcurrentHashMap which execute the lambda atomically. Though I don't know if it's bad to block if we try to acquire a lock within there. I see remove() removes the value of the Map but perhaps if the value were a mechanism that tracked that there's a producer pending, then we should not remove the value from the lock? If we did this, then maybe that would simplify add()? I'm not sure.

Perhaps a simpler approach would involve a Set of weakly referenced objects, and thus we don't need to worry about removal. In such a design add() would need to return a reference to the member of the set, and that object would have a "release()" method when done. I'm not sure if in practice these might be GC'ed fast enough if they end up being usually very temporary? Shrug.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479131#comment-16479131 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

bq. BTW I've twice gotten confused in this issue conversation when you referred to things I didn't know existed before because it was unclear if I simply didn't know about it or if you were adding/introducing some new mechanism. It would be helpful to me if you try to clarify that new things are new things, e.g. "(added in this patch)" or "added a new ..." or some-such.

Yeah, sorry about that, I was just too lazy with the details.

bq. It's super tempting to simply use Striped as it's difficult to write & review concurrent control structures such as this. I have a bunch of pending commentary/review for your SetBlockingQueue but are you choosing to not use it because the numThreads * 1000 is too much internal memory/waste?

I think the current {{SetBlockingQueue}} is quite effective and compact. Can you share some of your comments/review for {{SetBlockingQueue}}?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479083#comment-16479083 ]

David Smiley commented on SOLR-12338:

{quote}Upload a patch that makes a change from using an array of lock into a {{SetBlockingQueue}}. {quote}

BTW I've twice gotten confused in this issue conversation when you referred to things I didn't know existed before because it was unclear if I simply didn't know about it or if you were adding/introducing some new mechanism. It would be helpful to me if you try to clarify that new things are new things, e.g. "(added in this patch)" or "added a new ..." or some-such.

It's super tempting to simply use Striped as it's difficult to write & review concurrent control structures such as this. I have a bunch of pending commentary/review for your SetBlockingQueue but are you choosing to not use it because the numThreads * 1000 is too much internal memory/waste?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478357#comment-16478357 ]

Cao Manh Dat commented on SOLR-12338:

[~dsmiley] What do you think about the new patch?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475965#comment-16475965 ]

Cao Manh Dat commented on SOLR-12338:

Interesting result: when I change from {{SetBlockingQueue}} to Guava's Striped class (its implementation is like an array of locks), performance decreases (from 4341ms to 8227ms). If I increase the number of stripes (the size of the lock array) to {{numThreads * 1000}}, both versions run in about the same amount of time. This is a sign that collisions do affect performance!
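As a rough back-of-the-envelope illustration of why the stripe count matters, the sketch below estimates how likely it is that at least two in-flight ids share a stripe. It assumes ids hash uniformly over the stripes (a birthday-problem estimate), and the value of 8 threads in the usage note is an assumption, since the comment above does not state {{numThreads}}. {{StripeCollisions}} is a hypothetical helper, not part of Guava or the patch.

```java
/** Hypothetical helper: birthday-style estimate of stripe collisions. */
class StripeCollisions {
  /**
   * Probability that at least two of {@code inFlight} ids map to the same
   * one of {@code stripes} locks, assuming uniformly distributed hashes.
   */
  static double collisionProbability(int inFlight, int stripes) {
    double allDistinct = 1.0;
    for (int i = 0; i < inFlight; i++) {
      // The i-th id must avoid the i stripes already taken.
      allDistinct *= Math.max(0, stripes - i) / (double) stripes;
    }
    return 1.0 - allDistinct;
  }
}
```

Under these assumptions, 8 in-flight ids over 8 stripes almost always collide, while 8 ids over 8 * 1000 stripes collide well under 1% of the time, which is at least consistent with the timings reported above.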
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475814#comment-16475814 ]

Cao Manh Dat commented on SOLR-12338:

Uploaded a patch that switches from an array of locks to a {{SetBlockingQueue}}.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475423#comment-16475423 ]

Cao Manh Dat commented on SOLR-12338:

[~dsmiley] An annoying problem with ExecutorService is that when the number of threads reaches {{maximumPoolSize}}, the caller gets a RejectedExecutionException instead of waiting for a thread to become available (https://stackoverflow.com/questions/44541784/synchronousqueue-does-not-block-when-offered-task-by-threadpoolexecutor). The easy solution then is to use {{https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.CallerRunsPolicy.html}}. With the current {{OrderedExecutor}} we won't experience that problem; the caller will just wait in that case.

But you are right that collisions may affect performance!
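The rejection behaviour described above can be reproduced with a small, self-contained sketch. It assumes a pool of exactly one thread backed by a {{SynchronousQueue}}; {{RejectionDemo}} and {{secondTaskOutcome}} are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of what happens to a task offered to a saturated pool. */
class RejectionDemo {
  /** Returns "rejected", "ran in caller" or "ran in pool" for the second task. */
  static String secondTaskOutcome(RejectedExecutionHandler handler) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.SECONDS, new SynchronousQueue<>(), handler);
    CountDownLatch release = new CountDownLatch(1);
    Thread caller = Thread.currentThread();
    String[] outcome = new String[1];
    try {
      // The first task occupies the only worker until the latch is opened.
      pool.execute(() -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      // No idle worker is polling the SynchronousQueue, so the handler decides.
      pool.execute(() ->
          outcome[0] = (Thread.currentThread() == caller) ? "ran in caller" : "ran in pool");
    } catch (RejectedExecutionException e) {
      outcome[0] = "rejected";
    } finally {
      release.countDown();
      pool.shutdown();
    }
    return outcome[0];
  }
}
```

With {{ThreadPoolExecutor.AbortPolicy}} (the default handler) the second task is rejected; with {{ThreadPoolExecutor.CallerRunsPolicy}} the submitting thread runs it itself, which throttles the producer but also means that task no longer runs on a pool thread.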
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474832#comment-16474832 ]

David Smiley commented on SOLR-12338:

I looked at this again (after a few days of vacation) and I withdraw my concern that there's a bug. The use of ArrayBlockingQueue(1) is acting as a sort of Lock in the same way I suggested to use a Lock. Couldn't you simply replace it with a Lock? The put() becomes a lock(), and the poll() becomes an unlock(); see what I mean? I think this is clearer since it's a simpler mechanism than an ArrayBlockingQueue, and the use of ABQ in this specific way (size 1) could lend itself to misuse later if someone thinks increasing its size or changing its type gains us parallelism. And I don't think the fairness setting matters here.

And although you initialized the size of this array of ABQ to be the number of threads, I think we ought to use a larger array to prevent collisions (prevent needlessly blocking on different docIDs that hash to the same thread).

I also was thinking of a way to have more "on-deck" runnables for a given docID, waiting in-line. The Runnable we submit to the delegate could be some inner class OrderedRunnable that has a "next" pointer to the next OrderedRunnable. We could maintain a parallel array of the top OrderedRunnable (parallel to an array of Locks). Manipulating the OrderedRunnable chain requires holding the lock. To ensure we bound these things waiting in-line, we could use one Semaphore for the whole OrderedExecutor instance. There's more to it than this. Of course this adds complexity, but the current approach (either ABQ or Lock) can unfortunately block needlessly if the doc ID is locked yet more/different doc IDs will soon be submitted and there are available threads. Perhaps this is overthinking it (over optimization / complexity) as this will not be the common case? This would be even more needless if we increase the Lock array to prevent collisions, so nevermind I guess.

{quote}(RE Submit without ID) This can help us to know how many threads are running (pending). Therefore OrderedExecutor does not execute more than {{numThreads}} in parallel. It also solves the case when ExecutorService's queue is full it will throw RejectedExecutionException. {quote}

Isn't this up to how the backing delegate is configured? If it's using a fixed thread pool, then there won't be more threads running. Likewise for RejectedExecutionException.
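To make the "ABQ of size 1 acts as a Lock" observation concrete, here is a small sketch of the put()/poll() correspondence. {{QueueAsLock}} is a hypothetical wrapper written for this note and is not code from the patch; the capacity parameter exists only to show the misuse risk mentioned above.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical wrapper showing how a bounded queue can stand in for a lock. */
class QueueAsLock {
  private static final Object TOKEN = new Object();
  private final ArrayBlockingQueue<Object> slots;

  QueueAsLock(int capacity) {
    slots = new ArrayBlockingQueue<>(capacity);
  }

  /** put() blocks while the queue is full, i.e. while the "lock" is held. */
  void lock() throws InterruptedException {
    slots.put(TOKEN);
  }

  /** Non-blocking variant: offer() fails when the queue is full. */
  boolean tryLock() {
    return slots.offer(TOKEN);
  }

  /** poll() frees a slot, letting one blocked put() proceed. */
  void unlock() {
    slots.poll();
  }
}
```

With capacity 1 only one holder gets in at a time. With capacity 2 two holders get in at once, so enlarging the queue silently breaks the mutual exclusion rather than adding useful parallelism.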
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471502#comment-16471502 ]

Mark Miller commented on SOLR-12338:

bq. Yeah, but we do not need that flag for the case of LogReplayer, right? Because we are calling the execute method from a single thread.

Technically that sounds right, but I'm not sure the contract explicitly promises that. If we have good testing, it's not much of a concern.

bq. OrderedExecutor ensures that tasks are kicked off in order for the same id. Yeah, task1 gets taken off the queue only after it finishes.

Yeah, so I don't think I spot an open issue for a race.

bq. I think we will throttle the incoming updates properly by doing SOLR-12305.

Ah right, had been looking at that issue recently too and had it on my mind. That is more where that comment belongs. I was thinking these queues would work with documents coming in and getting buffered, but they won't get held up from dropping off the document to the tlog. But anyway, I think that natural throttling is a good first step. I think at the end of the day, we will want to end up with a Filter though that can do QOS and intelligent throttling based on data, but I'm pro whatever gets us out of infinite tlog replay soonest short term.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471495#comment-16471495 ]

Cao Manh Dat commented on SOLR-12338:

{quote} Given that some machines these days have dozens of cores and you might have many SolrCores recovering, we may want to cap the number of threads at some number or make it configurable or something. {quote}

{{replayUpdatesExecutor}} is shared across all the SolrCores, so the number of SolrCores that are recovering won't affect the maximum number of threads that will be used. Making it configurable is a good idea, though.

{quote} Yeah, you need that to ensure FIFO. {quote}

Yeah, but we do not need that flag for the case of LogReplayer, right? Because we are calling the execute method from a single thread.

{quote} I think what David is getting at is that you are ensuring that tasks are kicked off in order, but once they are kicked off, you can't guarantee order. So task1 gets taken off the queue, then task 2 is taken, now task 2 gets executed first when task 1 has its thread unluckily scheduled by the OS. At least that's how I read it. But that is not an issue right? Because you don't run an item from the queue until the one in front of it is fully run right? {quote}

OrderedExecutor ensures that tasks are kicked off in order *for the same id*. Yeah, task1 gets taken off the queue only after it finishes.

{quote} I like how this gives us some control to throttle, I wonder how efficient it is as documents keep thundering in though - do we gobble up threads and connections waiting? That is where it's a bummer it's hard to limit those resources. What are you going to do though? Those requests have to wait somewhere or we have to start dropping them - and hopefully with NIO2 it's somewhat efficient to wait on IO. {quote}

I think we will throttle the incoming updates properly by doing SOLR-12305.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471481#comment-16471481 ]

Mark Miller commented on SOLR-12338:

{noformat}
+ private OrderedExecutor replayUpdatesExecutor = new OrderedExecutor(
+     Runtime.getRuntime().availableProcessors(),
+     ExecutorUtil.newMDCAwareCachedThreadPool(
+         Runtime.getRuntime().availableProcessors(),
+         new DefaultSolrThreadFactory("replayUpdatesExecutor")));
{noformat}

Given that some machines these days have dozens of cores and you might have many SolrCores recovering, we may want to cap the number of threads at some number or make it configurable or something.

bq. This seems solvable by setting the fair flag of ArrayBlockingQueue to true

Yeah, you need that to ensure FIFO.

I like how this gives us some control to throttle, I wonder how efficient it is as documents keep thundering in though - do we gobble up threads and connections waiting? That is where it's a bummer it's hard to limit those resources. What are you going to do though? Those requests have to wait somewhere or we have to start dropping them - and hopefully with NIO2 it's somewhat efficient to wait on IO.

I think what David is getting at is that you are ensuring that tasks are kicked off in order, but once they are kicked off, you can't guarantee order. So task1 gets taken off the queue, then task 2 is taken, now task 2 gets executed first when task 1 has its thread unluckily scheduled by the OS. At least that's how I read it. But that is not an issue right? Because you don't run an item from the queue until the one in front of it is fully run right?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471392#comment-16471392 ]

Cao Manh Dat commented on SOLR-12338:

{quote}I have doubts on the use of a new ArrayBlockingQueue<>(1) per doc ID hash bucket. What if the client adds a Runnable for doc1, then immediately adds another Runnable for doc1. You're intending for the second runnable to block until the first completes to achieve the per-doc ID serialization. But this may not happen; a thread may start on the first runnable (which frees up the second runnable to be submitted), then the thread doesn't get CPU time, and then the other Runnable zooms ahead out-of-order. See what I mean? {quote}

It is per thread (a small number), not per bucket. If I understand correctly, what you mean here is that two threads are waiting for a lock to be released, and the one that arrives later wins the lock. This seems solvable by setting the fair flag of {{ArrayBlockingQueue}} to true, right?

{quote} Also if you submit without an ID, then it should probably proceed right to the delegate Executor. Why does it pick an ID at random? {quote}

This can help us to know how many threads are running (pending). Therefore OrderedExecutor does not execute more than {{numThreads}} in parallel. It also solves the case when ExecutorService's queue is full it will throw RejectedExecutionException.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470528#comment-16470528 ]

David Smiley commented on SOLR-12338:

Also if you submit without an ID, then it should probably proceed right to the delegate Executor. Why does it pick an ID at random?
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470518#comment-16470518 ]

David Smiley commented on SOLR-12338:

This OrderedExecutor thing is nifty. It needs class-level documentation.

I have doubts on the use of a {{new ArrayBlockingQueue<>(1)}} per doc ID hash bucket. What if the client adds a Runnable for doc1, then immediately adds another Runnable for doc1. You're intending for the second runnable to block until the first completes to achieve the per-doc ID serialization. But this may not happen; a thread may start on the first runnable (which frees up the second runnable to be submitted), then the thread doesn't get CPU time, and then the other Runnable zooms ahead out-of-order. See what I mean?

Instead of creating a {{new ArrayBlockingQueue<>(1)}} per doc ID hash bucket, let's create an array of Locks. When execute() is called, it immediately grabs the lock, potentially blocking. Then you can submit the provided Runnable with a wrapping Runnable that unlocks when done. This can be made simpler by using a {{FutureTask}} subclass that overrides {{done()}}. To be safe, catch a RejectedExecutionException from execute() to cancel the FutureTask. With this scheme, you might initialize the doc ID hash bucket array size to be largish at 32, even if there are fewer threads (less accidental hash collision contention). A Lock is light-weight.

The test uses System.currentTimeMillis() but should probably use nanos which the JVM guarantees to be sequential?
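A rough sketch of the "array of locks" scheme described above follows. It is a simplification under stated assumptions, not the {{OrderedExecutor}} from the patch: {{StripedOrderedExecutor}} is a hypothetical name, a plain wrapping Runnable is used instead of a {{FutureTask}} subclass, and a binary {{Semaphore}} stands in for the Lock because a {{ReentrantLock}} must be unlocked by the thread that locked it, whereas here the submitting thread acquires and a worker thread releases.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: tasks with the same id run one at a time, in submission order. */
class StripedOrderedExecutor {
  private final Executor delegate;
  private final Semaphore[] stripes;

  StripedOrderedExecutor(Executor delegate, int numStripes) {
    this.delegate = delegate;
    this.stripes = new Semaphore[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new Semaphore(1); // one permit: behaves like a lock
    }
  }

  /** Blocks the caller until the previous task on the same stripe has finished. */
  void execute(Object id, Runnable task) {
    Semaphore stripe = stripes[Math.floorMod(id.hashCode(), stripes.length)];
    stripe.acquireUninterruptibly();
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          stripe.release(); // released by the worker thread when the task is done
        }
      });
    } catch (RejectedExecutionException e) {
      stripe.release(); // the task never started, so free the stripe here
      throw e;
    }
  }
}
```

Ordering per id holds as long as a single thread submits the tasks for that id, since the next task for an id cannot be handed to the delegate until the previous one has released its stripe. Different ids that hash to the same stripe still block each other, which is why a stripe count larger than the thread count is suggested.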
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470110#comment-16470110 ]

Cao Manh Dat commented on SOLR-12338:

Attached a patch for this ticket. Here are some notes:
- Thanks to {{OrderedExecutor}}, all updates that belong to the same docId are executed sequentially. Updates that belong to different docIds are executed in parallel.
- The patch adds a new test in TestRecovery, which ensures that even when updates are executed in parallel we end up with the same index as before.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470098#comment-16470098 ]

Cao Manh Dat commented on SOLR-12338:

Hi [~ichattopadhyaya], yeah, I think it is a good idea to do that. It may not solve the case where {{dbq1}} was already re-ordered ahead of {{add2}} or {{add3}}, but it won't make things worse than today.
[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469869#comment-16469869 ]

Ishan Chattopadhyaya commented on SOLR-12338:

There are some situations where, if in-place updates and DBQs are re-ordered, the entire document needs to be fetched from the leader. This is fine when we have an active leader, but in the case of tlog replay we would need to apply those updates in the same order. I think if DBQs are executed in the right order (i.e. all updates before a DBQ are executed before the DBQ, and all updates after the DBQ are executed after the DBQ), then we can run the other updates in parallel.

Example:
{code:java}
add1 add2 add3 dbq1 add4 add5 add6 .. add20 dbq2
{code}
Here, add# are either full document updates or in-place updates. I suggest: we run updates add1-add3 in parallel, then wait till they are done before executing dbq1, then run add4-add20 in parallel, and then wait and execute dbq2. This should be fine, I think. (CC [~hossman], wdyt?)
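The suggested ordering can be sketched as a simple barrier around each DBQ. This is only an illustration under assumptions: operations are modelled as strings, anything whose name starts with "dbq" is treated as a barrier, and "applying" an operation just records its name. {{BarrierReplay}} and {{replay}} are hypothetical names and do not correspond to Solr's log replay code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: adds run in parallel, each DBQ acts as a barrier. */
class BarrierReplay {
  /** Replays ops and returns them in the order they were actually applied. */
  static List<String> replay(List<String> ops, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<String> applied = Collections.synchronizedList(new ArrayList<>());
    List<Future<?>> pending = new ArrayList<>();
    try {
      for (String op : ops) {
        if (op.startsWith("dbq")) {
          drain(pending);  // wait for every earlier add to finish
          applied.add(op); // run the DBQ alone, on the calling thread
        } else {
          pending.add(pool.submit(() -> { applied.add(op); }));
        }
      }
      drain(pending);      // wait for the adds after the last DBQ
    } finally {
      pool.shutdown();
    }
    return applied;
  }

  /** Blocks until all pending tasks have completed, then forgets them. */
  private static void drain(List<Future<?>> pending) {
    for (Future<?> f : pending) {
      try {
        f.get();
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    }
    pending.clear();
  }
}
```

For the input add1 add2 add3 dbq1 add4 add5 dbq2, the adds inside each group may be applied in any order, but dbq1 is always the fourth operation applied and dbq2 the last.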