[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-06-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506172#comment-16506172
 ] 

ASF subversion and git services commented on SOLR-12338:


Commit d1dbef5e4d1a1b2bfac75a59496f86d6edbbc16f in lucene-solr's branch 
refs/heads/branch_7_4 from [~ctargett]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d1dbef5 ]

SOLR-12338: State default value more directly


> Replay buffering tlog in parallel
> -
>
> Key: SOLR-12338
> URL: https://issues.apache.org/jira/browse/SOLR-12338
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch, 
> SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to 
> replay them in parallel. This will significantly reduce the recovery time of 
> replicas in high-load indexing environments. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-06-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506166#comment-16506166
 ] 

ASF subversion and git services commented on SOLR-12338:


Commit eb7bb2d90654ec15d25ba947e287bf7d96e07900 in lucene-solr's branch 
refs/heads/master from [~ctargett]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb7bb2d ]

SOLR-12338: State default value more directly





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-06-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506171#comment-16506171
 ] 

ASF subversion and git services commented on SOLR-12338:


Commit 13cad54a3efb179fdb4da7528d3448b03989c75e in lucene-solr's branch 
refs/heads/branch_7x from [~ctargett]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=13cad54 ]

SOLR-12338: State default value more directly





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494652#comment-16494652
 ] 

ASF subversion and git services commented on SOLR-12338:


Commit 04e1b19743e330ce66d199c4dc40bbf394be9ed7 in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04e1b19 ]

SOLR-12338: Replay buffering tlog in parallel





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494650#comment-16494650
 ] 

ASF subversion and git services commented on SOLR-12338:


Commit 6084da559c5466551af68c114b7310356c989dec in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6084da5 ]

SOLR-12338: Replay buffering tlog in parallel





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494631#comment-16494631
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Thanks, [~dsmiley]!




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494618#comment-16494618
 ] 

David Smiley commented on SOLR-12338:
-

My intention with changing the loop is to reduce duplication of the putIfAbsent 
line. But it's not a big deal as it's one line and not long.
 Overall, looks good now. Only one small nitpick:
{quote}@param lockId of the \{@code command}, if null then a random hash will 
be generated
{quote}
The "random hash" part is no longer accurate.

+1 commit at will.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494515#comment-16494515
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Attached a new patch for this ticket.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494504#comment-16494504
 ] 

Cao Manh Dat commented on SOLR-12338:
-

{quote}But then I wonder if we can move the sizeSemaphore.release to before the 
countDown? The principle at play here is to release locks in the reverse order 
that they were acquired. That's how it's normally done to, I think, prevent 
deadlock cases, though I'm not sure it's possible here as-coded.
{quote}
The order of unlocking here does not matter. To call {{remove}}, a thread must 
hold both locks. Therefore it won't cause a deadlock.
{quote}ah; I think I can see why we acquire the size semaphore after getting 
the striped lock. we don't want to use up a permit that might lock on an ID 
first. 
{quote}
Correct; multiple threads waiting on the same lockId could otherwise eat up the size semaphore's permits.

Thanks [~dsmiley], switching to CountDownLatch and adding the javadocs are good 
changes, but I don't see any improvement from using the new loop.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494172#comment-16494172
 ] 

David Smiley commented on SOLR-12338:
-

ah; I think I can see why we acquire the size semaphore after getting the 
striped lock.  we don't want to use up a permit that might lock on an ID first. 
 But then I wonder if we can move the sizeSemaphore.release to before the 
countDown?  The principle at play here is to release locks in the reverse order 
that they were acquired.  That's how it's normally done to, I think, prevent 
deadlock cases, though I'm not sure it's possible here as-coded.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494162#comment-16494162
 ] 

David Smiley commented on SOLR-12338:
-

Please consider this alternative utility class:

{code:java}
// needs java.util.concurrent.{ConcurrentHashMap, CountDownLatch, Semaphore}
/** A set of locks by a key {@code T}, kind of like Google Striped but the keys are sparse/lazy. */
private static class SparseStripedLock<T> {
  private final Semaphore sizeSemaphore;
  private final ConcurrentHashMap<T, CountDownLatch> map = new ConcurrentHashMap<>();

  SparseStripedLock(int maxSize) {
    this.sizeSemaphore = new Semaphore(maxSize);
  }

  void add(T t) throws InterruptedException {
    if (t != null) {
      CountDownLatch myLock = new CountDownLatch(1);
      while (true) {
        CountDownLatch existingLock = map.putIfAbsent(t, myLock); // returns null if no existing
        if (existingLock == null) {
          break; // myLock was successfully inserted (and is pre-locked)
        }
        existingLock.await(); // wait for existing lock/permit to become available (see remove() below)
        // we will most likely exit in next loop, though if contended then possibly not
      }
    }

    // won the lock
    sizeSemaphore.acquire(); //nocommit do at start of add()?
  }

  void remove(T t) {
    if (t != null) {
      map.remove(t).countDown(); // remove and signal to any "await"-ers
    }

    sizeSemaphore.release();
  }
}
{code}

Notice the comments, the loop, the new name, use of CountDownLatch, and the one 
nocommit/question.
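
For illustration only, here is a rough self-contained sketch of how a class like this might be exercised. It assumes the lock is acquired by the submitting thread and released by a worker thread; the demo class name, the pool size of 4, and the key "doc-1" are made up for the example and do not come from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SparseStripedLockDemo {
  /** Compact copy of the proposed class so that the demo compiles on its own. */
  static class SparseStripedLock<T> {
    private final Semaphore sizeSemaphore;
    private final ConcurrentHashMap<T, CountDownLatch> map = new ConcurrentHashMap<>();

    SparseStripedLock(int maxSize) {
      this.sizeSemaphore = new Semaphore(maxSize);
    }

    void add(T t) throws InterruptedException {
      if (t != null) {
        CountDownLatch myLock = new CountDownLatch(1);
        CountDownLatch existing;
        while ((existing = map.putIfAbsent(t, myLock)) != null) {
          existing.await(); // blocked until the current holder calls remove(t)
        }
      }
      sizeSemaphore.acquire();
    }

    void remove(T t) {
      if (t != null) {
        map.remove(t).countDown();
      }
      sizeSemaphore.release();
    }
  }

  /** Runs tasks that all share one key and returns the largest overlap observed. */
  static int maxOverlapForOneKey(int tasks) {
    SparseStripedLock<String> locks = new SparseStripedLock<>(4);
    ExecutorService pool = Executors.newFixedThreadPool(4);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger maxSeen = new AtomicInteger();
    try {
      for (int i = 0; i < tasks; i++) {
        locks.add("doc-1"); // the submitting thread acquires the key
        pool.execute(() -> {
          try {
            maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
            running.decrementAndGet();
          } finally {
            locks.remove("doc-1"); // a worker thread releases the key
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return maxSeen.get();
  }
}
```

Because each submission waits for the previous holder of the same key, tasks for one key should never overlap, while tasks for different keys would only be bounded by the size semaphore.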




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-29 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493180#comment-16493180
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Attached the latest patch for this ticket, including some cleanup and precommit fixes.
If there are no objections, I will commit the patch soon.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-28 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492749#comment-16492749
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Attached a patch based on [~dsmiley]'s review.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-28 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492538#comment-16492538
 ] 

Cao Manh Dat commented on SOLR-12338:
-

{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare 
in practice, I wonder if we can do something simpler? You may get lucky with 
map.compute* methods on ConcurrentHashMap which execute the lambda atomically. 
Though I don't know if it's bad to block if we try to acquire a lock within 
there. I see remove() removes the value of the Map but perhaps if the value 
were a mechanism that tracked that there's a producer pending, then we should 
not remove the value from the lock? If we did this, then maybe that would 
simplify add()? I'm not sure.
{quote}
After giving this more thought, changing the remove method to the following 
seems to solve the problem.
{code}
public void remove(T t) {
  // There can be many threads waiting for this lock
  map.remove(t).release(Integer.MAX_VALUE);
  sizeLock.release();
}
{code}
In short, the idea of SetBlockingQueue.add(T t) is:
# all participants try to call {{map.putIfAbsent(t, myLock)}},
# only one wins; the other participants have to wait for the winner's lock,
# when the winner is removed from the set, it also releases and removes its lock,
# back to 1.
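
The steps above can be sketched roughly as follows. This is only an assumption about what such an add()/remove() pair could look like when the map values are Semaphores created with zero permits; the class name SetBlockingQueueSketch and the helper maxHoldersOfOneKey are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SetBlockingQueueSketch<T> {
  private final Semaphore sizeLock;
  private final ConcurrentHashMap<T, Semaphore> map = new ConcurrentHashMap<>();

  SetBlockingQueueSketch(int maxSize) {
    this.sizeLock = new Semaphore(maxSize);
  }

  void add(T t) throws InterruptedException {
    Semaphore myLock = new Semaphore(0); // zero permits, so it starts out "locked"
    Semaphore winner;
    while ((winner = map.putIfAbsent(t, myLock)) != null) { // step 1: everyone tries
      winner.acquire(); // step 2: losers wait for the winner's lock
    } // step 4: after waking up, go back to step 1
    sizeLock.acquire();
  }

  void remove(T t) {
    // step 3: the winner removes its lock and wakes up every waiter
    map.remove(t).release(Integer.MAX_VALUE);
    sizeLock.release();
  }

  /** Lets several threads fight over one key and returns the largest number of simultaneous holders. */
  static int maxHoldersOfOneKey(int threads) {
    SetBlockingQueueSketch<String> set = new SetBlockingQueueSketch<>(threads);
    AtomicInteger holders = new AtomicInteger();
    AtomicInteger maxSeen = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          set.add("doc-1");
          maxSeen.accumulateAndGet(holders.incrementAndGet(), Math::max);
          holders.decrementAndGet();
          set.remove("doc-1");
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return maxSeen.get();
  }
}
```

Releasing Integer.MAX_VALUE permits is safe in this sketch only because each per-key Semaphore starts at zero permits and is discarded after a single release.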




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-28 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492410#comment-16492410
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Thanks a lot for your review, [~dsmiley]. I have been too busy recently.
{quote}
- I think the "hash" variable should not be called this to avoid confusion as 
there is no hashing. Maybe just "id" or "lockId"
- Do we still need the Random stuff?
- Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better 
"SetLock" as it does not hold anything (Queues hold stuff)
- Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits" as it 
does not extend Lock?
- Can the "closed" state be removed from SetBlockingQueue altogether? It's not 
clear it actually needs to be "closed". It seems wrong; other concurrent 
mechanisms don't have this notion (no Queue, Lock, or Semaphore does, etc.) 
FWIW I stripped this from the class and the test passed.
{quote}
+1

{quote}
Perhaps it's better to acquire() the size permit first in add() instead of last 
to prevent lots of producing threads inserting keys into a map only to 
eventually wait. Although it might add annoying try-finally to add() to ensure 
we put the permit back if there's an exception after (e.g. interrupt). Heck; 
maybe that's an issue no matter what the sequence is.
{quote}
I don't think we should do that. {{sizeLock}} is kind of like the maximum number 
of threads; once we reach that number, it seems better to let them wait before 
trying to enqueue more tasks.

{quote}
Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock 
impl)? It seems like the most direct concept we want; Semaphore is more than a 
Lock as it tracks permits that we don't need here?
{quote}
We can't. Lock (or ReentrantLock) only allows us to lock and unlock in the same 
thread. In the OrderedExecutor, we lock first and then unlock in a thread of the 
delegate executor.
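
A small self-contained sketch of that difference, using only java.util.concurrent (the class and method names are made up for the example): ReentrantLock.unlock() throws IllegalMonitorStateException when called by a thread that does not hold the lock, whereas a Semaphore permit can be released by any thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class CrossThreadRelease {
  /** Returns true if unlocking from another thread is rejected. */
  static boolean reentrantLockRejectsForeignUnlock() {
    ReentrantLock lock = new ReentrantLock();
    lock.lock(); // held by the calling thread
    boolean[] rejected = {false};
    Thread other = new Thread(() -> {
      try {
        lock.unlock(); // this thread is not the owner
      } catch (IllegalMonitorStateException e) {
        rejected[0] = true;
      }
    });
    other.start();
    join(other);
    lock.unlock(); // the owner can still unlock normally
    return rejected[0];
  }

  /** Returns true if a permit acquired here can be handed back by another thread. */
  static boolean semaphoreAllowsForeignRelease() {
    Semaphore permit = new Semaphore(1);
    permit.acquireUninterruptibly(); // taken by the calling thread
    Thread other = new Thread(permit::release); // given back by a different thread
    other.start();
    join(other);
    return permit.tryAcquire(); // succeeds only if the release went through
  }

  private static void join(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```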

{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare 
in practice, I wonder if we can do something simpler? You may get lucky with 
map.compute* methods on ConcurrentHashMap which execute the lambda atomically. 
Though I don't know if it's bad to block if we try to acquire a lock within 
there. I see remove() removes the value of the Map but perhaps if the value 
were a mechanism that tracked that there's a producer pending, then we should 
not remove the value from the lock? If we did this, then maybe that would 
simplify add()? I'm not sure.
{quote}
I will think more about this.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480089#comment-16480089
 ] 

Yonik Seeley commented on SOLR-12338:
-

{quote}This is a very costly/risky logic to handle reordered updates
{quote}
Indeed.  As an aside, my vote for the long term continues to be: "don't reorder 
updates between leader and replica" :)




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480078#comment-16480078
 ] 

Cao Manh Dat commented on SOLR-12338:
-

[~ysee...@gmail.com] The need to order things comes from how we currently handle 
reordered in-place updates. Currently, if a replica receives an in-place update u2 
that points to an in-place update u1 that has not arrived yet, the replica will 
fetch the full document from the leader. This is a very costly/risky logic to 
handle reordered updates (i.e., what if there is no leader to ask for the full 
document?). Luckily for us, reordering is not a common case right now, but if 
we replay updates in a parallel, unordered way, the above case can happen much 
more frequently. Therefore, in my opinion, it should be avoided. 
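
To make the u1/u2 dependency concrete, here is a deliberately simplified toy model. It is not Solr's update code: the class, the enum, and the version bookkeeping are all assumptions made for illustration. An in-place update carries the version it was built on, and it can only be applied when that previous version is the one currently present.

```java
import java.util.HashMap;
import java.util.Map;

class InPlaceUpdateModel {
  enum Outcome { APPLIED, NEEDS_FULL_DOCUMENT }

  // Latest applied version per document id (hypothetical bookkeeping).
  private final Map<String, Long> latestVersion = new HashMap<>();

  void addFullDocument(String id, long version) {
    latestVersion.put(id, version);
  }

  /** Applies an in-place update that was built on top of prevVersion. */
  Outcome applyInPlace(String id, long prevVersion, long version) {
    Long current = latestVersion.get(id);
    if (current == null || current != prevVersion) {
      // The update this one points to has not arrived; in the scenario described
      // above, the replica would have to fetch the full document from the leader.
      return Outcome.NEEDS_FULL_DOCUMENT;
    }
    latestVersion.put(id, version);
    return Outcome.APPLIED;
  }
}
```

In this model, replaying u1 and then u2 in order applies both cheaply, while replaying u2 first hits the expensive path, which is the case that per-id ordering is meant to keep rare.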




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479675#comment-16479675
 ] 

Yonik Seeley commented on SOLR-12338:
-

I haven't been following this issue, but the need to order things caught my 
eye, primarily because we have a bunch of logic already that handles reordered 
updates.  I guess the issue is that buffered updates may not have a version (if 
they haven't been through a leader?)  If that's the case, perhaps an easier 
path would be to assign a version and then let the existing reorder logic do 
its thing.  I don't have the full picture here, so it's just some input to 
consider.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479197#comment-16479197
 ] 

David Smiley commented on SOLR-12338:
-

Maybe you can propose {{SetBlockingQueue}} (or whatever name we settle on) to 
Guava?  Even if it's not accepted ultimately; there might be some great 
feedback and/or pointers to something similar that proves useful, as this stuff 
is hard so the more eyes the better.

I like that you've avoided hash collisions altogether by not doing hashes!  Use 
of ConcurrentHashMap makes sense to me for such an approach.  
However it appears we have some complexity to deal with since keys need to be 
added and removed on demand, safely, which seems to be quite tricky.

* I think the "hash" variable should not be called this to avoid confusion as 
there is no hashing.  Maybe just "id" or "lockId"
* Do we still need the Random stuff?
* Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better 
"SetLock" as it does not hold anything (Queues hold stuff)
* Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits" as it 
does not extend Lock?
* Can the "closed" state be removed from SetBlockingQueue altogether?  It's not 
clear it actually needs to be "closed".  It seems wrong; other concurrent 
mechanisms don't have this notion (no Queue, Lock, or Semaphore does, etc.)  
FWIW I stripped this from the class and the test passed.
* Perhaps it's better to acquire() the size permit first in add() instead of 
last to prevent lots of producing threads inserting keys into a map only to 
eventually wait.  Although it might add annoying try-finally to add() to ensure 
we put the permit back if there's an exception after (e.g. interrupt).  Heck; 
maybe that's an issue no matter what the sequence is.
* Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock 
impl)?  It seems like the most direct concept we want; Semaphore is more than a 
Lock as it tracks permits that we don't need here?
* The hot while loop of map.putIfAbsent seems fishy to me.  Even if it may be 
rare in practice, I wonder if we can do something simpler?  You may get lucky 
with map.compute\* methods on ConcurrentHashMap which execute the lambda 
atomically.  Though I don't know if it's bad to block if we try to acquire a 
lock within there.  I see remove() removes the value of the Map but perhaps if 
the value were a mechanism that tracked that there's a producer pending, then 
we should not remove the value from the lock?  If we did this, then maybe that 
would simplify add()?  I'm not sure.

Perhaps a simpler approach would involve a Set of weakly referenced 
objects, and thus we don't need to worry about removal.  In such a design add() 
would need to return a reference to the member of the set, and that object 
would have a "release()" method when done.  I'm not sure if in practice these 
might be GC'ed fast enough if they end up being usually very temporary?  Shrug.
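
To make the "something simpler" question above concrete, here is a minimal sketch that swaps the ConcurrentHashMap/Semaphore combination for a single monitor guarding a set of in-flight keys. It is not the SetBlockingQueue from the patch: the class name {{BoundedKeySet}} and its methods are hypothetical, and one intrinsic lock may well contend more under heavy load.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: at most `capacity` distinct keys may be held at once. */
final class BoundedKeySet<K> {
  private final Set<K> inFlight = new HashSet<>();
  private final int capacity;

  BoundedKeySet(int capacity) {
    this.capacity = capacity;
  }

  /** Blocks while the set is full or the key is already held by someone else. */
  synchronized void add(K key) throws InterruptedException {
    while (inFlight.size() >= capacity || inFlight.contains(key)) {
      wait(); // releases the monitor; the condition is re-checked after every wake-up
    }
    inFlight.add(key);
  }

  /** Releases the key and wakes all waiters so they can re-check their condition. */
  synchronized void remove(K key) {
    if (inFlight.remove(key)) {
      notifyAll();
    }
  }

  synchronized int size() {
    return inFlight.size();
  }
}
```

Because the capacity check and the key check happen under the same monitor, there is no window in which a producer has inserted a key but still has to wait for a size permit, and an interrupt in add() leaves no state to roll back.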

> Replay buffering tlog in parallel
> -
>
> Key: SOLR-12338
> URL: https://issues.apache.org/jira/browse/SOLR-12338
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different id are independent, therefore it is safe to 
> replay them in parallel. This will significantly reduce recovering time of 
> replicas in high load indexing environment. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479131#comment-16479131
 ] 

Cao Manh Dat commented on SOLR-12338:
-

bq. BTW I've twice gotten confused in this issue conversation when you referred 
to things I didn't know existed before because it was unclear if I simply 
didn't know about it or if you were adding/introducing some new mechanism. It 
would be helpful to me if you try to clarify that new things are new things, 
e.g. "(added in this patch)" or "added a new ..." or some-such.
Yeah, sorry about that, I was just too lazy with the details.

bq. It's super tempting to simply use Striped as it's difficult to write & 
review concurrent control structures such as this. I have a bunch of pending 
commentary/review for your SetBlockingQueue but are you choosing to not use it 
because the numThreads * 1000 is too much internal memory/waste?
I think the current {{SetBlockingQueue}} is quite effective and compact. Can you 
share your comments/review of {{SetBlockingQueue}}?





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479083#comment-16479083
 ] 

David Smiley commented on SOLR-12338:
-

{quote}Upload a patch that makes a change from using an array of lock into a 
{{SetBlockingQueue}}.
{quote}
BTW I've twice gotten confused in this issue conversation when you referred to 
things I didn't know existed before because it was unclear if I simply didn't 
know about it or if you were adding/introducing some new mechanism.  It would 
be helpful to me if you try to clarify that new things are new things, e.g. 
"(added in this patch)" or "added a new ..." or some-such.

It's super tempting to simply use Striped as it's difficult to write & review 
concurrent control structures such as this.  I have a bunch of pending 
commentary/review for your SetBlockingQueue but are you choosing to not use it 
because the numThreads * 1000 is too much internal memory/waste?




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-16 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478357#comment-16478357
 ] 

Cao Manh Dat commented on SOLR-12338:
-

[~dsmiley] What do you think about the new patch?





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-15 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475965#comment-16475965
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Interesting result: when I switch from {{SetBlockingQueue}} to Guava's Striped 
class (whose implementation is like an array of locks), performance degrades 
(from 4341ms to 8227ms). If I increase the number of stripes (the size of the 
lock array) to {{numThreads * 1000}}, the two eventually run in about the same 
amount of time.  It is a sign that collisions do affect performance!
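
To illustrate why the stripe count matters, here is a small hand-rolled sketch of lock striping. It is not Guava's Striped (which spreads hash codes differently); the class {{StripedLocks}} and its methods are made up for this example, and keys are mapped to stripes with a plain floorMod of hashCode().

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of lock striping; Guava's Striped hashes keys differently. */
final class StripedLocks {
  private final Lock[] stripes;

  StripedLocks(int numStripes) {
    stripes = new Lock[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  /** Index of the stripe guarding this key; distinct keys may share an index. */
  int stripeIndex(Object key) {
    return Math.floorMod(key.hashCode(), stripes.length);
  }

  Lock lockFor(Object key) {
    return stripes[stripeIndex(key)];
  }

  /** Counts keys in [0, numKeys) that land on a stripe already used by a smaller key. */
  int collisions(int numKeys) {
    boolean[] used = new boolean[stripes.length];
    int count = 0;
    for (int key = 0; key < numKeys; key++) {
      int idx = stripeIndex(key); // the int is boxed to Integer, whose hashCode is its value
      if (used[idx]) {
        count++;
      }
      used[idx] = true;
    }
    return count;
  }
}
```

With only as many stripes as threads, unrelated ids frequently share a lock and wait on each other; with {{numThreads * 1000}} stripes that becomes rare, at the cost of allocating that many locks up front.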




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-15 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475814#comment-16475814
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Uploaded a patch that changes from using an array of locks to a 
{{SetBlockingQueue}}.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-15 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475423#comment-16475423
 ] 

Cao Manh Dat commented on SOLR-12338:
-

[~dsmiley] an annoying problem with ExecutorService is that when the number of 
threads reaches {{maximumPoolSize}}, the caller gets a RejectedExecutionException 
instead of waiting for a thread to become available 
(https://stackoverflow.com/questions/44541784/synchronousqueue-does-not-block-when-offered-task-by-threadpoolexecutor).
The easy solution then is to use CallerRunsPolicy 
({{https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.CallerRunsPolicy.html}}).

With the current {{OrderedExecutor}} we won't hit that problem; the caller in 
that case will just wait. 

But you are right that collisions may affect performance!
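
For anyone following along, the rejection behaviour described above can be reproduced with a small stand-alone sketch. The class and method names ({{SaturationDemo}}, {{submitWhenSaturated}}) are invented for this illustration; the pool is a plain ThreadPoolExecutor with one thread and a SynchronousQueue, not the executor used in the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SaturationDemo {
  /** Returns what happened to a task submitted while the only worker thread was busy. */
  static String submitWhenSaturated(RejectedExecutionHandler handler) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new SynchronousQueue<>(), handler);
    CountDownLatch release = new CountDownLatch(1);
    Thread caller = Thread.currentThread();
    String[] outcome = {"not-run"};
    try {
      // Occupy the single worker thread until the latch is released.
      pool.execute(() -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      try {
        // No idle worker is polling the SynchronousQueue and the pool is at its maximum,
        // so this submission is handed to the rejection handler.
        pool.execute(() ->
            outcome[0] = Thread.currentThread() == caller ? "ran-in-caller" : "ran-in-pool");
      } catch (RejectedExecutionException e) {
        outcome[0] = "rejected";
      }
      return outcome[0];
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }
}
```

Passing {{new ThreadPoolExecutor.AbortPolicy()}} (the default handler) yields a rejection, while {{new ThreadPoolExecutor.CallerRunsPolicy()}} runs the task on the submitting thread, which also throttles the submitter.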




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474832#comment-16474832
 ] 

David Smiley commented on SOLR-12338:
-

I looked at this again (after a few days of vacation) and I withdraw my concern 
that there's a bug.  The use of ArrayBlockingQueue(1) is acting as a sort of 
Lock in the same way I suggested to use a Lock.  Couldn't you simply replace it 
with a Lock?  The put() becomes a lock(), and the poll() becomes an unlock(); 
see what I mean?  I think this is clearer since it's a simpler mechanism than 
an ArrayBlockingQueue, and the use of ABQ in this specific way (size 1) could 
lend itself to misuse later if someone thinks increasing its size or type gains 
us parallelism.  And I don't think the fairness setting matters here.  And 
although you initialized the size of this array of ABQ to be the number of 
threads, I think we ought to use a larger array to prevent collisions (prevent 
needlessly blocking on different docIDs that hash to the same thread).

I also was thinking of a way to have more "on-deck" runnables for a given 
docID, waiting in-line.  The Runnable we submit to the delegate could be some 
inner class OrderedRunnable that has a "next" pointer to the next 
OrderedRunnable.  We could maintain a parallel array of the top OrderedRunnable 
(parallel to an array of Locks).  Manipulating the OrderedRunnable chain 
requires holding the lock.  To ensure we bound these things waiting in-line, we 
could use one Semaphore for the whole OrderedExecutor instance.  There's more 
to it than this.  Of course this adds complexity, but the current approach 
(either ABQ or Lock) can unfortunately block needlessly if the doc ID is locked 
yet soon more/different doc IDs will be submitted next and there are available 
threads.  Perhaps this is overthinking it (over optimization / complexity) as 
this will not be the common case?  This would be even more needless if we 
increase the Lock array to prevent collisions so nevermind I guess.
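
As a rough illustration of the "ABQ of size 1 acts like a lock" observation, the sketch below compares a one-slot ArrayBlockingQueue with a one-permit Semaphore. The class name {{SingleSlotDemo}} is hypothetical, and it uses the non-blocking offer()/tryAcquire() variants (rather than put()/acquire()) only so that it can run on a single thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Semaphore;

final class SingleSlotDemo {
  /** Results of three acquisition attempts: first, second while held, third after release. */
  static boolean[] viaQueue() {
    ArrayBlockingQueue<Object> slot = new ArrayBlockingQueue<>(1);
    Object token = new Object();
    boolean first = slot.offer(token);  // takes the single slot ("lock")
    boolean second = slot.offer(token); // slot is full, so this fails without blocking
    slot.poll();                        // frees the slot ("unlock")
    boolean third = slot.offer(token);
    return new boolean[] {first, second, third};
  }

  /** The same three attempts against a binary semaphore. */
  static boolean[] viaSemaphore() {
    Semaphore permit = new Semaphore(1);
    boolean first = permit.tryAcquire();
    boolean second = permit.tryAcquire();
    permit.release();
    boolean third = permit.tryAcquire();
    return new boolean[] {first, second, third};
  }
}
```

One caveat worth keeping in mind: a ReentrantLock has to be unlocked by the thread that locked it, so if the slot is taken on the submitting thread and freed on a worker thread, a Semaphore with one permit is the closer drop-in than a ReentrantLock.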

 
{quote}(RE Submit without ID) This can help us to know how many threads are 
running (pending). Therefore OrderedExecutor does not execute more than 
{{numThreads}} in parallel. It also solves the case when ExecutorService's 
queue is full it will throw RejectedExecutionException.
{quote}
Isn't this up to how the backing delegate is configured?  If it's using a fixed 
thread pool, then there won't be more threads running.  Likewise for 
RejectedExecutionException.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471502#comment-16471502
 ] 

Mark Miller commented on SOLR-12338:


bq. Yeah, but we do not need that flag for the case of LogReplayer, right? 
Because we are calling execute method in single-thread.

Technically that sounds right, but I'm not sure I read the contract explicitly 
promises that. If we have good testing, it's not much of a concern.

bq. OrderedExecutor ensuring that tasks are kicked off in order for a same id. 
Yeah, task1 get taken off the queue only after it finishes.

Yeah, so I don't think I spot an open issue for a race.

bq. I think we will throttle the incoming updates properly by doing SOLR-12305.

Ah right, had been looking at that issue recently too and had it on my mind. 
That is more where that comment belongs. I was thinking these queues would work 
with documents coming in and getting buffered, but they won't get held up from 
dropping off the document to the tlog. But anyway, I think that natural 
throttling is a good first step. I think at the end of the day, we will want to 
end up with a Filter though that can do QOS and intelligent throttling based on 
data, but I'm pro whatever gets us out of infinite tlog replay soonest short 
term.






[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471495#comment-16471495
 ] 

Cao Manh Dat commented on SOLR-12338:
-

{quote}
Given that some machines these days have dozens of cores and you might have 
many SolrCores recovering, we may want to cap the number of threads at some 
number or make it configurable or something.
{quote}
{{replayUpdatesExecutor}} is shared across all the SolrCores, so the number of 
SolrCores recovering won't affect the maximum number of threads used. Making 
it configurable is a good idea, though.

{quote}
Yeah, you need that to ensure FIFO.
{quote}
Yeah, but we do not need that flag for the case of LogReplayer, right? Because 
we call the execute method from a single thread.

{quote}
I think what David is getting at is that you are ensuring that tasks are kicked 
off in order, but once they are kicked off, you can't guarantee order. So task1 
gets taken off the queue, then task 2 is taken, now task 2 gets executed first 
when task 1 has it's thread unluckily scheduled by the OS. At least that's how 
I read it. But that is not an issue right? Because you don't run an item from 
the queue until the one in front of it is fully run right?
{quote}
OrderedExecutor ensures that tasks are kicked off in order *for the same id*. 
Yeah, task1 gets taken off the queue only after it finishes.

{quote}
I like how this gives us some control to throttle, I wonder how efficient it is 
as documents keep thundering in though - do we gobble up threads and 
connections waiting? That is where it's a bummer it's hard to limit those 
resources. What are you going to do though? Those requests have to wait 
somewhere or we have to start dropping them - and hopefully with NIO2 it's 
somewhat efficient to wait on IO.
{quote}
I think we will throttle the incoming updates properly by doing SOLR-12305. 





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471481#comment-16471481
 ] 

Mark Miller commented on SOLR-12338:


{noformat}
+  private OrderedExecutor replayUpdatesExecutor = new OrderedExecutor(
+  Runtime.getRuntime().availableProcessors(),
+  ExecutorUtil.newMDCAwareCachedThreadPool(
+  Runtime.getRuntime().availableProcessors(),
+  new DefaultSolrThreadFactory("replayUpdatesExecutor")));
{noformat}

Given that some machines these days have dozens of cores and you might have 
many SolrCores recovering, we may want to cap the number of threads at some 
number or make it configurable or something.
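
A capped and configurable thread count could look roughly like the sketch below. The property name and the default cap of 16 are purely hypothetical placeholders for illustration; nothing here reflects an actual Solr setting.

```java
final class ReplayThreads {
  /** Hypothetical system property name, used only for this illustration. */
  static final String CAP_PROPERTY = "example.replay.maxThreads";

  /** Arbitrary default cap chosen for the sketch. */
  static final int DEFAULT_CAP = 16;

  /** Uses the available processors, but never more than the cap and never fewer than one. */
  static int threadCount(int availableProcessors) {
    int cap = Integer.getInteger(CAP_PROPERTY, DEFAULT_CAP);
    return Math.max(1, Math.min(availableProcessors, cap));
  }
}
```

The result of {{ReplayThreads.threadCount(Runtime.getRuntime().availableProcessors())}} would then replace the two raw availableProcessors() arguments in the snippet above.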

bq. This seems can be solve by set the fair flag of ArrayBlockingQueue to true

Yeah, you need that to ensure FIFO.

I like how this gives us some control to throttle, I wonder how efficient it is 
as documents keep thundering in though - do we gobble up threads and 
connections waiting? That is where it's a bummer it's hard to limit those 
resources. What are you going to do though? Those requests have to wait 
somewhere or we have to start dropping them - and hopefully with NIO2 it's 
somewhat efficient to wait on IO.

I think what David is getting at is that you are ensuring that tasks are kicked 
off in order, but once they are kicked off, you can't guarantee order. So task1 
gets taken off the queue, then task 2 is taken, now task 2 gets executed first 
when task 1 has its thread unluckily scheduled by the OS. At least that's how 
I read it. But that is not an issue, right? Because you don't run an item from 
the queue until the one in front of it is fully run, right?





[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471392#comment-16471392
 ] 

Cao Manh Dat commented on SOLR-12338:
-

{quote}I have doubts on the use of a new ArrayBlockingQueue<>(1) per doc ID 
hash bucket. What if the client adds a Runnable for doc1, then immediately adds 
another Runnable for doc1. You're intending for the second runnable to block 
until the first completes to achieve the per-doc ID serialization. But this may 
not happen; a thread may start on the first runnable (which frees up the second 
runnable to be submitted), then the thread doesn't get CPU time, and then the 
other Runnable zooms ahead out-of-order. See what I mean?
{quote}
It is per thread (a small number), not per bucket. If I understand correctly, 
you mean that two threads are waiting for a lock to be released and the one 
that arrived later wins the lock. It seems this can be solved by setting the 
fair flag of {{ArrayBlockingQueue}} to true, right?

{quote}
Also if you submit without an ID, then it should probably proceed right to the 
delegate Executor.  Why does it pick an ID at random?
{quote}
This helps us know how many threads are running (pending), so 
OrderedExecutor does not execute more than {{numThreads}} in parallel. It also 
handles the case where the ExecutorService's queue is full and it would throw 
RejectedExecutionException.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470528#comment-16470528
 ] 

David Smiley commented on SOLR-12338:
-

Also if you submit without an ID, then it should probably proceed right to the 
delegate Executor.  Why does it pick an ID at random?




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470518#comment-16470518
 ] 

David Smiley commented on SOLR-12338:
-

This OrderedExecutor thing is nifty. It needs class-level documentation.
 I have doubts on the use of a {{new ArrayBlockingQueue<>(1)}} per doc ID hash 
bucket. What if the client adds a Runnable for doc1, then immediately adds 
another Runnable for doc1. You're intending for the second runnable to block 
until the first completes to achieve the per-doc ID serialization. But this may 
not happen; a thread may start on the first runnable (which frees up the second 
runnable to be submitted), then the thread doesn't get CPU time, and then the 
other Runnable zooms ahead out-of-order. See what I mean?

Instead of creating a {{new ArrayBlockingQueue<>(1)}} per doc ID hash bucket, 
let's create an array of Locks. When execute() is called, it immediately grabs 
the lock, potentially blocking. Then you can submit the provided Runnable with 
a wrapping Runnable that unlocks when done. This can be made simpler via using 
{{FutureTask}} subclass to override {{done()}}.  To be safe, catch a 
RejectedExecutionException from execute() to cancel the FutureTask.  With this 
scheme, you might initialize the doc ID hash bucket array size to be large-ish 
at 32, even if there are fewer threads (less accidental hash collision 
contention).  A Lock is light-weight.
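
A minimal sketch of the scheme described above, under a couple of stated assumptions: the class {{KeyOrderedExecutor}} and its demo method are hypothetical (this is not the OrderedExecutor from the patch), it uses a plain wrapping Runnable rather than a FutureTask subclass, and it uses one-permit Semaphores instead of Locks because the release happens on the worker thread while a ReentrantLock must be unlocked by the thread that locked it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: tasks that hash to the same slot run one at a time, in submit order. */
final class KeyOrderedExecutor {
  private final Semaphore[] slots;
  private final ExecutorService delegate;

  KeyOrderedExecutor(int numSlots, ExecutorService delegate) {
    this.slots = new Semaphore[numSlots];
    for (int i = 0; i < numSlots; i++) {
      slots[i] = new Semaphore(1);
    }
    this.delegate = delegate;
  }

  /** Blocks the caller until no earlier task for the same slot is still running. */
  void execute(Object id, Runnable task) {
    Semaphore slot = slots[Math.floorMod(id.hashCode(), slots.length)];
    slot.acquireUninterruptibly();
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          slot.release(); // released on the worker thread once the task is done
        }
      });
    } catch (RejectedExecutionException e) {
      slot.release(); // the wrapper will never run, so hand the permit back here
      throw e;
    }
  }

  /** Demo: submits n tasks for one id from a single thread and returns the order they ran in. */
  static List<Integer> runSameId(int n) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    KeyOrderedExecutor ordered = new KeyOrderedExecutor(32, pool);
    List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
    for (int i = 0; i < n; i++) {
      final int seq = i;
      ordered.execute("doc1", () -> seen.add(seq));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen;
  }
}
```

Since the permit is held until the wrapped task has finished, a second submission for the same id cannot start early even if the first worker thread is slow to get CPU time. The ordering guarantee only holds when submissions for a given id come from a single thread.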

The test uses System.currentTimeMillis() but should probably use nanos which 
the JVM guarantees to be sequential?




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470110#comment-16470110
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Attached a patch for this ticket, here are some notes:
- Thanks to {{OrderedExecutor}}, all updates that belong to the same docId will 
be executed sequentially. Updates that belong to different docIds will be 
executed in parallel.
- The patch adds a new test in TestRecovery, which ensures that even when 
updates are executed in parallel we end up with the same index as before.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470098#comment-16470098
 ] 

Cao Manh Dat commented on SOLR-12338:
-

Hi [~ichattopadhyaya], yeah, I think it is a good idea to do that. It may not 
solve the case when {{dbq1}} has already been re-ordered ahead of {{add2}} or 
{{add3}}. But it won't make things worse than today.




[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

2018-05-09 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469869#comment-16469869
 ] 

Ishan Chattopadhyaya commented on SOLR-12338:
-

There are some situations where if in-place updates and DBQs are re-ordered, 
then the entire document needs to be fetched from the leader. This is fine when 
we have an active leader, but in case of tlog replay, we would need to apply 
those updates in the same order.

I think if DBQs are executed in the right order (i.e. all updates before a DBQ 
are executed before the DBQ, and all updates after the DBQ are executed after 
the DBQ), then we can run the other updates in parallel.

Example:
{code:java}
add1
add2
add3
dbq1
add4
add5
add6
..
add20
dbq2
{code}
Here, add# are either full document updates or in-place updates. I suggest: we 
run updates add1-add3 in parallel, and then wait till they are done before 
executing dbq1, and then add4-add20 in parallel and then wait and execute dbq2. 
This should be fine, I think. (CC [~hossman], wdyt?)
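
The suggested scheme can be sketched as a simple barrier loop. This is only an illustration under assumptions: {{BarrierReplay}} is a hypothetical class, operations are plain strings where anything starting with "dbq" is treated as a delete-by-query, "applying" an update just records its name, and per-id ordering among the adds is ignored here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BarrierReplay {
  /** Replays ops; names starting with "dbq" act as barriers. Returns the completion order. */
  static List<String> replay(List<String> ops, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<String> applied = Collections.synchronizedList(new ArrayList<>());
    List<Future<?>> pending = new ArrayList<>();
    try {
      for (String op : ops) {
        if (op.startsWith("dbq")) {
          awaitAll(pending); // every earlier update must finish first
          applied.add(op);   // the DBQ itself runs alone on the calling thread
        } else {
          pending.add(pool.submit(() -> {
            applied.add(op); // stand-in for applying the update
          }));
        }
      }
      awaitAll(pending);
    } finally {
      pool.shutdown();
    }
    return applied;
  }

  /** Waits for all outstanding updates, then clears the list for the next batch. */
  private static void awaitAll(List<Future<?>> pending) {
    for (Future<?> f : pending) {
      try {
        f.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    pending.clear();
  }
}
```

For the example list above, the adds within a batch may complete in any order, but dbq1 always lands after add1-add3 and before any later add.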
