[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

> Process updates concurrently during PeerSync
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, parallelize-peersync.patch
>
> This came up during a discussion with [~shalinmangar].
> During {{PeerSync}}, updates are applied one at a time by looping through the
> updates received from the leader. This is slow and could keep a node in
> recovery for a long time if the number of updates to apply is large.
> We can apply updates concurrently; this should be no different from what
> can happen during normal indexing (we can't really ensure that a replica
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around delete-by-query (DBQ) that we should be
> careful about.
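
The idea in the description can be illustrated with a minimal sketch. This is
not the attached patch and uses no Solr classes: Update, applyUpdate and
applyAll are hypothetical stand-ins, and the sketch assumes that an update
carrying a lower version than one already applied to the same document is
simply dropped, which is what makes the order of application irrelevant for
plain adds. DBQs are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these types are Solr classes. */
final class ConcurrentApplySketch {

    /** Stand-in for one add received from the leader. */
    static final class Update {
        final String id;
        final long version;
        Update(String id, long version) { this.id = id; this.version = version; }
    }

    /** Keeps the highest version per document, so a stale update never wins. */
    static void applyUpdate(Map<String, Long> index, Update u) {
        index.merge(u.id, u.version, Long::max);
    }

    /** Applies all updates on a fixed-size pool and waits for every task. */
    static Map<String, Long> applyAll(List<Update> updates, int threads) {
        Map<String, Long> index = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Update u : updates) {
                pending.add(CompletableFuture.runAsync(() -> applyUpdate(index, u), pool));
            }
            for (CompletableFuture<Void> f : pending) {
                f.join(); // rethrows a worker failure as CompletionException
            }
        } finally {
            pool.shutdown();
        }
        return index;
    }

    public static void main(String[] args) {
        List<Update> updates = new ArrayList<>();
        for (long v = 1; v <= 1000; v++) {
            updates.add(new Update("doc" + (v % 50), v));
        }
        Map<String, Long> index = applyAll(updates, 4);
        System.out.println("documents indexed: " + index.size());
    }
}
```

Whatever order the worker threads pick, each document ends up at its highest
version. The sequential loop described above gives the same result, only
without the parallelism.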



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: (was: parallelize-peersync.patch)




[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

Attached a working patch.
In my tests I didn't see much improvement with parallelization (in fact, in
some cases performance degraded), and I could not find any hotspot in the
profile.

My theory is that the documents in the test are so short and simple that
parallelization, although it works functionally, has little to gain; we need
to test this with more complex documents to verify the performance gains.

Most of the parallelization parameters are subjective, and people will need
to verify which values work best for them.

It also seems performance would suffer if a relatively high number of DBQs
has to be applied during PeerSync, since updates are applied out of order.




[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-04 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: SOLR-9689.patch2

A new patch with a configurable threshold for parallelism.
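
The patch itself is not reproduced in this thread, so the following is only
one plausible reading of such a threshold: apply small batches serially and
switch to concurrent application once the batch reaches a configured size.
The class, the property name sketch.peersync.parallelThreshold and the
default of 100 are all hypothetical and are not Solr settings.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; the property name and default are made up. */
final class ParallelismThresholdSketch {

    /** Made-up system property, not a real Solr setting. */
    static final String THRESHOLD_PROP = "sketch.peersync.parallelThreshold";

    /** Reads the threshold from the system property, defaulting to 100. */
    static int configuredThreshold() {
        return Integer.getInteger(THRESHOLD_PROP, 100);
    }

    /** Batches at or above the threshold are applied concurrently. */
    static boolean shouldParallelize(int updateCount, int threshold) {
        return updateCount >= threshold;
    }

    /**
     * Applies every update, serially for small batches and on the common
     * fork-join pool otherwise. The consumer must be thread-safe.
     */
    static <T> void applyAll(List<T> updates, Consumer<T> apply, int threshold) {
        if (shouldParallelize(updates.size(), threshold)) {
            updates.parallelStream().forEach(apply);
        } else {
            updates.forEach(apply);
        }
    }
}
```

A threshold of this kind would keep small syncs on the cheap sequential path,
where thread hand-off would cost more than it saves.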

> Process updates concurrently during PeerSync
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2
>



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-10-24 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: SOLR-9689.patch

POC for applying updates concurrently.
Please review it and let me know if there are any gaping issues.

I would also appreciate any suggestions for handling out-of-order {{DBQs}} (I
think by default we keep a few {{DBQs}} around to account for out-of-order
updates). Maybe we can increase the number of {{DBQs}} we keep around if the
{{DBQs}} have the {{PEER_SYNC}} flag set.
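
To make the concern about the number of retained DBQs concrete, here is a
minimal sketch of a bounded buffer of recent DBQs. It is an illustration
under simplifying assumptions, not Solr's implementation: RecentDbqSketch and
its methods are hypothetical, versions are plain positive longs, and a query
is reduced to a predicate over the document text.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical sketch of a bounded list of recent DBQs; not Solr code. */
final class RecentDbqSketch {

    /** A remembered delete-by-query: its version and a stand-in for the query. */
    private static final class Dbq {
        final long version;
        final Predicate<String> matches;
        Dbq(long version, Predicate<String> matches) {
            this.version = version;
            this.matches = matches;
        }
    }

    private final int capacity;
    private final Deque<Dbq> recent = new ArrayDeque<>();

    RecentDbqSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Remembers a DBQ, evicting the one recorded earliest once the buffer is full. */
    synchronized void recordDbq(long version, Predicate<String> matches) {
        if (recent.size() == capacity) {
            recent.removeFirst();
        }
        recent.addLast(new Dbq(version, matches));
    }

    /** True if a remembered DBQ newer than this add matches the document. */
    synchronized boolean isDeletedByNewerDbq(String docText, long addVersion) {
        for (Dbq d : recent) {
            if (d.version > addVersion && d.matches.test(docText)) {
                return true;
            }
        }
        return false;
    }
}
```

With a capacity of 1, an add that arrives late with version 5 is no longer
caught once a second DBQ has pushed the matching DBQ (version 10) out of the
buffer. A larger capacity keeps it catchable, which is the motivation for
retaining more DBQs while updates are applied out of order.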

> Process updates concurrently during PeerSync
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch
>



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-10-24 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Summary: Process updates concurrently during PeerSync  (was: Process 
updates concurrently during {{PeerSync}})

> Process updates concurrently during PeerSync
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Pushkar Raste
>