[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread nickva
Github user nickva commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215300013
  
@kxepal

Ah I see what you meant. Yeah it might be possible to have a batcher for 
doc updates. 

In general it would be better to not have to write back to docs on every 
error. Or not have to write back to docs at all ;-) Now when retries happen 
(especially if they happen at the same time due to a common source failure) all 
retries will write errors to docs, which will kick off change feeds, which will 
be noticed by the replicator manager at the same time restarts are also 
happening. So it creates a positive feedback loop.

As for the other question, we noticed the issues on a cluster with 1000 or so 
replications. There were other peculiarities about it, but it manifested these 
sharp load spikes: when a few things failed, it triggered other events, which 
added even more load and then more failures and so on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215289304
  
@nickva 
> We are thinking about how to build that; however, that will take longer and 
will be closer to a complete rewrite of the replicator code.

I don't think you need to rewrite the replicator code to add another manager 
that receives messages with the arguments update_rep_doc/3 gets, then on a 
recurring event folds its own queue into a batch and reuses the existing 
bulk_docs call against the replicator db (: Basically, a sort of reducer. 
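
A rough sketch of such a batching manager, to make the idea concrete. Everything here is an assumption for illustration: the module name `rep_doc_batcher`, the `FlushFun` callback (standing in for whatever bulk write against the replicator db would be used) and the fixed flush interval are hypothetical and not part of couch_replicator.

```erlang
-module(rep_doc_batcher).
-behaviour(gen_server).

-export([start_link/2, update/4]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(st, {flush, interval, queue = []}).

%% FlushFun (hypothetical) receives the accumulated batch as a list of
%% {DbName, DocId, KVs} tuples in arrival order. IntervalMsec is the
%% fixed flush period.
start_link(FlushFun, IntervalMsec) ->
    gen_server:start_link(?MODULE, {FlushFun, IntervalMsec}, []).

%% Queue one update instead of writing the doc right away.
update(Pid, DbName, DocId, KVs) ->
    gen_server:cast(Pid, {update, DbName, DocId, KVs}).

init({FlushFun, IntervalMsec}) ->
    erlang:send_after(IntervalMsec, self(), flush),
    {ok, #st{flush = FlushFun, interval = IntervalMsec}}.

handle_call(_Msg, _From, St) ->
    {reply, ignored, St}.

handle_cast({update, DbName, DocId, KVs}, #st{queue = Q} = St) ->
    {noreply, St#st{queue = [{DbName, DocId, KVs} | Q]}}.

%% On every tick, hand the whole queue to FlushFun and start over.
handle_info(flush, #st{flush = FlushFun, interval = Interval, queue = Q} = St) ->
    case Q of
        [] -> ok;
        _ -> FlushFun(lists:reverse(Q))
    end,
    erlang:send_after(Interval, self(), flush),
    {noreply, St#st{queue = []}};
handle_info(_Other, St) ->
    {noreply, St}.

terminate(_Reason, _St) ->
    ok.

code_change(_OldVsn, St, _Extra) ->
    {ok, St}.
```

With this shape the number of writes per interval no longer depends on how many replications fail at once; a real version would probably also collapse several updates to the same doc into one.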

But that's true, it will require some more work, while random is a quick fix 
for the issue that just works in most setups. I just thought that Cloudant 
has exactly the large setups where that fix won't work well (;

But ok, thank you for explaining the idea.

A few more questions, if you don't mind (:
What's the critical number of replications at which their mass updates cause 
issues? And a follow-up: is it worth skipping the sleep when it would be quite 
short, e.g. for a small number of replications that couldn't cause any issues?




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread nickva
Github user nickva commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215287320
  
@kxepal 

I tried to accommodate small and larger setups by making the range 
proportional to the number of replications on the current node. So small setups 
of only a few replications won't have to sleep for long and waste time, and a 
large number of replications won't wreak havoc either and will exhibit more 
graceful failures. That is why it is not a fixed parameter or a configuration 
setting (also, there are already dozens of configuration parameters which are 
hard to remember and know what they do).

> And whether batching helps at all. Do you have resources for that kind of 
testing?

Good point, agreed as well! We are thinking about how to build that; however, 
that will take longer and will be closer to a complete rewrite of the 
replicator code. In the meantime, it seems beneficial to have some smaller, 
less invasive fixes for some issues we've encountered. This is one of those 
issues.





[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215283702
  
@nickva 
Aha, I see. But I think these numbers wouldn't be anything special for 
big setups. Also, another point: I don't think there will be much difference 
between sleeping for 57134 and 57266, because under load time behaves in its 
own fashion, so it's possible that these two will actually get woken up at the 
same time. So most likely we'll actually get a lot of small clusters across the 
0..60000 range, and in the end we'll fill the whole range with one or two 
processes. Which returns us to the initial problem again (:

I think it's at least worth trying this on a large setup like you said, with 
thousands of replications per node, to see how it actually goes. And whether 
batching helps at all. Do you have resources for that kind of testing?




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread nickva
Github user nickva commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215282100
  
@kxepal 

Good point. Agreed. I meant to do uniform(min(Range, Max)), not 
min(uniform(Range), Max). Fixed it. Here is how it looks now:


``` 
 RepCount=1000, rp([random:uniform(min(RepCount*200, 60000)) || _ <- lists:seq(0,100)]).

[38394,16327,5235,54911,43938,37439,49599,25058,29127,28940,
 27670,27339,12362,13214,24914,57804,34424,13094,27521,24637,
 7739,19875,4917,10425,56944,47949,50781,56566,2593,24202,
 17239,46795,53108,3802,52754,12959,50775,8052,29843,17078,
 25021,17026,58712,3914,15789,28334,10623,37113,44233,45063,
 53042,50468,42049,49150,16030,12299,17236,18713,1647,13074,
 38997,343,21045,3184,35015,36434,7195,29336,58034,2690,
 9,42243,31052,9085,35359,57134,57266,17889,28703,20612,
 40081,14827,32685,25217,16623,9715,24076,44923,18544,12058,
 58077,34974,36248,43335,56866,2442,2207,12111,2853,15403,
 43411]
```
RepCount is also the number of replications on one (local) node. In a 3-node 
cluster, RepCount=1000 means there are 3000 replications; in a 12-node cluster, 
12k, and so on.
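
To make the difference between the two variants concrete, here is a small sketch. It assumes the one-minute cap is 60000 ms and uses the newer `rand` module instead of `random`; the module and function names are made up for the example.

```erlang
-module(jitter_demo).
-export([clamp_after/2, clamp_before/2, count_at_max/2]).

%% min(uniform(Range), Max): every draw above Max collapses to exactly Max,
%% so once Range exceeds Max many processes get the same delay.
clamp_after(Range, Max) ->
    min(rand:uniform(Range), Max).

%% uniform(min(Range, Max)): the range is capped first, so the delays stay
%% spread over 1..min(Range, Max).
clamp_before(Range, Max) ->
    rand:uniform(min(Range, Max)).

%% Count how many delays are exactly Max.
count_at_max(Delays, Max) ->
    length([D || D <- Delays, D =:= Max]).
```

For example, with `Range = 1000 * 200` and `Max = 60000`, a thousand draws from `clamp_after/2` typically have several hundred values stuck at `Max`, while draws from `clamp_before/2` almost never hit it.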





[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215278234
  
Thank you! That gives me a direction to start digging into.




[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread iilyak
Github user iilyak commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215277748
  
> I see that, but how to adjust?

I am afraid I don't have a good recipe for this one. 

> Will multiplying the defaults by 2 make things better, and at what cost?

You usually want to reduce the values, not increase them. Say you update a db 
way too fast, so compaction cannot keep up and your disk storage is approaching 
the point of no return. You would want to slow down updates to let compaction 
finish and free up some space.




[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215274620
  
I see that, but how to adjust? (: Will multiplying the defaults by 2 make 
things better, and at what cost?




[jira] [Commented] (COUCHDB-3005) Make couch_work_queue options configurable for couch_mrview_updater

2016-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261306#comment-15261306
 ] 

ASF GitHub Bot commented on COUCHDB-3005:
-

Github user asfgit closed the pull request at:

https://github.com/apache/couchdb-couch-mrview/pull/45


> Make couch_work_queue options configurable for couch_mrview_updater
> ---
>
> Key: COUCHDB-3005
> URL: https://issues.apache.org/jira/browse/COUCHDB-3005
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: ILYA
>
> For performance reasons in some cases there is a need to put a cap on queue 
> size. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3005) Make couch_work_queue options configurable for couch_mrview_updater

2016-04-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261305#comment-15261305
 ] 

ASF subversion and git services commented on COUCHDB-3005:
--

Commit 3f8230cbfc5b226080364cc9801cb0eeafc1985f in couchdb-couch-mrview's 
branch refs/heads/master from [~iilyak]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-mrview.git;h=3f8230c ]

Make view updater couch_work_queue configurable

For performance reasons in some cases there is a need to put a cap
on a queue size.

COUCHDB-3005


> Make couch_work_queue options configurable for couch_mrview_updater
> ---
>
> Key: COUCHDB-3005
> URL: https://issues.apache.org/jira/browse/COUCHDB-3005
> Project: CouchDB
>  Issue Type: Improvement
>Reporter: ILYA
>
> For performance reasons in some cases there is a need to put a cap on queue 
> size. 





[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread iilyak
Github user iilyak commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215273659
  
The defaults are the same as before 
[see](https://github.com/apache/couchdb-couch-mrview/pull/45/files#diff-45d9d72f19400a814928819905d39307L24).
 You might want to adjust the values if you want to prioritize compaction for 
example. 




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215263623
  
```
14> RepCount2 = 300.
300
15> Delays2 = [lists:min([random:uniform(RepCount2 * 2 * 
AVG_ERROR_DELAY_MSEC), MAX_ERROR_DELAY_MSEC]) || _ <- lists:seq(0, RepCount2)].
[9485,56602,52750,39220,30653,11672,8528,41518,11949,53177,
 23181,14300,32533,40462,39026,13175,39478,11391,5080,26598,
 3089,18850,27536,53811,29149,10794,40307,30398,474|...]
17> lists:foldl(fun(I,A) when I =:= MAX_ERROR_DELAY_MSEC -> A + 1; (I, A) 
-> A end, 0, Delays2).
1
18> RepCount3 = 500.

500
19> Delays3 = [lists:min([random:uniform(RepCount3 * 2 * 
AVG_ERROR_DELAY_MSEC), MAX_ERROR_DELAY_MSEC]) || _ <- lists:seq(0, RepCount3)].
[27337,448,39373,37148,18554,17181,60000,52388,30241,60000,
 60000,60000,60000,43381,56073,32443,48254,60000,42480,60000,
 60000,27602,24357,4499,4179,15436,60000,60000,10043|...]
20> lists:foldl(fun(I,A) when I =:= MAX_ERROR_DELAY_MSEC -> A + 1; (I, A) 
-> A end, 0, Delays3).
200
21> RepCount4 = 400.

400
22> Delays4 = [lists:min([random:uniform(RepCount4 * 2 * 
AVG_ERROR_DELAY_MSEC), MAX_ERROR_DELAY_MSEC]) || _ <- lists:seq(0, RepCount4)].
[57108,60000,42816,39892,60000,26370,58639,33867,23407,
 60000,60000,27858,57792,60000,4635,32163,49462,60000,2840,
 33347,59350,51291,59550,39674,11517,30974,18879,48330,9700|...]
23> lists:foldl(fun(I,A) when I =:= MAX_ERROR_DELAY_MSEC -> A + 1; (I, A) 
-> A end, 0, Delays4).
120
```

For 300 replications it's only one, but after that the number grows fast: 400 
gets you ~100 and 500 gets you ~200 one-minute sleepers. So we're returning to 
the same problem.
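
These counts can also be estimated without sampling. A sketch, assuming the delay is `min(uniform(RepCount * 2 * Avg), Max)` with `uniform/1` drawing integers from 1..Range, an average of 100 ms and a cap of 60000 ms (module and function names are hypothetical):

```erlang
-module(clamp_math).
-export([expected_at_max/3]).

%% Expected number of replications (out of RepCount) whose delay ends up
%% at exactly MaxMsec when the delay is min(uniform(Range), MaxMsec).
%% A draw lands on MaxMsec when it is any of MaxMsec..Range, which is
%% (Range - MaxMsec + 1) outcomes out of Range.
expected_at_max(RepCount, AvgMsec, MaxMsec) ->
    Range = RepCount * 2 * AvgMsec,
    case Range >= MaxMsec of
        true -> RepCount * (Range - MaxMsec + 1) / Range;
        false -> 0.0
    end.
```

Under those assumptions, `expected_at_max(400, 100, 60000)` is about 100, `expected_at_max(500, 100, 60000)` about 200 and `expected_at_max(1000, 100, 60000)` about 700, which is in line with the shell experiments.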




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215262746
  
```
8> AVG_ERROR_DELAY_MSEC = 100.
100
9> RepCount = 1000.   
1000
10> MAX_ERROR_DELAY_MSEC = 60000.
60000
11> Delays = [lists:min([random:uniform(RepCount * 2 * 
AVG_ERROR_DELAY_MSEC), MAX_ERROR_DELAY_MSEC]) || _ <- lists:seq(0, RepCount)].
[60000,60000,553,60000,60000,26538,60000,60000,60000,60000,
 60000,60000,60000,60000,60000,60000,54556,25234,60000,60000,
 60000,60000,50489,60000,60000,60000,60000,47442,60000|...]
12> lists:foldl(fun(I,A) when I =:= MAX_ERROR_DELAY_MSEC -> A + 1; (I, A) 
-> A end, 0, Delays).  
707
```
So, for 1000 replications we'll have ~700 that sleep for exactly one 
minute.




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#issuecomment-215260015
  
I like that the problem gets fixed, but I don't like the way it's done. 
Actually, I don't like two things: random and sleep. Random is completely 
unpredictable. Sleeping is a way to make the system hang for no actual 
reason. 

The problem with this algorithm is that the more replications you have, the 
higher the chance that you return to the stampede effect you're trying to 
avoid. Past 300 replications, more and more of them will always sleep for 
1 minute and all get woken up at the same time, with the very same effect.

I think the proper fix should work at least the way delayed commits do: 
accumulate updates for some short but fixed time frame and flush them to disk 
in bulk. No matter how many replications we have running, we don't depend on 
their number here, and we don't look hung but respond as soon as we can. 




[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread kxepal
Github user kxepal commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215260829
  
+1, but what are the recommendations for configuring these options? Why were 
these defaults picked, and when should they be changed to make things better?




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread nickva
Github user nickva commented on a diff in the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#discussion_r61345382
  
--- Diff: src/couch_replicator_manager.erl ---
@@ -124,13 +126,25 @@ replication_error(#rep{id = {BaseId, _} = RepId}, 
Error) ->
 nil ->
 ok;
 #rep_state{rep = #rep{db_name = DbName, doc_id = DocId}} ->
+add_error_jitter(),
 update_rep_doc(DbName, DocId, [
 {<<"_replication_state">>, <<"error">>},
 {<<"_replication_state_reason">>, 
to_binary(error_reason(Error))},
 {<<"_replication_id">>, ?l2b(BaseId)}]),
 ok = gen_server:call(?MODULE, {rep_error, RepId, Error}, infinity)
 end.
 
+% Add random delay proportional to the number of replications
+% on current node, in order to prevent a stampede when a source
+% with multiple replication targets fails
+add_error_jitter() ->
+RepCount = ets:info(?REP_TO_STATE, size),
+Range = RepCount * 2 * ?AVG_ERROR_DELAY_MSEC,
--- End diff --

Since the config macro is called "average error", I made the uniform random 
range twice that. For example, if you need an average of 50, sample from a 
uniform interval of 0 to 100.
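
A quick way to check that reasoning, as a sketch with hypothetical names (it uses `rand:uniform/1`, which, like `random:uniform/1`, draws integers from 1..N):

```erlang
-module(avg_delay).
-export([exact_mean/1, sample_mean/2]).

%% Exact mean of the integers 1..N, i.e. of what uniform(N) draws from.
exact_mean(N) ->
    (N + 1) / 2.

%% Empirical mean of Count draws from rand:uniform(N).
sample_mean(N, Count) ->
    lists:sum([rand:uniform(N) || _ <- lists:seq(1, Count)]) / Count.
```

`exact_mean(100)` is 50.5, so a range of twice the wanted average gives that average up to the half-unit offset from starting at 1 rather than 0.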




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread sagelywizard
Github user sagelywizard commented on a diff in the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#discussion_r61341684
  
--- Diff: src/couch_replicator_manager.erl ---
@@ -124,13 +126,25 @@ replication_error(#rep{id = {BaseId, _} = RepId}, 
Error) ->
 nil ->
 ok;
 #rep_state{rep = #rep{db_name = DbName, doc_id = DocId}} ->
+add_error_jitter(),
 update_rep_doc(DbName, DocId, [
 {<<"_replication_state">>, <<"error">>},
 {<<"_replication_state_reason">>, 
to_binary(error_reason(Error))},
 {<<"_replication_id">>, ?l2b(BaseId)}]),
 ok = gen_server:call(?MODULE, {rep_error, RepId, Error}, infinity)
 end.
 
+% Add random delay proportional to the number of replications
+% on current node, in order to prevent a stampede when a source
+% with multiple replication targets fails
+add_error_jitter() ->
+RepCount = ets:info(?REP_TO_STATE, size),
+Range = RepCount * 2 * ?AVG_ERROR_DELAY_MSEC,
--- End diff --

Why `* 2`?




[GitHub] couchdb-couch-mrview pull request: Make view updater couch_work_qu...

2016-04-27 Thread tonysun83
Github user tonysun83 commented on the pull request:


https://github.com/apache/couchdb-couch-mrview/pull/45#issuecomment-215228272
  
+1




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread nickva
Github user nickva commented on a diff in the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#discussion_r61329387
  
--- Diff: src/couch_replicator_manager.erl ---
@@ -124,13 +127,30 @@ replication_error(#rep{id = {BaseId, _} = RepId}, 
Error) ->
 nil ->
 ok;
 #rep_state{rep = #rep{db_name = DbName, doc_id = DocId}} ->
+add_error_jitter(RepId, DbName, DocId),
 update_rep_doc(DbName, DocId, [
 {<<"_replication_state">>, <<"error">>},
 {<<"_replication_state_reason">>, 
to_binary(error_reason(Error))},
 {<<"_replication_id">>, ?l2b(BaseId)}]),
 ok = gen_server:call(?MODULE, {rep_error, RepId, Error}, infinity)
 end.
 
+% Add random delay proportional to the number of replications
+% on current node, in order to prevent a stampede when a source
+% with multiple replication targets fails
+add_error_jitter(RepId, DbName, DocId) ->
+case ets:info(?REP_TO_STATE, size) of
--- End diff --

Agreed, a non-integer means the table is deleted or not created yet. We'd want 
to crash in either case.
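
A small sketch of what "crash in either case" looks like without explicit error handling. The module name and the `AvgMsec` argument are made up for the example; the only library behaviour relied on is that `ets:info(Tab, size)` returns `undefined` for a named table that does not exist.

```erlang
-module(ets_size_demo).
-export([range/2]).

%% Mirrors the arithmetic from the diff with no case on the result.
%% If the table is missing, ets:info/2 returns `undefined` and the
%% multiplication fails with badarith, which crashes the caller.
range(Tab, AvgMsec) ->
    RepCount = ets:info(Tab, size),
    RepCount * 2 * AvgMsec.
```

So the plain assignment already gives the let-it-crash behaviour; an extra `case` clause would only be needed if the missing table were something to recover from.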




[GitHub] couchdb-couch-replicator pull request: Add jittered delay during r...

2016-04-27 Thread rnewson
Github user rnewson commented on a diff in the pull request:


https://github.com/apache/couchdb-couch-replicator/pull/37#discussion_r61328162
  
--- Diff: src/couch_replicator_manager.erl ---
@@ -124,13 +127,30 @@ replication_error(#rep{id = {BaseId, _} = RepId}, 
Error) ->
 nil ->
 ok;
 #rep_state{rep = #rep{db_name = DbName, doc_id = DocId}} ->
+add_error_jitter(RepId, DbName, DocId),
 update_rep_doc(DbName, DocId, [
 {<<"_replication_state">>, <<"error">>},
 {<<"_replication_state_reason">>, 
to_binary(error_reason(Error))},
 {<<"_replication_id">>, ?l2b(BaseId)}]),
 ok = gen_server:call(?MODULE, {rep_error, RepId, Error}, infinity)
 end.
 
+% Add random delay proportional to the number of replications
+% on current node, in order to prevent a stampede when a source
+% with multiple replication targets fails
+add_error_jitter(RepId, DbName, DocId) ->
+case ets:info(?REP_TO_STATE, size) of
--- End diff --

Can ets:info(Tab, size) ever return a non-integer? I think `RepCount = 
ets:info(?REP_TO_STATE, size)` is better; we shouldn't have boilerplate error 
handling for errors that can't happen.




[GitHub] couchdb-mem3 pull request: Improve mem3_sync event listener perfor...

2016-04-27 Thread banjiewen
Github user banjiewen commented on the pull request:

https://github.com/apache/couchdb-mem3/pull/19#issuecomment-215213835
  
@kocolosk: Mind giving a vote on this one?




[jira] [Commented] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede

2016-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260804#comment-15260804
 ] 

ASF GitHub Bot commented on COUCHDB-3006:
-

GitHub user nickva opened a pull request:

https://github.com/apache/couchdb-couch-replicator/pull/37

Add jittered delay during replication error handling

For one-to-many replications, when source fails, it
can create a stampede effect. A jittered delay is
used to avoid that.

Delay is random, in a range proportional to
current number of replications, with a maximum of
1 minute.

Seed random number generator within each
replication process with a non-deterministic value,
otherwise the same sequence of delays is generated
for all replications.
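
    A sketch of why the seeding matters (module and function names are
    hypothetical). With the older `random` module every process starts
    from the same default seed unless it is seeded explicitly; the code
    below reproduces that effect with explicit seeds in the newer `rand`
    module, and shows the unseeded case where `rand` picks a
    non-constant seed per process.

```erlang
-module(seed_demo).
-export([delays_with_seed/3, delays_in_new_process/2]).

%% Draw Count delays in 1..Range from an explicitly seeded generator.
%% The same Seed always yields the same list of delays.
delays_with_seed(Seed, Range, Count) ->
    State0 = rand:seed_s(exsplus, Seed),
    {Delays, _State} = lists:mapfoldl(
        fun(_, S) -> rand:uniform_s(Range, S) end,
        State0,
        lists:seq(1, Count)),
    Delays.

%% Draw Count delays in a fresh process that never seeds explicitly;
%% rand then seeds itself with a non-constant value on first use.
delays_in_new_process(Range, Count) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() ->
        Parent ! {Ref, [rand:uniform(Range) || _ <- lists:seq(1, Count)]}
    end),
    receive
        {Ref, Delays} -> Delays
    end.
```

    Two calls to `delays_with_seed({1, 2, 3}, 60000, 5)` return the same
    list, which is the "same sequence of delays for all replications"
    situation the commit message describes.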

Jira: COUCHDB-3006

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3006

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/couchdb-couch-replicator/pull/37.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #37


commit 0f9bc69f37ae970371af64525ae6c77332e60a07
Author: Nick Vatamaniuc 
Date:   2016-04-27T19:21:14Z

Add jittered delay during replication error handling

For one-to-many replications, when source fails, it
can create a stampede effect. A jittered delay is
used to avoid that.

Delay is random, in a range proportional to
current number of replications, with a maximum of
1 minute.

Seed random number generator within each
replication process with a non-deterministic value,
otherwise the same sequence of delays is generated
for all replications.

Jira: COUCHDB-3006




> Source failure in one source to many target replications causes a stampede
> --
>
> Key: COUCHDB-3006
> URL: https://issues.apache.org/jira/browse/COUCHDB-3006
> Project: CouchDB
>  Issue Type: Bug
>Reporter: Nick Vatamaniuc
>
> With multiple replications from a single source to multiple targets, if the 
> source fails, all replications post an error state back to their replication 
> document and attempt to restart. This creates a stampede effect and causes 
> sharp load spikes on the replication cluster.





[jira] [Created] (COUCHDB-3006) Source failure in one source to many target replications causes a stampede

2016-04-27 Thread Nick Vatamaniuc (JIRA)
Nick Vatamaniuc created COUCHDB-3006:


 Summary: Source failure in one source to many target replications 
causes a stampede
 Key: COUCHDB-3006
 URL: https://issues.apache.org/jira/browse/COUCHDB-3006
 Project: CouchDB
  Issue Type: Bug
Reporter: Nick Vatamaniuc


With multiple replications from a single source to multiple targets, if the 
source fails, all replications post an error state back to their replication 
document and attempt to restart. This creates a stampede effect and causes 
sharp load spikes on the replication cluster.







[jira] [Closed] (COUCHDB-3004) fewfrererer

2016-04-27 Thread Jan Lehnardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lehnardt closed COUCHDB-3004.
-
Resolution: Invalid

> fewfrererer
> ---
>
> Key: COUCHDB-3004
> URL: https://issues.apache.org/jira/browse/COUCHDB-3004
> Project: CouchDB
>  Issue Type: Bug
>Reporter: ASF subversion and git services
>
> fewfrererer


