On 02/02/2009, at 5:01 AM, Adam Kocoloski wrote:
[...]
> That's odd. I tried setting a 120 second timeout and didn't have
> any trouble. Then again, I only ran the test suite; I didn't
> actually force a timeout to occur or anything. Sorry, I don't have
> any hints at the moment.

Guh. I'm an idiot. I'd forgotten to create the destination database.
In my haste to test it I used Futon, not my normal script, and of
course interpreted the error as something to do with the code I'd
changed. Sorry about that. :/
With the changes, it worked the first time, although it did give a
spurious error about a server having restarted.

> Multipart won't solve the problem where ibrowse throws a timeout
> error even while it's still sending data. That seems like a pretty
> curious choice on ibrowse's part to me. Maybe when I have some more
> free time I can look into the timeout algo and see if it can be
> tweaked so that it only starts after the request has been fully
> transmitted. I think that would pretty much solve this problem.
> Barring that, I agree that some sort of back-off algorithm that
> lengthens the timeout after each failed request is warranted.
>
> There's also one more knob we can turn. During replication we are
> checking the memory consumption of the process collecting docs to
> send to the target. If it hits 10MB we send the bulk immediately,
> regardless of whether it's 1 doc, 10, or 99. 10MB may be much too
> high given a 30 second timeout window in which we have to transmit
> the data; 1MB is possibly a better fit for home broadband users. If
> you want to fiddle with that knob instead of the ibrowse timeout you
> can try changing line 224 of couch_rep.erl so that instead of
>
>     couch_util:should_flush()
>
> it would read (value is in bytes)
>
>     couch_util:should_flush(1000000)

Awesome tip. Thanks. Yeah, I had never noticed any problem with
server-to-server replication, only when I then tried to do it from home.
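
For what it's worth, here is the back-of-envelope arithmetic behind the 10MB vs 1MB suggestion, as a rough sketch in Python (not CouchDB code). The function name is made up for illustration, and it assumes decimal megabytes (matching the 1000000 above), ignores HTTP and JSON overhead, and assumes the bytes on the wire are about the same as the in-memory size, which may not hold.

```python
def required_upload_kbit_per_s(batch_bytes: int, timeout_s: float) -> float:
    """Minimum sustained upload rate (kbit/s) to send one batch before the timeout.

    Hypothetical helper for illustration only; not part of CouchDB or ibrowse.
    """
    return batch_bytes * 8 / timeout_s / 1000


# Compare the current 10MB flush threshold with the suggested 1MB one,
# both against the 30 second timeout window mentioned in the thread.
for label, size in (("10MB", 10_000_000), ("1MB", 1_000_000)):
    rate = required_upload_kbit_per_s(size, 30)
    print(f"{label} in 30 s needs about {rate:.0f} kbit/s upstream")
```

So a full 10MB batch needs roughly 2.7 Mbit/s of sustained upstream to make it inside 30 seconds, while 1MB needs roughly 0.27 Mbit/s, which is much closer to what a home connection can manage.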

> I don't have a strong opinion at this point in time about how many
> of these parameters ought to be tunable in local.ini. Best,

My opinion is usually that pretty much everything with a big effect,
like this, should have a sensible default but be overridable in config.
Failing that, maybe the default timeout should be raised?
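
To make the back-off idea from earlier in the thread concrete, here is a minimal sketch in Python of "lengthen the timeout after each failed request". It is not how couch_rep.erl or ibrowse actually behave; every name here (backoff_timeouts, send_with_backoff, the send callback) is hypothetical, and the doubling factor, the cap, and the attempt limit are assumptions. Only the 30 second starting value comes from the discussion above.

```python
import itertools
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_timeouts(initial_s: float = 30.0, factor: float = 2.0,
                     max_s: float = 480.0) -> Iterator[float]:
    """Yield timeouts that grow by `factor` after each failure, capped at max_s."""
    timeout = initial_s
    while True:
        yield timeout
        timeout = min(timeout * factor, max_s)


def send_with_backoff(send: Callable[[float], T], max_attempts: int = 5,
                      **backoff_opts: float) -> T:
    """Call send(timeout_s), retrying with a longer timeout after each TimeoutError.

    `send` is a stand-in for whatever actually posts the batch to the target.
    """
    timeouts = itertools.islice(backoff_timeouts(**backoff_opts), max_attempts)
    last_error = None
    for timeout_s in timeouts:
        try:
            return send(timeout_s)
        except TimeoutError as err:
            last_error = err  # remember the failure and try again with more time
    raise TimeoutError(f"gave up after {max_attempts} attempts") from last_error
```

The defaults are keyword arguments on purpose: that is the "sensible default, overridable in config" shape, with the values read from wherever the config lives.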
Thanks heaps for your help.
Sho