On May 14, 2009, at 6:26 PM, Matt Goodall wrote:
Secondly, if the network connection fails in the middle of replication
(closing an ssh tunnel is a good way to test this ;-)) then it seems to
retry a few (10) times before the replicator process terminates. If the
network connection becomes available again (restart the ssh tunnel) the
replicator doesn't seem to notice. Also, I just noticed that Futon
still lists the replication on its status page.
That's correct, the replicator does try to ignore transient failures.
Hmm, it seemed to fail on transient failures here. After killing and
restarting my ssh tunnel I left the replication a while and it never
seemed to continue, and the only way to clear it from the status list
was to restart the couchdb server. I'll check again though.
Ok, I misread you earlier. It's possible that CouchDB or ibrowse is
trying to reuse a socket when it really should be opening a new one.
That would be a bug.
This one definitely seems like a bug. Killing and restarting my SSH
tunnel basically kills the replication; I can see no sign of it
resuming.
You get this in the log ...
<snip>
and then nothing.
Worst of all is that couch still thinks the replication is running and
refuses to start another one. Currently, the only solution is to
restart the couch server :-/.
Thanks again for catching this bug, Matt. The example you showed
occurs when we record a checkpoint record, but there was also a
similar problem with writing attachments to disk. I've committed a
very simplistic fix for the problem; the replicator should now realize
that these requests are never going to complete and commit seppuku.
Not the most elegant solution, perhaps, but it's certainly better than
restarting the server. The error message should take one of the
following forms (still working on standardizing these error messages,
of course):
{"error":"replication_link_failure", "reason":"{gen_server, call ...}"}
{"error":"internal_server_error", "reason":"replication_link_failure"}
{"error":"attachment_request_failed", "reason":"failed to replicate http://..."}
{"error":"attachment_request_failed", "reason":"ibrowse error on http://... : Reason"}
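
For anyone scripting around this in the meantime, here is a rough
sketch of how a client might tell these failures apart. It only relies
on the "error" and "reason" values listed above; the function name and
the returned labels are made up for illustration and are not part of
CouchDB, and the error forms themselves may still change.

```python
import json

# Error names copied from the messages above; they are not yet standardized.
LINK_FAILURE = "replication_link_failure"
ATTACHMENT_FAILURE = "attachment_request_failed"


def classify_replication_error(body):
    """Classify a JSON error body from a failed replication.

    Returns "link_failure", "attachment_failure", or None if the body
    does not match any of the forms listed above. The labels are
    hypothetical, chosen only for this example.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        # Not valid JSON, so not one of the known error forms.
        return None
    if not isinstance(doc, dict):
        return None

    error = doc.get("error")
    reason = doc.get("reason")

    # The link failure can show up either as the error itself or as the
    # reason attached to a generic internal_server_error.
    if error == LINK_FAILURE:
        return "link_failure"
    if error == "internal_server_error" and reason == LINK_FAILURE:
        return "link_failure"
    if error == ATTACHMENT_FAILURE:
        return "attachment_failure"
    return None
```

A caller could treat either label as a signal that the replication has
died and needs to be started again, rather than waiting on it.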
We'll work on a more fine-grained failure mode in the future. Best,
Adam