Re: partial replications

Matt Aimonetti Mon, 28 Sep 2009 15:41:21 -0700

In the meantime, you might want to try a push replication, just in case.
- Matt
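
For anyone following along, here is a rough sketch of how the two directions differ. As far as I know, a replication is started by POSTing a JSON body with "source" and "target" to /_replicate on whichever server should do the work. The host names, the database name, and the helper names (replication_request, pull_plan, push_plan, replicate) are made up for illustration, and a push assumes the remote server can reach the local one over HTTP.

```python
import json
import urllib.request


def replication_request(source, target):
    """Build the JSON body for a POST to /_replicate."""
    return json.dumps({"source": source, "target": target}).encode("utf-8")


def pull_plan(local, remote, db):
    # Pull: the local (target) server does the work and reads from the remote URL.
    return (local, f"{remote}/{db}", db)


def push_plan(local, remote, db):
    # Push: the remote (source) server does the work and writes to the local URL.
    return (remote, db, f"{local}/{db}")


def replicate(server, source, target):
    """POST the replication request to `server` and return the parsed response."""
    req = urllib.request.Request(
        server.rstrip("/") + "/_replicate",
        data=replication_request(source, target),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical servers and database name, for illustration only.
LOCAL = "http://local.example.com:5984"
REMOTE = "http://remote.example.com:5984"
DB = "mydb"
```

Calling replicate(*push_plan(LOCAL, REMOTE, DB)) would then ask the remote server to push, while replicate(*pull_plan(LOCAL, REMOTE, DB)) corresponds to the pull setup described in the thread.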


On Mon, Sep 28, 2009 at 3:38 PM, Adam Kocoloski <[email protected]> wrote:

> On Sep 28, 2009, at 4:44 PM, Ning Tan wrote:
>
>  On Mon, Sep 28, 2009 at 2:41 PM, Adam Kocoloski <[email protected]>
>> wrote:
>>
>>> On Sep 28, 2009, at 1:21 PM, Ning Tan wrote:
>>>
>>>  Hi,
>>>>
>>>> When we replicate between a remote database and a local one (pulling
>>>> from remote into local), we are observing partial replications,
>>>> meaning that we have to issue repeated _replicate calls for the
>>>> replication to complete. For a database with 10,000 documents, for
>>>> example, it could take up to 7 calls for the entire database to
>>>> replicate into an empty one. Each time, the number of documents
>>>> replicated over seemed random.
>>>>
>>>> Thanks.
>>>>
>>>
>>> Hi, it's certainly not the expected behavior.  When the POST to
>>> _replicate
>>> returns and not all documents have been replicated, what does the
>>> response
>>> look like?  Is there anything in the target log indicating a crash?  Can
>>> you
>>> be more specific about the versions you are using?
>>>
>>> Best, Adam
>>>
>>>
>> Nothing indicated a crash. We have 0.10.0a818506 on a Mac, and
>> something very close on an Ubuntu machine (I'll find the exact version later).
>>
>> Here's the replication response as well as the interesting logs on the
>> target machine. It seems to me that every (not all) partial
>> replication process is associated with a corresponding entry in the
>> log that says "recording a checkpoint at source update_seq .....".
>> (i.e. you can match the recorded_seq number in the replication
>> response with the checkpoint update_seq numbers in the log).
>>
>> {"session_id":"439d41bad454ea5d5dcb16a154800a23","start_time":"Wed, 23 Sep 2009 18:07:33 GMT","end_time":"Wed, 23 Sep 2009 18:07:53 GMT","start_last_seq":8663,"end_last_seq":17619,"recorded_seq":17619,"missing_checked":0,"missing_found":8952,"docs_read":8952,"docs_written":8952,"doc_write_failures":0}
>> {"session_id":"f85e575614479547d70277d24bff2d51","start_time":"Wed, 23 Sep 2009 18:07:12 GMT","end_time":"Wed, 23 Sep 2009 18:07:17 GMT","start_last_seq":7710,"end_last_seq":8663,"recorded_seq":8663,"missing_checked":0,"missing_found":953,"docs_read":953,"docs_written":953,"doc_write_failures":0}
>> {"session_id":"84dc053e810b8a46f19c95ef560d42d5","start_time":"Wed, 23 Sep 2009 18:06:32 GMT","end_time":"Wed, 23 Sep 2009 18:06:37 GMT","start_last_seq":7021,"end_last_seq":7710,"recorded_seq":7710,"missing_checked":0,"missing_found":689,"docs_read":689,"docs_written":689,"doc_write_failures":0}
>> {"session_id":"e72b655988ecc26b85b412fcaf05018a","start_time":"Wed, 23 Sep 2009 18:05:47 GMT","end_time":"Wed, 23 Sep 2009 18:05:52 GMT","start_last_seq":5792,"end_last_seq":7021,"recorded_seq":7021,"missing_checked":0,"missing_found":1229,"docs_read":1229,"docs_written":1229,"doc_write_failures":0}
>> {"session_id":"8fd5d827721e70a28735ad4c3a291c3f","start_time":"Wed, 23 Sep 2009 18:05:30 GMT","end_time":"Wed, 23 Sep 2009 18:05:35 GMT","start_last_seq":4875,"end_last_seq":5792,"recorded_seq":5792,"missing_checked":0,"missing_found":917,"docs_read":917,"docs_written":917,"doc_write_failures":0}
>> {"session_id":"187faed013cb2b63b714aab7845e3f56","start_time":"Wed, 23 Sep 2009 18:05:02 GMT","end_time":"Wed, 23 Sep 2009 18:05:07 GMT","start_last_seq":4539,"end_last_seq":4875,"recorded_seq":4875,"missing_checked":0,"missing_found":336,"docs_read":336,"docs_written":336,"doc_write_failures":0}
>> {"session_id":"e30ee09b3da0dd979d655382bc3dadc8","start_time":"Wed, 23 Sep 2009 18:04:23 GMT","end_time":"Wed, 23 Sep 2009 18:04:34 GMT","start_last_seq":1590,"end_last_seq":4539,"recorded_seq":4539,"missing_checked":0,"missing_found":2949,"docs_read":2949,"docs_written":2949,"doc_write_failures":0}
>> {"session_id":"3486a3b8d8a1e5eee05b82dcf4c66153","start_time":"Wed, 23 Sep 2009 18:02:17 GMT","end_time":"Wed, 23 Sep 2009 18:02:22 GMT","start_last_seq":0,"end_last_seq":1590,"recorded_seq":1590,"missing_checked":0,"missing_found":1590,"docs_read":1590,"docs_written":1590,"doc_write_failures":0}
>>
>> [Wed, 23 Sep 2009 18:04:28 GMT] [info] [<0.1959.0>] recording a
>> checkpoint at source update_seq 3632
>>
>> [Wed, 23 Sep 2009 18:04:34 GMT] [info] [<0.1959.0>] recording a
>> checkpoint at source update_seq 4539
>>
>> [Wed, 23 Sep 2009 18:04:41 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:05:02 GMT] [info] [<0.1941.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.4981.0>
>>
>> [Wed, 23 Sep 2009 18:05:07 GMT] [info] [<0.4981.0>] recording a
>> checkpoint at source update_seq 4875
>>
>> [Wed, 23 Sep 2009 18:05:17 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:05:30 GMT] [info] [<0.1941.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.5376.0>
>>
>> [Wed, 23 Sep 2009 18:05:35 GMT] [info] [<0.5376.0>] recording a
>> checkpoint at source update_seq 5792
>>
>> [Wed, 23 Sep 2009 18:05:43 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:05:47 GMT] [info] [<0.1941.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.6322.0>
>>
>> [Wed, 23 Sep 2009 18:05:52 GMT] [info] [<0.6322.0>] recording a
>> checkpoint at source update_seq 7021
>>
>> [Wed, 23 Sep 2009 18:05:59 GMT] [info] [<0.1941.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:06:32 GMT] [info] [<0.1945.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.7609.0>
>>
>> [Wed, 23 Sep 2009 18:06:37 GMT] [info] [<0.7609.0>] recording a
>> checkpoint at source update_seq 7710
>>
>> [Wed, 23 Sep 2009 18:06:41 GMT] [info] [<0.1945.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:07:12 GMT] [info] [<0.7608.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.8369.0>
>>
>> [Wed, 23 Sep 2009 18:07:17 GMT] [info] [<0.8369.0>] recording a
>> checkpoint at source update_seq 8663
>>
>> [Wed, 23 Sep 2009 18:07:20 GMT] [info] [<0.7608.0>] 127.0.0.1 - -
>> 'POST' /_replicate 200
>>
>> [Wed, 23 Sep 2009 18:07:23 GMT] [info] [<0.7608.0>] 127.0.0.1 - -
>> 'GET' /_utils/image/delete-mini.png 304
>>
>> [Wed, 23 Sep 2009 18:07:33 GMT] [info] [<0.7608.0>] starting
>> replication "9577548b0faafa46430af6d8b2898a47" at <0.9376.0>
>>
>> [Wed, 23 Sep 2009 18:07:38 GMT] [info] [<0.9376.0>] recording a
>> checkpoint at source update_seq 10821
>>
>> [Wed, 23 Sep 2009 18:07:44 GMT] [info] [<0.9376.0>] recording a
>> checkpoint at source update_seq 13507
>>
>> [Wed, 23 Sep 2009 18:07:50 GMT] [info] [<0.9376.0>] recording a
>> checkpoint at source update_seq 16222
>>
>> [Wed, 23 Sep 2009 18:07:53 GMT] [info] [<0.9376.0>] recording a
>> checkpoint at source update_seq 17619
>>
>
> Hmm, I must admit I'm stumped so far.  Are you by any chance building from
> SVN repeatedly and installing into the same prefix?  Please feel free to
> file a ticket in JIRA[1] so we don't forget about this.  You might try again
> with the log level on the target set to debug, although I'm not certain it
> will tell us anything.  I'll see if I can find a way to reproduce this.
>  Best,
>
> Adam
>
> [1]: https://issues.apache.org/jira/browse/COUCHDB
>
>
