In the meantime, you might want to try a push replication just in case.

- Matt

On Mon, Sep 28, 2009 at 3:38 PM, Adam Kocoloski <[email protected]> wrote:

> On Sep 28, 2009, at 4:44 PM, Ning Tan wrote:
>
>> On Mon, Sep 28, 2009 at 2:41 PM, Adam Kocoloski <[email protected]> wrote:
>>
>>> On Sep 28, 2009, at 1:21 PM, Ning Tan wrote:
>>>
>>>> Hi,
>>>>
>>>> When we replicate between a remote database and a local one (pulling
>>>> from remote into local), we are observing partial replications,
>>>> meaning that we have to issue repeated _replicate calls for the
>>>> replication to complete. For a database with 10,000 documents, for
>>>> example, it could take up to 7 calls for the entire database to
>>>> replicate into an empty one. Each time, the number of documents
>>>> replicated over seemed random.
>>>>
>>>> Thanks.
>>>
>>> Hi, it's certainly not the expected behavior. When the POST to _replicate
>>> returns and not all documents have been replicated, what does the response
>>> look like? Is there anything in the target log indicating a crash? Can you
>>> be more specific about the versions you are using?
>>>
>>> Best, Adam
>>
>> Nothing indicated a crash. We have 0.10.0a818506 on a Mac, and
>> something very close on an Ubuntu (I'll find the exact version later).
>>
>> Here's the replication response as well as the interesting logs on the
>> target machine. It seems to me that every (not all) partial
>> replication process is associated with a corresponding entry in the
>> log that says "recording a checkpoint at source update_seq .....".
>> (i.e. you can match the recorded_seq number in the replication
>> response with the checkpoint update_seq numbers in the log).
>>
>> {"session_id":"439d41bad454ea5d5dcb16a154800a23","start_time":"Wed, 23 Sep 2009 18:07:33 GMT","end_time":"Wed, 23 Sep 2009 18:07:53 GMT","start_last_seq":8663,"end_last_seq":17619,"recorded_seq":17619,"missing_checked":0,"missing_found":8952,"docs_read":8952,"docs_written":8952,"doc_write_failures":0}
>> {"session_id":"f85e575614479547d70277d24bff2d51","start_time":"Wed, 23 Sep 2009 18:07:12 GMT","end_time":"Wed, 23 Sep 2009 18:07:17 GMT","start_last_seq":7710,"end_last_seq":8663,"recorded_seq":8663,"missing_checked":0,"missing_found":953,"docs_read":953,"docs_written":953,"doc_write_failures":0}
>> {"session_id":"84dc053e810b8a46f19c95ef560d42d5","start_time":"Wed, 23 Sep 2009 18:06:32 GMT","end_time":"Wed, 23 Sep 2009 18:06:37 GMT","start_last_seq":7021,"end_last_seq":7710,"recorded_seq":7710,"missing_checked":0,"missing_found":689,"docs_read":689,"docs_written":689,"doc_write_failures":0}
>> {"session_id":"e72b655988ecc26b85b412fcaf05018a","start_time":"Wed, 23 Sep 2009 18:05:47 GMT","end_time":"Wed, 23 Sep 2009 18:05:52 GMT","start_last_seq":5792,"end_last_seq":7021,"recorded_seq":7021,"missing_checked":0,"missing_found":1229,"docs_read":1229,"docs_written":1229,"doc_write_failures":0}
>> {"session_id":"8fd5d827721e70a28735ad4c3a291c3f","start_time":"Wed, 23 Sep 2009 18:05:30 GMT","end_time":"Wed, 23 Sep 2009 18:05:35 GMT","start_last_seq":4875,"end_last_seq":5792,"recorded_seq":5792,"missing_checked":0,"missing_found":917,"docs_read":917,"docs_written":917,"doc_write_failures":0}
>> {"session_id":"187faed013cb2b63b714aab7845e3f56","start_time":"Wed, 23 Sep 2009 18:05:02 GMT","end_time":"Wed, 23 Sep 2009 18:05:07 GMT","start_last_seq":4539,"end_last_seq":4875,"recorded_seq":4875,"missing_checked":0,"missing_found":336,"docs_read":336,"docs_written":336,"doc_write_failures":0}
>> {"session_id":"e30ee09b3da0dd979d655382bc3dadc8","start_time":"Wed, 23 Sep 2009 18:04:23 GMT","end_time":"Wed, 23 Sep 2009 18:04:34 GMT","start_last_seq":1590,"end_last_seq":4539,"recorded_seq":4539,"missing_checked":0,"missing_found":2949,"docs_read":2949,"docs_written":2949,"doc_write_failures":0}
>> {"session_id":"3486a3b8d8a1e5eee05b82dcf4c66153","start_time":"Wed, 23 Sep 2009 18:02:17 GMT","end_time":"Wed, 23 Sep 2009 18:02:22 GMT","start_last_seq":0,"end_last_seq":1590,"recorded_seq":1590,"missing_checked":0,"missing_found":1590,"docs_read":1590,"docs_written":1590,"doc_write_failures":0}
>>
>> [Wed, 23 Sep 2009 18:04:28 GMT] [info] [<0.1959.0>] recording a checkpoint at source update_seq 3632
>> [Wed, 23 Sep 2009 18:04:34 GMT] [info] [<0.1959.0>] recording a checkpoint at source update_seq 4539
>> [Wed, 23 Sep 2009 18:04:41 GMT] [info] [<0.1941.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:05:02 GMT] [info] [<0.1941.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.4981.0>
>> [Wed, 23 Sep 2009 18:05:07 GMT] [info] [<0.4981.0>] recording a checkpoint at source update_seq 4875
>> [Wed, 23 Sep 2009 18:05:17 GMT] [info] [<0.1941.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:05:30 GMT] [info] [<0.1941.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.5376.0>
>> [Wed, 23 Sep 2009 18:05:35 GMT] [info] [<0.5376.0>] recording a checkpoint at source update_seq 5792
>> [Wed, 23 Sep 2009 18:05:43 GMT] [info] [<0.1941.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:05:47 GMT] [info] [<0.1941.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.6322.0>
>> [Wed, 23 Sep 2009 18:05:52 GMT] [info] [<0.6322.0>] recording a checkpoint at source update_seq 7021
>> [Wed, 23 Sep 2009 18:05:59 GMT] [info] [<0.1941.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:06:32 GMT] [info] [<0.1945.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.7609.0>
>> [Wed, 23 Sep 2009 18:06:37 GMT] [info] [<0.7609.0>] recording a checkpoint at source update_seq 7710
>> [Wed, 23 Sep 2009 18:06:41 GMT] [info] [<0.1945.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:07:12 GMT] [info] [<0.7608.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.8369.0>
>> [Wed, 23 Sep 2009 18:07:17 GMT] [info] [<0.8369.0>] recording a checkpoint at source update_seq 8663
>> [Wed, 23 Sep 2009 18:07:20 GMT] [info] [<0.7608.0>] 127.0.0.1 - - 'POST' /_replicate 200
>> [Wed, 23 Sep 2009 18:07:23 GMT] [info] [<0.7608.0>] 127.0.0.1 - - 'GET' /_utils/image/delete-mini.png 304
>> [Wed, 23 Sep 2009 18:07:33 GMT] [info] [<0.7608.0>] starting replication "9577548b0faafa46430af6d8b2898a47" at <0.9376.0>
>> [Wed, 23 Sep 2009 18:07:38 GMT] [info] [<0.9376.0>] recording a checkpoint at source update_seq 10821
>> [Wed, 23 Sep 2009 18:07:44 GMT] [info] [<0.9376.0>] recording a checkpoint at source update_seq 13507
>> [Wed, 23 Sep 2009 18:07:50 GMT] [info] [<0.9376.0>] recording a checkpoint at source update_seq 16222
>> [Wed, 23 Sep 2009 18:07:53 GMT] [info] [<0.9376.0>] recording a checkpoint at source update_seq 17619
>
> Hmm, I must admit I'm stumped so far. Are you by any chance building from
> SVN repeatedly and installing into the same prefix? Please feel free to
> file a ticket in JIRA[1] so we don't forget about this. You might try again
> with the log level on the target set to debug, although I'm not certain it
> will tell us anything. I'll see if I can find a way to reproduce this.
>
> Best,
>
> Adam
>
> [1]: https://issues.apache.org/jira/browse/COUCHDB
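
For anyone who wants to repeat the comparison Ning describes (matching each recorded_seq in the replication responses against the "recording a checkpoint at source update_seq" entries in the target log), here is a minimal Python sketch. It is only an illustration, not part of CouchDB: the function names checkpoint_seqs and summarize are hypothetical, the sample data is two of the responses and three of the log lines quoted in this thread (the responses are trimmed to the fields used), and it assumes each log entry sits on a single unwrapped line.

```python
import json
import re

# Matches the checkpoint message exactly as it appears in the quoted target log.
CHECKPOINT_RE = re.compile(r"recording a checkpoint at source update_seq (\d+)")

# Two replication responses from the thread, trimmed to the fields used here.
HISTORY = [
    '{"start_last_seq":4539,"end_last_seq":4875,"recorded_seq":4875,"docs_written":336}',
    '{"start_last_seq":1590,"end_last_seq":4539,"recorded_seq":4539,"docs_written":2949}',
]

# Three checkpoint lines from the thread (assumed unwrapped, one entry per line).
LOG = """\
[Wed, 23 Sep 2009 18:04:28 GMT] [info] [<0.1959.0>] recording a checkpoint at source update_seq 3632
[Wed, 23 Sep 2009 18:04:34 GMT] [info] [<0.1959.0>] recording a checkpoint at source update_seq 4539
[Wed, 23 Sep 2009 18:05:07 GMT] [info] [<0.4981.0>] recording a checkpoint at source update_seq 4875
"""


def checkpoint_seqs(log_text):
    """Return every checkpointed update_seq found in the log text, in log order."""
    return [int(seq) for seq in CHECKPOINT_RE.findall(log_text)]


def summarize(history_entries, log_text):
    """Summarize replication sessions and cross-check them against the log."""
    # Order sessions oldest first by the sequence number they started from.
    sessions = sorted(
        (json.loads(entry) for entry in history_entries),
        key=lambda s: s["start_last_seq"],
    )
    seen = set(checkpoint_seqs(log_text))
    # Each session should resume exactly where the previous one stopped.
    contiguous = all(
        prev["end_last_seq"] == cur["start_last_seq"]
        for prev, cur in zip(sessions, sessions[1:])
    )
    return {
        "sessions": len(sessions),
        "docs_written": sum(s["docs_written"] for s in sessions),
        "contiguous": contiguous,
        # recorded_seq values with no matching checkpoint line in the log.
        "unmatched_recorded_seq": [
            s["recorded_seq"] for s in sessions if s["recorded_seq"] not in seen
        ],
    }


if __name__ == "__main__":
    print(summarize(HISTORY, LOG))
```

Feeding it all eight responses and the full log excerpt would show whether each partial run stopped at a checkpoint boundary. Note that the quoted log begins at 18:04:28, so the first session (recorded_seq 1590) has no checkpoint line in this excerpt to match against.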
