These problems appear to be due to the replicator crashing with
{error,{conn_failed,{error,emfile}}}, which apparently means that I
surpassed an open file limit. The replications were successful if I
executed ulimit -Sn 4096 in the same shell prior to launching CouchDB.
I'm a bit surprised the replication can't recover after some files are
closed; regular DB gets and puts still worked.
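
In case it helps anyone who wants to bake the workaround into a launcher
rather than remember the ulimit call: the fix amounts to raising the soft
RLIMIT_NOFILE in the process that starts the server. Here is a rough Python
sketch of that idea; the 4096 value is just what I used, and the couchdb
path is a guess, so adjust both.

    import os
    import resource

    # Same effect as running "ulimit -Sn 4096" in the shell before starting
    # CouchDB: raise the soft open-file limit, then exec the server so it
    # inherits the higher limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < 4096 and (hard == resource.RLIM_INFINITY or hard >= 4096):
        resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))

    # Adjust this to wherever your couchdb start script lives.
    os.execv("/usr/local/bin/couchdb", ["couchdb"])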

On Wed, 15 Mar 2017 19:43:27 -0400
"Christopher D. Malon" <[email protected]> wrote:

> Those both return
>
> {"error":"not_found","reason":"missing"}
>
> In the latest example, I have a database where the source has
> doc_count 226, the target gets doc_count 222, and the task reports
>
> docs_read: 230
> docs_written: 230
> missing_revisions_found: 230
> revisions_checked: 231
>
> but the missing documents don't show up as deleted.
>
>
> On Wed, 15 Mar 2017 23:13:57 +0000
> Robert Samuel Newson <[email protected]> wrote:
>
> > Hi,
> >
> > the presence of:
> >
> > >>> docs_read: 12
> > >>> docs_written: 12
> >
> > is what struck me here. The replicator claims to have replicated 12 docs,
> > which is your expectation and mine, and yet you say they don't appear in
> > the target.
> >
> > Do you know the doc ids of these missing documents? If so, try GET
> > /dbname/docid?deleted=true and GET /dbname/docid?open_revs=all
> >
> > B.
> >
> > > On 15 Mar 2017, at 18:45, Christopher D. Malon <[email protected]>
> > > wrote:
> > >
> > > Could you explain the meaning of source_seq, checkpointed_source_seq,
> > > and through_seq in more detail? This problem has happened several times,
> > > with slightly different statuses in _active_tasks, and slightly different
> > > numbers of documents successfully copied. On the most recent attempt,
> > > checkpointed_source_seq and through_seq are 61-* (matching the source's
> > > update_seq), but source_seq is 0, and just 9 of the 12 documents are
> > > copied.
> > >
> > > When a replication task is in _replicator but is not listed in
> > > _active_tasks within two minutes, a script of mine deletes the job
> > > from _replicator and re-submits it. In CouchDB 1.6, this seemed to
> > > resolve some kinds of stalled replications. Now I wonder if the
> > > replication is not resuming properly after the deletion and
> > > resubmission.
> > >
> > > Christopher
> > >
> > >
> > > On Fri, 10 Mar 2017 06:40:49 +0000
> > > Robert Newson <[email protected]> wrote:
> > >
> > >> Were the six missing documents newer on the target? That is, did you
> > >> delete them on the target and expect another replication to restore them?
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On 9 Mar 2017, at 22:08, Christopher D. Malon <[email protected]>
> > >>> wrote:
> > >>>
> > >>> I replicated a database (continuously), but ended up with fewer
> > >>> documents in the target than in the source. Even if I wait,
> > >>> the remaining documents don't appear.
> > >>>
> > >>> 1. Here's the DB entry on the source machine, showing 12 documents:
> > >>>
> > >>> {"db_name":"library","update_seq":"61-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHjE-dA0hdPFgdIz51CSB19WB1BnjU5bEASYYGIAVUOh-_mRC1CyBq9-P3D0TtAYja-1mJbATVPoCoBbqXKQsA-0Fvaw","sizes":{"file":181716,"external":11524,"active":60098},"purge_seq":0,"other":{"data_size":11524},"doc_del_count":0,"doc_count":12,"disk_size":181716,"disk_format_version":6,"data_size":60098,"compact_running":false,"instance_start_time":"0"}
> > >>>
> > >>> 2. Here's the DB entry on the target machine, showing 6 documents:
> > >>>
> > >>> {"db_name":"library","update_seq":"6-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHhE-dA0hdPFgdIz51CSB19QTV5bEASYYGIAVUOh-_GyFqF0DU7idG7QGI2vvEqH0AUQvyfxYA1_dvNA","sizes":{"file":82337,"external":2282,"active":5874},"purge_seq":0,"other":{"data_size":2282},"doc_del_count":0,"doc_count":6,"disk_size":82337,"disk_format_version":6,"data_size":5874,"compact_running":false,"instance_start_time":"0"}
> > >>>
> > >>> 3. Here's _active_tasks for the task, converted to YAML for readability:
> > >>>
> > >>> - changes_pending: 0
> > >>>   checkpoint_interval: 30000
> > >>>   checkpointed_source_seq:
> > >>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWyl
> > >>>     pvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDCrUGRJcblmajUVBDZVVaWJJchZfSwAucPQkW
> > >>>     RV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu
> > >>>     1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
> > >>>   continuous: !!perl/scalar:JSON::PP::Boolean 1
> > >>>   database: shards/00000000-1fffffff/_replicator.1489086006
> > >>>   doc_id: 172.16.100.222_library
> > >>>   doc_write_failures: 0
> > >>>   docs_read: 12
> > >>>   docs_written: 12
> > >>>   missing_revisions_found: 12
> > >>>   node: couchdb@localhost
> > >>>   pid: <0.5521.0>
> > >>>   replication_id:
> > >>>     c60427215125bd97559d069f6fb3ddb4+continuous+create_target
> > >>>   revisions_checked: 12
> > >>>   source: http://172.16.100.222:5984/library/
> > >>>   source_seq:
> > >>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDCrUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
> > >>>   started_on: 1489086008
> > >>>   target: http://localhost:5984/library/
> > >>>   through_seq:
> > >>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDCrUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
> > >>>   type: replication
> > >>>   updated_on: 1489096815
> > >>>   user: peer
> > >>>
> > >>> 4. Here's the _replicator record for the task:
> > >>>
> > >>> {"_id":"172.16.100.222_library","_rev":"2-8e6cf63bc167c7c7e4bd38242218572c","schema":1,"storejson":null,"source":"http://172.16.100.222:5984/library","target":"http://localhost:5984/library","create_target":true,"dont_storejson":1,"wholejson":{},"user_ctx":{"roles":["_admin"],"name":"peer"},"continuous":true,"owner":null,"_replication_state":"triggered","_replication_state_time":"2017-03-09T19:00:08+00:00","_replication_id":"c60427215125bd97559d069f6fb3ddb4"}
> > >>>
> > >>> There should have been no conflicting transactions on the target host.
> > >>> The appearance of "61-*" in through_seq of the _active_tasks entry
> > >>> gives me a false sense of security; I only noticed the missing documents
> > >>> by chance.
> > >>>
> > >>> A fresh replication to a different target succeeded without any
> > >>> missing documents.
> > >>>
> > >>> Is there anything here that would tip me off that the target wasn't
> > >>> in sync with the source? Is there a good way to resolve the condition?
> > >>>
> > >>> Thanks,
> > >>> Christopher
> > >>
> >
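
P.S. For anyone who finds this thread later: the doc-by-doc probes Bob
suggested can be scripted against every id that exists on the source but
not on the target. The following is only a sketch of that idea, not the
exact commands I ran; it assumes the Python requests library, no
authentication, and the source/target URLs from this job.

    import requests

    SOURCE = "http://172.16.100.222:5984/library"
    TARGET = "http://localhost:5984/library"

    def all_ids(db):
        # _all_docs lists every non-deleted doc id in the database.
        rows = requests.get(db + "/_all_docs").json()["rows"]
        return {row["id"] for row in rows}

    # Ids present on the source but absent from the target.
    missing = sorted(all_ids(SOURCE) - all_ids(TARGET))

    # Probe each missing id on the target, both ways Bob suggested.
    for doc_id in missing:
        for params in ({"deleted": "true"}, {"open_revs": "all"}):
            r = requests.get(TARGET + "/" + doc_id, params=params)
            print(doc_id, params, r.status_code, r.text[:200])

In my case both probes came back {"error":"not_found","reason":"missing"},
as quoted above.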

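Since the delete-and-resubmit script came up earlier in the thread, the
logic it follows is roughly the sketch below. This is a simplified
illustration of the approach rather than the script itself; the node URL
is a stand-in, there is no authentication handling, and only the
two-minute window and the doc id come from the details above.

    import time
    import requests

    COUCH = "http://localhost:5984"          # local node (stand-in URL)
    DOC_ID = "172.16.100.222_library"        # _replicator doc for this job

    def task_is_active(doc_id):
        # _active_tasks lists running replications along with their doc_id.
        tasks = requests.get(COUCH + "/_active_tasks").json()
        return any(t.get("doc_id") == doc_id for t in tasks)

    def resubmit(doc_id):
        url = COUCH + "/_replicator/" + doc_id
        doc = requests.get(url).json()
        rev = doc.pop("_rev")
        # Drop the state fields CouchDB added, then delete and re-create
        # the job document so the replicator picks it up again.
        for key in [k for k in doc if k.startswith("_replication_")]:
            del doc[key]
        requests.delete(url, params={"rev": rev})
        requests.put(url, json=doc)

    while True:
        time.sleep(120)                      # the two-minute check interval
        if not task_is_active(DOC_ID):
            resubmit(DOC_ID)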