Sorry for the late reply. That's very curious. Can you file a JIRA for this? If the replicator says it replicated to the target, that should always be true. I can't immediately think why emfile would wreck that (I'd expect the writes to either fail or succeed, and for the replicator to agree).

B.
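For reference, a minimal sketch of checking and raising the descriptor limit that the ulimit fix quoted below relies on, assuming a Linux host where CouchDB's Erlang VM shows up as a single beam.smp process:

    # current soft limit on open file descriptors for this shell
    ulimit -Sn

    # how many descriptors the running CouchDB VM currently holds open
    ls /proc/$(pgrep -f beam.smp)/fd | wc -l

    # raise the soft limit, then launch CouchDB from this same shell so it inherits it
    ulimit -Sn 4096
    couchdb    # or however CouchDB is normally started on this host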
> On 21 Mar 2017, at 16:26, Christopher D. Malon <[email protected]> wrote:
>
> These problems appear to be due to the replicator crashing
> with {error,{conn_failed,{error,emfile}}}, which apparently
> means that I surpassed an open file limit.
>
> The replications were successful if I executed
>
> ulimit -Sn 4096
>
> prior to launching CouchDB, in the same shell.
>
> I'm a bit surprised the replication can't recover after some
> files are closed; regular DB gets and puts still worked.
>
>
> On Wed, 15 Mar 2017 19:43:27 -0400
> "Christopher D. Malon" <[email protected]> wrote:
>
>> Those both return
>>
>> {"error":"not_found","reason":"missing"}
>>
>> In the latest example, I have a database where the source has
>> doc_count 226, the target gets doc_count 222, and the task reports
>>
>> docs_read: 230
>> docs_written: 230
>> missing_revisions_found: 230
>> revisions_checked: 231
>>
>> but the missing documents don't show up as deleted.
>>
>>
>> On Wed, 15 Mar 2017 23:13:57 +0000
>> Robert Samuel Newson <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> The presence of:
>>>
>>>>>> docs_read: 12
>>>>>> docs_written: 12
>>>
>>> is what struck me here. The replicator claims to have replicated 12 docs,
>>> which is your expectation and mine, and yet you say they don't appear in
>>> the target.
>>>
>>> Do you know the doc IDs of these missing documents? If so, try GET
>>> /dbname/docid?deleted=true and GET /dbname/docid?open_revs=all.
>>>
>>> B.
>>>
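For concreteness, the two checks suggested above could look like this, assuming curl against the target node and DOCID standing in for one of the missing document IDs (the Accept header asks open_revs to return plain JSON rather than multipart):

    # fetch the document even if it has been deleted
    curl -s 'http://localhost:5984/library/DOCID?deleted=true'

    # list every revision the target knows for the document
    curl -s -H 'Accept: application/json' 'http://localhost:5984/library/DOCID?open_revs=all'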
>>>> On 15 Mar 2017, at 18:45, Christopher D. Malon <[email protected]> wrote:
>>>>
>>>> Could you explain the meaning of source_seq, checkpointed_source_seq,
>>>> and through_seq in more detail? This problem has happened several times,
>>>> with slightly different statuses in _active_tasks, and slightly different
>>>> numbers of documents successfully copied. On the most recent attempt,
>>>> checkpointed_source_seq and through_seq are 61-* (matching the source's
>>>> update_seq), but source_seq is 0, and just 9 of the 12 documents are
>>>> copied.
>>>>
>>>> When a replication task is in _replicator but is not listed in _active_tasks
>>>> within two minutes, a script of mine deletes the job from _replicator
>>>> and re-submits it. In CouchDB 1.6, this seemed to resolve some kinds
>>>> of stalled replications. Now I wonder if the replication is not resuming
>>>> properly after the deletion and resubmission.
>>>>
>>>> Christopher
>>>>
>>>>
>>>> On Fri, 10 Mar 2017 06:40:49 +0000
>>>> Robert Newson <[email protected]> wrote:
>>>>
>>>>> Were the six missing documents newer on the target? That is, did you
>>>>> delete them on the target and expect another replication to restore them?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 9 Mar 2017, at 22:08, Christopher D. Malon <[email protected]> wrote:
>>>>>>
>>>>>> I replicated a database (continuously), but ended up with fewer
>>>>>> documents in the target than in the source. Even if I wait,
>>>>>> the remaining documents don't appear.
>>>>>>
>>>>>> 1. Here's the DB entry on the source machine, showing 12 documents:
>>>>>>
>>>>>> {"db_name":"library","update_seq":"61-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHjE-dA0hdPFgdIz51CSB19WB1BnjU5bEASYYGIAVUOh-_mRC1CyBq9-P3D0TtAYja-1mJbATVPoCoBbqXKQsA-0Fvaw","sizes":{"file":181716,"external":11524,"active":60098},"purge_seq":0,"other":{"data_size":11524},"doc_del_count":0,"doc_count":12,"disk_size":181716,"disk_format_version":6,"data_size":60098,"compact_running":false,"instance_start_time":"0"}
>>>>>>
>>>>>> 2. Here's the DB entry on the target machine, showing 6 documents:
>>>>>>
>>>>>> {"db_name":"library","update_seq":"6-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHhE-dA0hdPFgdIz51CSB19QTV5bEASYYGIAVUOh-_GyFqF0DU7idG7QGI2vvEqH0AUQvyfxYA1_dvNA","sizes":{"file":82337,"external":2282,"active":5874},"purge_seq":0,"other":{"data_size":2282},"doc_del_count":0,"doc_count":6,"disk_size":82337,"disk_format_version":6,"data_size":5874,"compact_running":false,"instance_start_time":"0"}
>>>>>>
>>>>>> 3. Here's _active_tasks for the task, converted to YAML for readability:
>>>>>>
>>>>>> - changes_pending: 0
>>>>>>   checkpoint_interval: 30000
>>>>>>   checkpointed_source_seq:
>>>>>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>>>>>>   continuous: !!perl/scalar:JSON::PP::Boolean 1
>>>>>>   database: shards/00000000-1fffffff/_replicator.1489086006
>>>>>>   doc_id: 172.16.100.222_library
>>>>>>   doc_write_failures: 0
>>>>>>   docs_read: 12
>>>>>>   docs_written: 12
>>>>>>   missing_revisions_found: 12
>>>>>>   node: couchdb@localhost
>>>>>>   pid: <0.5521.0>
>>>>>>   replication_id: c60427215125bd97559d069f6fb3ddb4+continuous+create_target
>>>>>>   revisions_checked: 12
>>>>>>   source: http://172.16.100.222:5984/library/
>>>>>>   source_seq:
>>>>>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>>>>>>   started_on: 1489086008
>>>>>>   target: http://localhost:5984/library/
>>>>>>   through_seq:
>>>>>>     61-g1AAAAJTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>>>>>>   type: replication
>>>>>>   updated_on: 1489096815
>>>>>>   user: peer
>>>>>>
>>>>>> 4. Here's the _replicator record for the task:
>>>>>>
>>>>>> {"_id":"172.16.100.222_library","_rev":"2-8e6cf63bc167c7c7e4bd38242218572c","schema":1,"storejson":null,"source":"http://172.16.100.222:5984/library","target":"http://localhost:5984/library","create_target":true,"dont_storejson":1,"wholejson":{},"user_ctx":{"roles":["_admin"],"name":"peer"},"continuous":true,"owner":null,"_replication_state":"triggered","_replication_state_time":"2017-03-09T19:00:08+00:00","_replication_id":"c60427215125bd97559d069f6fb3ddb4"}
>>>>>>
>>>>>> There should have been no conflicting transactions on the target host.
>>>>>> The appearance of "61-*" in through_seq of the _active_tasks entry
>>>>>> gives me a false sense of security; I only noticed the missing documents
>>>>>> by chance.
>>>>>>
>>>>>> A fresh replication to a different target succeeded without any
>>>>>> missing documents.
>>>>>>
>>>>>> Is there anything here that would tip me off that the target wasn't
>>>>>> in sync with the source? Is there a good way to resolve the condition?
>>>>>>
>>>>>> Thanks,
>>>>>> Christopher
>>>>>
>>>
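A quick way to notice this kind of divergence without relying on _active_tasks is to compare the databases directly; a sketch assuming curl is available and using the source and target URLs from the report above:

    # doc_count and doc_del_count on the source...
    curl -s http://172.16.100.222:5984/library | grep -Eo '"doc_(del_)?count":[0-9]+'

    # ...and on the target; once replication has caught up the two pairs should match
    curl -s http://localhost:5984/library | grep -Eo '"doc_(del_)?count":[0-9]+'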
