I'm in the process of writing an external program in Java that reconciles data 
in CouchDB with a source system. One of the basic parts is determining what data 
needs to be removed from CouchDB. The good thing is that the ids in CouchDB are 
the same as the ids in the source system. However, some initial tests show that 
the process is very slow at determining what needs to be removed.

Basically, here are the steps that I'm using:
1. Get all ids from the source system.
2. Get all ids from CouchDB using _all_docs, paging with the fast paging 
   approach (e.g. start_key_docid and limit).
3. Loop through the ids from CouchDB to find those that are not in the source id list.
4. Use modify_docs to delete them.
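The last two steps can be sketched in plain Java. This is a minimal sketch, assuming the CouchDB side has already been paged into a map from document id to its current `_rev` (each `_all_docs` row carries the rev in `value.rev`), and assuming the delete is sent through CouchDB's `_bulk_docs` endpoint, where each deleted document needs its `_id`, its `_rev` and `"_deleted": true`. The class and method names are hypothetical, and the JSON is built by hand only to keep the example dependency-free; it assumes ids and revs need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DeletePlanner {

    /** Ids present in CouchDB but not in the source system (SQL-style NOT IN). */
    static List<String> idsToDelete(Map<String, String> couchIdToRev, Set<String> sourceIds) {
        List<String> result = new ArrayList<>();
        for (String id : couchIdToRev.keySet()) {
            // HashSet.contains is O(1) on average, so the whole pass is linear.
            if (!sourceIds.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    /** Builds a hypothetical _bulk_docs request body that marks each document as deleted. */
    static String bulkDeleteBody(List<String> ids, Map<String, String> couchIdToRev) {
        StringBuilder sb = new StringBuilder("{\"docs\":[");
        for (int i = 0; i < ids.size(); i++) {
            String id = ids.get(i);
            if (i > 0) sb.append(',');
            // Assumption: ids and revs contain no characters that need JSON escaping.
            sb.append("{\"_id\":\"").append(id)
              .append("\",\"_rev\":\"").append(couchIdToRev.get(id))
              .append("\",\"_deleted\":true}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the paged _all_docs result.
        Map<String, String> couch = new TreeMap<>();
        couch.put("a", "1-x");
        couch.put("b", "2-y");
        couch.put("c", "1-z");
        Set<String> source = new HashSet<>(List.of("a", "c"));

        List<String> toDelete = idsToDelete(couch, source);
        System.out.println(toDelete);                        // [b]
        System.out.println(bulkDeleteBody(toDelete, couch)); // {"docs":[{"_id":"b","_rev":"2-y","_deleted":true}]}
    }
}
```

Holding the source ids in a `HashSet` is what keeps step 3 fast; looping over a `List` and calling `contains` would make it quadratic.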

The basic logic is a NOT IN, as in SQL. However, I'm trying to determine 
if there is a faster way of doing this directly in CouchDB. For example, how 
might we use the MapReduce (view) capability to perform this delete? Any 
other thoughts on syncing data as fast as possible with CouchDB?
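On the client side, one way to compute the NOT IN without holding either full id list in a hash set is a merge anti-join. `_all_docs` returns rows ordered by document id, so if the source system can return its ids in the same order, a single forward pass over both lists finds every CouchDB id that is missing from the source. This is a sketch under that assumption: it uses Java's natural `String` ordering for both inputs, which may not match CouchDB's `_all_docs` ordering for ids outside plain ASCII, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MergeAntiJoin {

    /**
     * Returns ids from couchIds that do not appear in sourceIds.
     * Assumption: both iterators yield ids in ascending String order, without duplicates.
     */
    static List<String> missingFromSource(Iterator<String> couchIds, Iterator<String> sourceIds) {
        List<String> missing = new ArrayList<>();
        String src = sourceIds.hasNext() ? sourceIds.next() : null;
        while (couchIds.hasNext()) {
            String id = couchIds.next();
            // Advance the source cursor past every id smaller than the current CouchDB id.
            while (src != null && src.compareTo(id) < 0) {
                src = sourceIds.hasNext() ? sourceIds.next() : null;
            }
            if (src == null || !src.equals(id)) {
                missing.add(id); // no matching source id: candidate for deletion
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> couch = List.of("a", "b", "d", "e");
        List<String> source = List.of("a", "c", "d");
        System.out.println(missingFromSource(couch.iterator(), source.iterator())); // [b, e]
    }
}
```

Because both inputs are iterators, the CouchDB side could be fed page by page from the `_all_docs` paging loop instead of being collected into memory first.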

Note: we cannot delete the whole database first, because we have mobile clients 
that use the _changes feed and are bandwidth-constrained.

Thanks,

Henri
