Thanks, very helpful! I'll try it.
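As a quick sanity check of the created_at trick described below, here is a minimal sketch of building a _replicator document with a fresh timestamp on every write. The `buildReplicatorDoc` helper name and the `"noop/noop"` filter name are made up for illustration; only the source, target, filter, and query_params fields actually influence CouchDB's replication ID.

```javascript
// Minimal sketch (assumptions: helper name and "noop/noop" filter are
// illustrative, not part of CouchDB). The idea: stamp a unique value into
// the doc body on every write, so the body is never byte-identical to a
// previously deleted revision and CouchDB cannot "fast-forward" it onto
// the old deleted _rev.
function buildReplicatorDoc(source, target, seq) {
  return {
    source: source,
    target: target,
    continuous: true,
    // Unique value per write => unique generated _rev.
    created_at: new Date().toISOString() + "-" + seq,
    // A no-op filter makes query_params part of the replication ID, so
    // changing query_params later can force a complete restart.
    filter: "noop/noop",
    query_params: { source: source, target: target }
  };
}

var doc1 = buildReplicatorDoc("http://a/db", "http://b/db", 1);
var doc2 = buildReplicatorDoc("http://a/db", "http://b/db", 2);
// The two bodies differ in created_at, so their generated _revs differ.
console.log(doc1.created_at !== doc2.created_at); // true
```

The seq argument just guards against two writes landing in the same millisecond; a UUID would work equally well, as Jason notes.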
2013/9/24 Jason Smith <[email protected]>

> On Tue, Sep 24, 2013 at 10:18 PM, Alexey Elfman <[email protected]> wrote:
> > Hello,
> >
> > I'm using CouchDB for our company's billing platform.
> >
> > We have 4 dedicated servers (32-64 GB of RAM, 3-8 TB of disks with SSD
> > cache) in the same datacenter.
> > All servers serve the same set of databases (about 40 databases per
> > machine) with all-to-all replication via the _replicator database.
> >
> > The databases vary widely - from several documents to several hundred
> > million documents. 2 databases are 500 GB+. Documents are simple,
> > without complex structure and almost no attachments.
> >
> > We have an application to maintain all of these replications, and here
> > is why:
> > We are expecting the usual unpredictable failures of replications.
> > For example, a document in the _replicator database can have status =
> > "triggered", but there is no task with such data at that moment on the
> > server.
> > Or even a document without a "source" field for a few minutes every day
> > on every server.
> >
> > Replications crashed every hour due to unclear errors like "source
> > database is out of sync, please increase max_dbs_open". max_dbs_open is
> > 800 on every server and there are fewer than 50 databases. So even 50
> > databases multiplied by 3 replications is less than the limit.
> >
> > Creating documents in the _replicator database is hard too.
> > Example:
> >
> > # first, deleting the old one
> > [Fri, 20 Sep 2013 15:41:21 GMT] [info] [<0.24052.0>] 83.240.73.210 - -
> > DELETE /_replicator/example.com_db?rev=10-89450b554d11bf9a6d7e15a136ae663f 200
> >
> > # deleted
> > [Fri, 20 Sep 2013 15:41:24 GMT] [info] [<0.22050.0>] 83.240.73.210 - -
> > GET /_replicator/example.com_db?revs_info=true 404
> >
> > # creating a new one with the same id
> > [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25844.0>] 176.9.143.85 - -
> > HEAD /_replicator/example.com_db 404
> > # seems created
> > [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25845.0>] 176.9.143.85 - -
> > PUT /_replicator/example.com_db 201
> >
> > # where is it?..
> > [Fri, 20 Sep 2013 15:41:51 GMT] [info] [<0.22050.0>] 83.240.73.210 - -
> > GET /_replicator/example.com_db?revs_info=true 404

> I have seen this too. The only thing I can guess is that if you make a
> document but it is identical to something that was already deleted,
> then it remains deleted.
>
> Imagine a create, an update, and a delete:
>
> doc@1 -> doc@2 -> doc@3 (_deleted)
>
> Now suppose I create doc@1 again, identical to the first: every
> key/val is the same as before, so the _rev is identical, since _rev is
> just a checksum of all the key/vals.
>
> doc@1 -> [CouchDB helpfully says "oh, that has already been deleted", so
> it "fast-forwards"] -> doc@3 (still _deleted)
>
> When you replicate the doc, this is what you want (old revisions from
> the source do not magically come back to life on the target).
>
> The workaround I have found is to force a unique _rev every time. For
> me, I just added "created_at":"2013-09-24T15:28:12" to my replication
> docs. You could also use a UUID.
>
> Happily, this will not change the replication ID. The timestamp value
> is ignored. (Although maybe I could use it later as an audit trail or
> something.)
>
> Side note, since you seem to be serious about replicating.
> If you *do* want to change the replication ID (force a complete
> restart), then you must change either the (a) source, (b) target,
> (c) filter, or (d) query_params.
>
> Usually you cannot change (a), (b), or (c). So once again you can drop
> a timestamp or UUID into query_params. HOWEVER, query_params only
> affects the replication ID if you ALSO have a filter option.
>
> So in other words: you need a no-op filter just so that you can add
> no-op query params to force a new replication ID.
>
> function(doc, req) {
>   // A no-op filter; req.query.created_at is present but I don't care.
>   return true
> }
>
> However, once again, since you are serious about replicating and you
> already took the trouble to write a filter function, you may as well
> log stuff to help you troubleshoot later.
>
> function(doc, req) {
>   // A logging filter; req.query comes from the document's .query_params
>   // object. In my own code, I put .source and .target in my
>   // .query_params object so I can log them.
>   var id_and_rev = doc._id + "@" + doc._rev
>   var source = req.query.source || '(unknown source)'
>   var target = req.query.target || '(unknown target)'
>   var dir = source + " -> " + target
>
>   log('Replicate ' + dir + ': ' + id_and_rev)
>   return true
> }

> > # next try, creating
> > [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27720.0>] 176.9.143.85 - -
> > HEAD /_replicator/example.com_db 404
> > # and now it is created
> > [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27730.0>] 176.9.143.85 - -
> > PUT /_replicator/example.com_db 201
> >
> > # because replication starts successfully

> Yeah, no idea there. Once I did my created_at trick I had worked
> around this problem for myself and I moved on to other problems.

--
----------------
Best regards
Alexey Elfman
mailto:[email protected]
