There's a multipart API that allows a single PUT request containing the document body as JSON and all of its attachments in their raw form. Documentation is pretty thin at the moment, and unfortunately I don't think it quite allows for a pipe(). Would be really nice if it did, though.
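For the curious, here's a rough sketch of what that multipart PUT could look like from node -- hand-rolled from memory of the docs, so the exact "follows" semantics and part layout are worth double-checking, and mydb/mydoc/file.bin are placeholders:

    // Hedged sketch: one PUT whose first MIME part is the doc JSON (with
    // "follows" stubs) and whose later parts are the raw attachment bytes.
    var http = require('http')

    var boundary = 'couchmultipart'
    var raw = 'raw attachment bytes here' // stand-in for real binary data
    var doc = JSON.stringify({
      _attachments: {
        'file.bin': {
          follows: true, // body arrives as a later MIME part, not inline base64
          content_type: 'application/octet-stream',
          length: Buffer.byteLength(raw)
        }
      }
    })

    var payload =
      '--' + boundary + '\r\n' +
      'Content-Type: application/json\r\n\r\n' +
      doc + '\r\n' +
      '--' + boundary + '\r\n\r\n' +
      raw + '\r\n' +
      '--' + boundary + '--'

    var req = http.request({
      method: 'PUT',
      host: 'localhost',
      port: 5984,
      path: '/mydb/mydoc',
      headers: {
        'Content-Type': 'multipart/related; boundary="' + boundary + '"',
        'Content-Length': Buffer.byteLength(payload)
      }
    }, function (res) {
      res.pipe(process.stdout) // {"ok":true,...} on success
    })
    req.end(payload)

Part of why it doesn't lend itself to a pipe(), I suspect, is that "length" field: you need every attachment's byte count up front to build the JSON part, so you can't just stream bytes of unknown size through.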
On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:

> npm is mostly attachments and I haven't seen any issues so far.
>
> I wish there was a better way to replicate attachments atomically for a
> single revision, but if there is, I don't know about it.
>
> It's probably a huge JSON operation and it sucks, but I don't have to
> parse it in node.js, I just pipe() the body right along.
>
> -Mikeal
>
> On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
>
> > Hi Mikeal, I just took a quick peek at your code. It looks like you
> > handle attachments by inlining all of them into the JSON representation
> > of the document. Does that ever cause problems when dealing with the
> > ~100 MB attachments in the npm repo?
> >
> > I've certainly seen my fair share of problems with attachment
> > replication in CouchDB 1.0.x. I have a sneaking suspicion that there
> > are latent bugs related to incorrect determinations of Content-Length
> > under various compression scenarios.
> >
> > Adam
> >
> > On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
> >
> > > My replicator is fairly young, so I think calling it "reliable" might
> > > be a little misleading.
> > >
> > > It does less: I don't ever attempt to cache the high watermark (last
> > > seq written) and start over from there. If the process crashes, it
> > > just starts over from scratch. This can lead to a delay after
> > > restart, but I find it's much simpler and more reliable on failure.
> > >
> > > It's also simpler because it doesn't have to contend with being both
> > > an HTTP client and a client of the internal CouchDB Erlang API. It
> > > just proxies requests from one couch to another.
> > >
> > > While I'm sure there are bugs in it that I haven't found yet, I can
> > > say that it replicates the npm repository quite well and I'm using it
> > > in production.
> > >
> > > -Mikeal
> > >
> > > On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > From what I understand, the current state of the replicator (as of
> > > > 1.1) is that for certain types of collections of documents it can
> > > > be somewhat fragile. In the case of the node.js package repository,
> > > > http://npmjs.org, there are many relatively large (~100 MB)
> > > > documents that would sometimes throw errors or time out during
> > > > replication and crash the replicator, at which point the replicator
> > > > would restart and attempt to pick up where it left off. I am not an
> > > > expert in the internals of the replicator, but apparently the
> > > > cumulative time the replicator spent repeatedly crashing and then
> > > > relocating itself in the _changes feed made the built-in couch
> > > > replicator unusable for the task of replicating the node package
> > > > manager.
> > > >
> > > > Two solutions exist that I know of. There is a new replicator in
> > > > trunk (not to be confused with the _replicator db from 1.1 -- that
> > > > is still using the old replicator algorithms), and there is also a
> > > > more reliable replicator written in node.js,
> > > > https://github.com/mikeal/replicate, which was written specifically
> > > > to replicate the node package repository between hosting providers.
> > > >
> > > > Additionally, it may be useful if you could describe the
> > > > 'fingerprint' of your documents a bit. How many documents are in
> > > > the failing databases? Are the documents large or small? Do they
> > > > have many attachments? How large is your _changes feed?
> > > >
> > > > Cheers,
> > > >
> > > > Max
> > > >
> > > > On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> > > > <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We now have about 150 dbs that are refusing to replicate, with
> > > > > random crashes that provide essentially zero debug information.
> > > > > The error is "db not found", but I know the database is
> > > > > available. Does anyone know how I can troubleshoot this? Do we
> > > > > just have too many databases replicating for CouchDB to handle?
> > > > > 4000 is a small number for the massive hardware these are running
> > > > > on.
> > > > >
> > > > > -Chris
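PS -- for anyone curious what the pipe() approach Mikeal describes looks like, the core of it is roughly the following. This is a hypothetical, minimal sketch (no error handling, placeholder hosts), not his actual code, which lives at https://github.com/mikeal/replicate:

    // Hedged sketch: stream a document from one couch to another with
    // attachments inlined as base64 (?attachments=true). The JSON body is
    // never parsed in node, just piped through.
    var http = require('http')

    var SOURCE = { host: 'source.example.com', port: 5984 } // placeholder
    var TARGET = { host: 'target.example.com', port: 5984 } // placeholder

    function copyDoc (db, id, cb) {
      http.get({
        host: SOURCE.host,
        port: SOURCE.port,
        // revs=true keeps the rev history so the write below can replay it
        path: '/' + db + '/' + encodeURIComponent(id) + '?attachments=true&revs=true',
        headers: { accept: 'application/json' }
      }, function (from) {
        var to = http.request({
          method: 'PUT',
          host: TARGET.host,
          port: TARGET.port,
          // new_edits=false writes the doc with its existing rev, replicator-style
          path: '/' + db + '/' + encodeURIComponent(id) + '?new_edits=false',
          headers: { 'content-type': 'application/json' }
        }, function (res) { cb(null, res.statusCode) })
        from.pipe(to) // the ~100 MB body streams straight through
      })
    }

    // e.g. copyDoc('registry', 'request', function (err, status) { console.log(status) })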
