HAHA! I already forgot that we did this. -Mikeal

On Sep 14, 2011, at 12:51 PM, Randall Leeds wrote:

> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <[email protected]> wrote:
>
>> There's a multipart API which allows for a single PUT request containing
>> the document body as JSON and all its attachments in their raw form.
>> Documentation is pretty thin at the moment, and unfortunately I think it
>> doesn't quite allow for a pipe(). Would be really nice if it did, though.
>
> It does. We figured it out together a couple of weeks ago, and that's when
> this code came into being.
>
> Requesting a _specific_ revision with ?revs=true will give you a
> multipart/related response suitable for passing straight into a
> ?new_edits=false&rev= PUT.
> See https://github.com/mikeal/replicate/blob/master/main.js#L49
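(A minimal node.js sketch of the GET-to-PUT pipe Randall describes. The
hosts, database name, document id, and revision below are placeholders
rather than values from the thread; treat this as an illustration of the
technique, not the actual code in mikeal/replicate.)

    // Stream one revision, attachments included, from a source couch to a
    // target couch without ever parsing the (potentially huge) body in node.
    var http = require('http');

    var doc = 'some-doc';   // placeholder document id
    var rev = '3-abc123';   // placeholder revision

    http.get({
      host: 'source.example.com',
      port: 5984,
      path: '/db/' + doc + '?rev=' + rev + '&revs=true&attachments=true',
      headers: { accept: 'multipart/related' }
    }, function (res) {
      // Forward the multipart content type (it carries the part boundary),
      // and the length when the source provides one.
      var headers = { 'content-type': res.headers['content-type'] };
      if (res.headers['content-length']) {
        headers['content-length'] = res.headers['content-length'];
      }
      var put = http.request({
        method: 'PUT',
        host: 'target.example.com',
        port: 5984,
        path: '/db/' + doc + '?new_edits=false&rev=' + rev,
        headers: headers
      }, function (resp) {
        console.log('replicated', doc, rev, resp.statusCode);
      });
      res.pipe(put);        // the body goes straight through
    });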
>> On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:
>>
>>> npm is mostly attachments and I haven't seen any issues so far.
>>>
>>> I wish there was a better way to replicate attachments atomically for a
>>> single revision, but if there is, I don't know about it.
>>>
>>> It's probably a huge JSON operation and it sucks, but I don't have to
>>> parse it in node.js, I just pipe() the body right along.
>>>
>>> -Mikeal
>>>
>>> On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
>>>
>>>> Hi Mikeal, I just took a quick peek at your code. It looks like you
>>>> handle attachments by inlining all of them into the JSON representation
>>>> of the document. Does that ever cause problems when dealing with the
>>>> ~100 MB attachments in the npm repo?
>>>>
>>>> I've certainly seen my fair share of problems with attachment
>>>> replication in CouchDB 1.0.x. I have a sneaking suspicion that there
>>>> are latent bugs related to incorrect determinations of Content-Length
>>>> under various compression scenarios.
>>>>
>>>> Adam
>>>>
>>>> On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
>>>>
>>>>> My replicator is fairly young, so I think calling it "reliable" might
>>>>> be a little misleading.
>>>>>
>>>>> It does less. I don't ever attempt to cache the high watermark (last
>>>>> seq written) and start over from there. If the process crashes, it
>>>>> just starts over from scratch. This can lead to a delay after restart,
>>>>> but I find that it's much simpler and more reliable on failure.
>>>>>
>>>>> It's also simpler because it doesn't have to contend with being both
>>>>> an http client and a client of the internal couchdb erlang API. It
>>>>> just proxies requests from one couch to another.
>>>>>
>>>>> While I'm sure there are bugs that I haven't found yet in it, I can
>>>>> say that it replicates the npm repository quite well and I'm using it
>>>>> in production.
>>>>>
>>>>> -Mikeal
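(A sketch of the restart-from-scratch design Mikeal describes, assuming a
continuous _changes feed walked from seq 0 on every start. The host and
database names are placeholders, and copyRevision stands in for the
GET-to-PUT pipe shown earlier; this is an illustration, not the
mikeal/replicate code.)

    // No checkpointing: a crash or dropped connection simply restarts the
    // walk of the changes feed from zero.
    var http = require('http');

    function replicate() {
      http.get({
        host: 'source.example.com',
        port: 5984,
        path: '/db/_changes?feed=continuous&style=all_docs&since=0'
      }, function (res) {
        var buf = '';
        res.on('data', function (chunk) {
          buf += chunk;
          var lines = buf.split('\n');
          buf = lines.pop();             // keep any partial line for later
          lines.forEach(function (line) {
            if (!line.trim()) return;    // continuous feeds send heartbeats
            var change = JSON.parse(line);
            if (!change.changes) return; // skip the trailing last_seq row
            change.changes.forEach(function (c) {
              copyRevision(change.id, c.rev);
            });
          });
        });
        res.on('end', replicate);        // connection dropped: start over
      });
    }

    replicate();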
>>>>> On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> From what I understand, the current state of the replicator (as of
>>>>>> 1.1) is that for certain types of collections of documents it can be
>>>>>> somewhat fragile. In the case of the node.js package repository,
>>>>>> http://npmjs.org, there are many relatively large (~100MB) documents
>>>>>> that would sometimes throw errors or time out during replication and
>>>>>> crash the replicator, at which point the replicator would restart and
>>>>>> attempt to pick up where it left off. I am not an expert in the
>>>>>> internals of the replicator, but apparently the cumulative time
>>>>>> required for the replicator to repeatedly crash and then relocate
>>>>>> itself in the _changes feed was making the built-in couch replicator
>>>>>> unusable for the task of replicating the node package manager.
>>>>>>
>>>>>> Two solutions exist that I know of. There is a new replicator in
>>>>>> trunk (not to be confused with the _replicator db from 1.1 -- it is
>>>>>> still using the old replicator algorithms), and there is also a more
>>>>>> reliable replicator written in node.js,
>>>>>> https://github.com/mikeal/replicate, that was written specifically to
>>>>>> replicate the node package repository between hosting providers.
>>>>>>
>>>>>> Additionally, it may be useful if you could describe the
>>>>>> 'fingerprint' of your documents a bit. How many documents are in the
>>>>>> failing databases? Are the documents large or small? Do they have
>>>>>> many attachments? How large is your _changes feed?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We now have about 150 dbs that are refusing to replicate, with
>>>>>>> random crashes which provide really zero debug information. The
>>>>>>> error is db not found, but I know it's available. Does anyone know
>>>>>>> how I can troubleshoot this? Do we just have too many databases
>>>>>>> replicating for couchdb to handle? 4000 is a small number for the
>>>>>>> massive hardware these are running on.
>>>>>>>
>>>>>>> -Chris
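(The database 'fingerprint' Max asks about can be pulled from a couple of
standard endpoints. A minimal node.js sketch follows; the host and
database name are placeholders, and the field names are as reported by
CouchDB 1.x.)

    // Fetch doc count, on-disk size, and the current update sequence,
    // which also tells you how long a _changes feed a replicator must walk.
    var http = require('http');

    http.get({
      host: 'couch.example.com',
      port: 5984,
      path: '/mydb'
    }, function (res) {
      var body = '';
      res.on('data', function (c) { body += c; });
      res.on('end', function () {
        var info = JSON.parse(body);
        console.log('doc_count:', info.doc_count);
        console.log('disk_size:', info.disk_size);
        console.log('update_seq:', info.update_seq);
      });
    });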
