I bet you could go faster too, but that's a huge improvement, congrats!
On 6 March 2013 08:21, Daniel Gonzalez <gonva...@gonvaled.com> wrote:
> I couldn't resist and I have moved to a bulk read / modify / bulk write approach, and the situation has dramatically improved: I am now running at over 100 docs/s, compared to 4 docs/s with the update handler.
>
> On Wed, Mar 6, 2013 at 2:28 PM, Daniel Gonzalez <gonva...@gonvaled.com> wrote:
>
>> Thanks Robert, that explains it.
>>
>> I was indeed under the impression that update handlers are faster than re-creating documents. Seeing CouchDB as a black box, that is what you would expect, since the update handler requires less information transfer and is largely performed inside CouchDB itself (with possibly some data coming with the HTTP request).
>>
>> I understand now that the implementation details of the update handler make it slower (in the general case) than re-creating documents, but since this is not plainly obvious, I think it should be mentioned in the documentation about update handlers.
>>
>> Actually, my first approach to the problem was to do exactly that (bulk read / modify / bulk write), but I discarded it because I thought that an update handler would be *faster*. Then I implemented my solution and was surprised by how slow it was. Hence my mail.
>>
>> Now my database update is halfway through, and I will let it run to completion. Next time, I hope to remember this discussion.
>>
>> Thanks,
>> Daniel
>>
>> On Wed, Mar 6, 2013 at 2:17 PM, Robert Newson <rnew...@apache.org> wrote:
>>
>>> Update handlers are very slow in comparison to a straight POST or PUT, as they have to invoke some JavaScript on the server. This is, by some margin, the slowest way to achieve your goal.
>>>
>>> The mistake here, though, is thinking that an update handler is the right way to update every document in your system. Update handlers exist to add a little server-side logic in cases where it's impossible or awkward to do so in the client (i.e., when the client is not a browser). Given their intrinsic slowness, I'd avoid them where I could.
>>>
>>> The fastest way to update documents is to use the bulk document API. Ideally you want to fetch a batch of docs that need updating in one call, transform them using any scripting language or tool, and then update the batch by posting it to _bulk_docs. These methods are described in http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API. Some experimentation will be required to find a good batch size; too small and this will take longer than it could, too large and the server can crash by running out of memory. Unless your documents are very large, or very small, I'd start with a couple of hundred docs and then tweak up and down. Since this sounds like a one-off, you might even skip this optimization phase; the difference between doing single PUTs through an update handler and doing 200 documents at a time through _bulk_docs will be so huge that you might not need it to go any faster.
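For illustration, here is a minimal sketch of the batch fetch / transform / _bulk_docs loop Robert describes, as a Node.js (18+, for global fetch) script. The database URL and the field consolidation (hypothetical serialNo / serial_number variants unified into a string serial) are assumptions, since the thread never names the actual field; it pages through _all_docs for simplicity, but a view selecting only the affected docs would page the same way.

    // Sketch: bulk read / modify / bulk write against a local CouchDB.
    const COUCH = 'http://localhost:5984/mydb'; // assumed database URL
    const BATCH = 200;                          // Robert's suggested starting batch size

    // Normalize one document in place; return true if it changed.
    function fixDoc(doc) {
      let changed = false;
      for (const alias of ['serialNo', 'serial_number']) {
        if (alias in doc) {
          doc.serial = String(doc[alias]); // unify int/string variants as string
          delete doc[alias];
          changed = true;
        }
      }
      return changed;
    }

    async function run() {
      let startkey = null;
      for (;;) {
        // Fetch one page of docs; ask for one extra row to use as the next startkey.
        const params = new URLSearchParams({ include_docs: 'true', limit: String(BATCH + 1) });
        if (startkey) params.set('startkey', JSON.stringify(startkey));
        const page = await (await fetch(`${COUCH}/_all_docs?${params}`)).json();
        const rows = page.rows;
        if (rows.length === 0) break;

        const docs = rows.slice(0, BATCH).map(r => r.doc).filter(fixDoc);
        if (docs.length > 0) {
          // One round trip updates the whole batch; a real script should
          // also inspect the per-doc results for conflicts.
          const res = await fetch(`${COUCH}/_bulk_docs`, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ docs }),
          });
          console.log(`updated ${docs.length} docs:`, res.status);
        }
        if (rows.length <= BATCH) break; // last page
        startkey = rows[BATCH].key;      // inclusive startkey: overlap row opens the next page
      }
    }

    run().catch(console.error);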
>>> There was a recent thread about adding this as a CouchDB feature. If we did, it would work much the same as above. I'm wary, though, as it would encourage the rewrite-all-the-documents approach. That should be quite a rare event, since a schema-less, document-oriented approach should largely relieve you of the pain of changing document contents. In this thread's case, with the inconsistent use of a particular field, a one-time fix-up makes sense (assuming that new updates are consistent).
>>>
>>> B.
>>>
>>> On 6 March 2013 06:13, Anthony Ananich <anton.anan...@inpun.com> wrote:
>>> > And how long does it take to add a document by HTTP PUT?
>>> >
>>> > On Wed, Mar 6, 2013 at 2:33 PM, svilen <a...@svilendobrev.com> wrote:
>>> >> +1. I'd like to know about update handlers too, as I may get into such a situation soon.
>>> >>
>>> >> Not an answer: if you're sure your transformation is correct, my lame take would be: don't do anything. 4 docs/s is about 14,400/hour, so by tomorrow it would be done.
>>> >>
>>> >> Of course, no harm in finding out / learning - e.g. you may need to rerun it again.
>>> >>
>>> >> ciao
>>> >> svilen
>>> >>
>>> >> On Wed, 6 Mar 2013 12:06:41 +0100, Daniel Gonzalez <gonva...@gonvaled.com> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> We have a problem in our data: we have been inconsistent with one of our fields and have named it in different ways. Besides, in some places we have used an int, in others a string. I have created an update handler to correct this, and I am running it against our 100 thousand document database by doing PUT requests, as explained in http://wiki.apache.org/couchdb/Document_Update_Handlers
>>> >>>
>>> >>> What I am doing is:
>>> >>>
>>> >>> 1. get the affected documents with a view
>>> >>> 2. call the update handler
>>> >>>
>>> >>> And this is running over an ssh tunnel.
>>> >>>
>>> >>> My problem is that this is veeeery slow. Currently I am running at 4 docs/s. Is this normal?
>>> >>>
>>> >>> I could do this locally (no ssh tunnel), but I guess things would not improve much, since the data being transferred is not that big (no include_docs, and the view emits very little information). I have the impression that the bottleneck is CouchDB itself: the update handler is just that slow.
>>> >>>
>>> >>> Am I right about this? Is there a way to speed this up?
>>> >>>
>>> >>> Thanks,
>>> >>> Daniel
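For reference, a minimal sketch of the kind of update handler the thread discusses: a function stored as a string under updates.normalize in a design doc (say _design/fixup) and invoked once per document with PUT /mydb/_design/fixup/_update/normalize/<docid>. The field names (serialNo, serial_number, serial) are the same hypothetical consolidation as in the earlier sketch, not the thread's actual field:

    // Update function: receives the current doc (or null) and the request.
    function (doc, req) {
      if (!doc) {
        // Unknown id: write nothing, report it back to the client.
        return [null, 'missing'];
      }
      var changed = false;
      ['serialNo', 'serial_number'].forEach(function (alias) {
        if (alias in doc) {
          doc.serial = String(doc[alias]); // unify int/string variants
          delete doc[alias];
          changed = true;
        }
      });
      // Returning [doc, body] writes the doc; [null, body] leaves it alone.
      return [changed ? doc : null, changed ? 'fixed' : 'unchanged'];
    }

As Robert notes above, each such PUT pays for an HTTP round trip plus a JavaScript invocation per document, which is why the batch _bulk_docs approach ends up so much faster.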