On Mon, May 4, 2009 at 10:20 AM, Tom Nichols <[email protected]> wrote:
> well, if I set "batch" to true, all of my load scripts die after a
> short amount of time with this error:
>
> /var/lib/gems/1.8/gems/couchrest-0.24/lib/couchrest/monkeypatches.rb:41:in
> `rbuf_fill': uninitialized constant Timeout::TimeoutError (NameError)
>         from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
>         from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
>         from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
>
> Regardless, it still seems like there is a bottleneck on the server
> end. Did I mention I'm running the 'load' scripts locally? So it's
> not network latency that is causing the slowness. Any other ideas?
>

You're probably best off using explicit bulk_docs saves with an array
of documents. That way you know exactly how much you are passing to
CouchDB at a time. With smallish docs (less than a few KB) you can
usually do around 1000 at a time to get the best insert performance.
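Roughly like this -- an untested sketch, assuming CouchRest 0.24's
Database#bulk_save takes an array of document hashes and POSTs them to
_bulk_docs in a single request:

    #!/usr/bin/env ruby
    # Sketch: load data in explicit batches of 1000 via _bulk_docs.
    require 'rubygems'
    require 'couchrest'

    db = CouchRest.database! "http://127.0.0.1:5984/#{ARGV[0]}"

    BATCH_SIZE = 1000   # smallish docs; tune between 500 and 2000
    batch = []

    0.upto(9_999_999) do |val|
      batch << { :key => val, 'val one' => "val #{val}",
                 'val2' => "#{ARGV[1]} #{val}" }
      if batch.size >= BATCH_SIZE
        db.bulk_save(batch)   # one HTTP round trip per batch
        batch = []
      end
    end
    db.bulk_save(batch) unless batch.empty?   # flush the remainder

Each bulk_save is one round trip, and the whole batch sits in memory
until it is flushed, so keep the batch size somewhere in the 500-2000
range Zach mentions below rather than letting it grow unbounded.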
> Thanks.
> -Tom
>
>
> On Mon, May 4, 2009 at 12:19 PM, Zachary Zolton
> <[email protected]> wrote:
>> Yeah, the optional second argument (for using bulk save semantics)
>> defaults to false.
>>
>> Also, there's an option where you can set how many documents to batch
>> save at a time. I don't remember the default, but I've had good luck
>> saving with anywhere between 500 and 2000 docs.
>>
>> On Mon, May 4, 2009 at 11:13 AM, Tom Nichols <[email protected]> wrote:
>>> Thanks. I'm using save_doc, I just need to pass 'true' as a second
>>> argument?
>>>
>>> I posted the question here because I assumed the performance
>>> bottleneck was on the CouchDB end, not my ruby script. Am I wrong? I
>>> assumed if I was running 20 "slow" ruby scripts they would peg the
>>> CPU. The fact that I'm not seeing that makes me think there is some
>>> blocking/synchronization that is making the CouchDB server slow...?
>>>
>>> Thanks again.
>>> -Tom
>>>
>>> On Mon, May 4, 2009 at 11:58 AM, Zachary Zolton
>>> <[email protected]> wrote:
>>>> Short answer: use db.save_doc(hash, true) for bulk_docs behavior.
>>>>
>>>> Also, consider moving this thread to the CouchRest Google Group:
>>>> http://groups.google.com/group/couchrest/topics
>>>>
>>>> Cheers,
>>>> zdzolton
>>>>
>>>> On Mon, May 4, 2009 at 10:40 AM, Tom Nichols <[email protected]> wrote:
>>>>> Hi, I have some questions about insert performance.
>>>>>
>>>>> I have a single CouchDB 0.9.0 node running on a small EC2 instance. I
>>>>> attached a huge EBS volume to it and mounted it where CouchDB's data
>>>>> files are stored. I fired up about 20 ruby scripts running inserts and
>>>>> after a weekend I only have about 30GB / 12M rows of data... which
>>>>> seems small. 'top' tells me that my CPU is only about 30% utilized.
>>>>>
>>>>> Any idea what I might be doing wrong? I pretty much just followed
>>>>> these instructions:
>>>>> http://wiki.apache.org/couchdb/Getting_started_with_Amazon_EC2
>>>>>
>>>>> My ruby script looks like this:
>>>>>
>>>>> #!/usr/bin/env ruby
>>>>> # Script to load random data into CouchDB
>>>>>
>>>>> require 'rubygems'
>>>>> require 'couchrest'
>>>>>
>>>>> db = CouchRest.database! "http://127.0.0.1:5984/#{ARGV[0]}"
>>>>> puts "Created database: #{ARGV[0]}"
>>>>>
>>>>> max = 9999999999999999
>>>>> while 1
>>>>>   puts 'loading...'
>>>>>   for val in 0..max
>>>>>     db.save_doc({ :key => val, 'val one' => "val #{val}",
>>>>>       'val2' => "#{ARGV[1]} #{val}" })
>>>>>   end
>>>>> end
>>>>>
>>>>>
>>>>> Thanks in advance...
>>>>>
>>>>
>>>
>>
>
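(Re: Zach's save_doc(hash, true) suggestion above: as far as I
understand it, that just buffers docs inside CouchRest and flushes them
as a bulk POST whenever its internal cache limit is hit. So the loop in
the quoted script could also look something like the following -- a
sketch only, reusing the db handle from that script, and the final
no-argument bulk_save call to flush the leftovers is my assumption, so
check it against the couchrest source:)

    0.upto(9_999_999) do |val|
      # buffered by CouchRest when the second argument is true;
      # flushed automatically once the internal bulk-save cache fills up
      db.save_doc({ :key => val, 'val one' => "val #{val}",
                    'val2' => "#{ARGV[1]} #{val}" }, true)
    end
    db.bulk_save   # assumed to flush whatever is still buffered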
--
Chris Anderson
http://jchrisa.net
http://couch.io