2010/11/4 cdr53x <[email protected]>:
> On 10/30/2010 03:52 PM, Anand Chitipothu wrote:
>>
>> I'm trying to set up a CouchDB database with 14M documents. The view
>> generation is taking too long. It is running at a rate of 22
>> docs/sec right now. At this rate it will take 7 days to build the view,
>> which is too slow, and I expect the speed to drop further as the
>> view file size increases.
>>
>
> Hi,
>
> What is the size of the design document files on the drive?
>
> I noticed that large views use quite large files ;).
>
> I also noticed that the view group indexers take a long time to
> finish the last 30% of the task: at least twice as long as it takes to
> complete the first 70%.
>
> In my case I have a 'small' database containing 400K docs. I also have a
> design doc that indexes 80% of the docs with 8 views. The map functions only
> emit a single property per doc and a null value, so they should be compact.
>
> The overall size of this design doc's .view file on disk is 17G ;).
>
> I don't know how CouchDB handles updates to such large files, but maybe
> something about updating large files is the problem...
>
> Concerning performance, I use the standard JavaScript interpreter and get a
> rate of ~60 changes/sec at the beginning of the process.
>
> Then it drops to 15 c/s after 70%.
>
> I'm at about 6 c/s after 85%.
>
> The first 70% took 52 minutes and the whole process ran for 3h21m on a
> small standalone dedicated server.
>
> So I get the feeling that it is not an issue with the view "calculation"
> algorithm, but probably something related to disk I/O.
>
> I have no Erlang knowledge, and I might be quite wrong about this, but
> if you guys know a little bit about this part of the CouchDB code, maybe
> there is something that could be checked to improve the overall design doc
> refresh performance?
Yes, it is due to IO. In my case it started at 200 docs/sec and dropped to almost 3 docs/sec, and the view file was about 60GB after processing around 6-7M docs. I noticed that the IO wait had increased to about 15, and beam.smp and couchjs together weren't using even 50% of one core.

I tried running compaction, and it looked like the view would shrink to about 1/6 of its size after compaction, but it still wasn't progressing well because of the IO wait. An SSD might have helped, but I don't have one.

So I thought it might be faster to load the data first, wait for view generation to complete, and run compaction afterwards. I tried that, and it still looked like it was not going to finish within a week. Even compaction is very, very slow.

I decided to generate the view by feeding the data directly to the map function, and it took about an hour to generate the view for all 14M docs. I sorted the output, ran reduce, and saved the results in another CouchDB database. That was much faster: I could finish the whole process in less than 10 hours. The downside is that I have to take care of keeping the view up to date with the original database, but I think that is a good compromise.

Anand
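The out-of-band approach described above (run the map function outside CouchDB, sort the emitted rows, reduce them, and bulk-load the results into another database) can be sketched in Python. This is a minimal sketch under stated assumptions, not the script actually used: it assumes the source documents are available as a JSON-lines dump (one document per line), and `map_doc` / `reduce_values` are hypothetical stand-ins for the real view code (they emit a hypothetical `type` field and sum the values). Sorting uses Python's ordering rather than CouchDB's view collation and keeps every row in memory; for 14M documents an external sort would likely be needed. `_bulk_docs` is CouchDB's bulk-insert endpoint; the server URL, database name, and the `key`/`value` shape of the stored documents are assumptions.

```python
import itertools
import json
import urllib.request
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, Any]


def map_doc(doc: dict) -> Iterator[Row]:
    """Hypothetical map function, like the JavaScript
    function(doc) { if (doc.type) emit(doc.type, 1); }"""
    if "type" in doc:
        yield doc["type"], 1


def reduce_values(key: Any, values: List[Any]) -> Any:
    """Hypothetical reduce function: sum the values (like the built-in _sum)."""
    return sum(values)


def read_docs(lines: Iterable[str]) -> Iterator[dict]:
    """Parse an assumed JSON-lines dump: one JSON document per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def build_view(docs: Iterable[dict],
               map_fn: Callable[[dict], Iterable[Row]],
               reduce_fn: Callable[[Any, List[Any]], Any]) -> List[Row]:
    # Map: collect every (key, value) pair emitted for every document.
    rows = [row for doc in docs for row in map_fn(doc)]
    # Sort by key so equal keys become adjacent. This is Python ordering,
    # not CouchDB's collation, and all keys must be mutually comparable.
    rows.sort(key=lambda row: row[0])
    # Reduce: group adjacent rows with the same key and reduce each group.
    return [(key, reduce_fn(key, [value for _, value in group]))
            for key, group in itertools.groupby(rows, key=lambda row: row[0])]


def bulk_payloads(rows: Iterable[Row], batch_size: int = 1000) -> Iterator[dict]:
    """Yield _bulk_docs request bodies, batch_size reduced rows at a time."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield {"docs": [{"key": k, "value": v} for k, v in batch]}


def post_bulk_docs(base_url: str, db: str, payload: dict) -> Any:
    """POST one batch to CouchDB's /{db}/_bulk_docs endpoint."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    sample = [
        '{"_id": "a", "type": "book"}',
        '{"_id": "b", "type": "author"}',
        '{"_id": "c", "type": "book"}',
        '{"_id": "d"}',
    ]
    view = build_view(read_docs(sample), map_doc, reduce_values)
    print(view)  # [('author', 1), ('book', 2)]
    # Loading into a (hypothetical) local database "view_results" would be:
    # for body in bulk_payloads(view):
    #     post_bulk_docs("http://localhost:5984", "view_results", body)
```

Writing the reduced rows in batches keeps each `_bulk_docs` request bounded in size; the default batch size of 1000 is an arbitrary choice, not a tuned value.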
