On Tue, Dec 21, 2010 at 10:39 AM, Adam Kocoloski <[email protected]> wrote:
> On Dec 21, 2010, at 4:55 AM, Bob Clary wrote:
>
>> Large Initial View sizes: Several of my views are initially created with
>> sizes which are 10-20 times the size of the compacted view. For example,
>> I have one view which when initially created can take 95G but when
>> compacted uses less than 5G. This has caused several out-of-disk-space
>> conditions when I've had to regenerate views for the database. I know
>> commodity disks are relatively cheap these days, but due to my current
>> hosting environment I am using relatively expensive networked storage.
>> Asking for sufficient storage for my expected database size was difficult
>> enough, but asking for 10 or more times that amount just to deal with
>> temporary explosive view sizes is probably a non-starter.
>
> This one is being worked on in
> https://issues.apache.org/jira/browse/COUCHDB-700 . Guaranteeing a minimum
> batch size results in a smaller index file and also speeds up indexing in
> many circumstances.
>
>> CouchDB 1.0.x: My experience with attempting to use the 1.0.x branch was
>> a failure due to CouchDB crashing immediately upon view compaction
>> completion, which caused the views to begin indexing from scratch.
>
> I agree with Paul that the timeout dropping a ref counter at the end of
> view compaction is a significant bug. I'm guessing it depends on the
> particular deployment and size of the file being deleted. There have been
> multiple attempts [1,2] to rewrite the reference counting system; one of
> those should probably be merged for 1.2.0. We might be able to have some
> stopgap fix for 1.0.x and 1.1.x.
>
> I also have to agree with Mike and Paul that BigCouch would help you a
> lot here. Even if you use it in a single-node setup, the ability to split
> a large monolithic database into an arbitrary number of shards can help
> tremendously when trying to build and compact indexes.
>
> Regards,
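(For anyone following along: view compaction is triggered per design document over CouchDB's HTTP API, and the view group's `_info` resource reports whether a compaction is in flight. The database name `mydb` and design-doc name `stats` below are placeholders, and the server is assumed to be on the default port:)

```shell
# Trigger compaction of all views in _design/stats (names are examples)
curl -X POST -H "Content-Type: application/json" \
    http://localhost:5984/mydb/_compact/stats

# Poll the view group info; "compact_running" is true while it works
curl http://localhost:5984/mydb/_design/stats/_info
```

It's at the end of exactly this kind of compaction run that the ref-counter timeout bug discussed above bites, so the crash shows up after the new index file has been fully written.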
I should've mentioned this in my earlier email as well, but I'll underscore
the point that using BigCouch to shard your db on a single node would still
help in splitting the unit of work for a single database.

> Adam
>
> [1]: https://github.com/tilgovi/couchdb/tree/ets_ref_count
> [2]: https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_file.erl#L483
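As a concrete illustration of that suggestion (the database name and shard count here are just examples, assuming BigCouch's clustered interface on its default port): BigCouch lets you choose the shard count at database-creation time with the `q` query parameter, and each shard is then indexed and compacted as a separate, smaller file.

```shell
# Create a database split into 16 shards rather than the default;
# view builds and compactions then operate shard-by-shard, so the
# temporary disk overhead is a fraction of the monolithic case.
curl -X PUT 'http://localhost:5984/mydb?q=16'
```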
