On Tue, Dec 21, 2010 at 10:39 AM, Adam Kocoloski <[email protected]> wrote:
> On Dec 21, 2010, at 4:55 AM, Bob Clary wrote:
>
>> Large Initial View sizes: Several of my views are initially created with 
>> sizes which are 10-20 times the size of the compacted view. For example, I 
>> have one view which when initially created can take 95G but when compacted 
>> uses less than 5G. This has caused several out of disk space conditions when 
>> I've had to regenerate views for the database. I know commodity disks are 
>> relatively cheap these days, but due to my current hosting environment I am 
>> using relatively expensive networked storage. Asking for sufficient storage 
>> for my expected database size was difficult enough, but asking for 10 or 
>> more times that amount just to deal with temporary explosive view sizes is 
>> probably a non-starter.
>
> This one is being worked on in 
> https://issues.apache.org/jira/browse/COUCHDB-700 .  Guaranteeing a minimum 
> batch size results in a smaller index file and also speeds up indexing in 
> many circumstances.
>
>> CouchDB 1.0.x: My attempt to use the 1.0.x branch failed because of the
>> crash immediately upon view compaction completion, which caused the views
>> to begin indexing from scratch.
>
> I agree with Paul that the timeout dropping a ref counter at the end of view 
> compaction is a significant bug.  I'm guessing it depends on the particular 
> deployment and size of the file being deleted.  There have been multiple 
> attempts [1,2] to rewrite the reference counting system; one of those should 
> probably be merged for 1.2.0.  We might be able to have some stopgap fix for 
> 1.0.x and 1.1.x.
>
> I also have to agree with Mike and Paul that BigCouch would help you a lot 
> here.  Even if you use it in a single-node setup the ability to split a large 
> monolithic database into an arbitrary number of shards can help tremendously 
> when trying to build and compact indexes.  Regards,
>

I should have mentioned this in my earlier email, but to underscore the
point: using BigCouch to shard your db on a single node would still help,
because it splits the unit of work for a single database into smaller
pieces.
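
To make that concrete, here is a rough Python sketch of the idea. It is
not BigCouch's actual code: shard_for, partition, the crc32-modulo
scheme, and the shard count of 8 are all assumptions I picked for
illustration. It only shows how hashing document IDs spreads one large
database over several smaller, independent units.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number by hashing it.

    Hypothetical scheme for illustration only (crc32 modulo shard count).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs by shard; each group is a separate unit of work."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Made-up document IDs standing in for one large monolithic database.
doc_ids = [f"doc-{i:06d}" for i in range(100_000)]
shards = partition(doc_ids, num_shards=8)

# Each shard would get its own view index to build and compact.
for shard, ids in sorted(shards.items()):
    print(f"shard {shard}: {len(ids)} docs")
```

With the documents spread roughly evenly, each shard's view index should
be a fraction of the size of the monolithic one, so building or
compacting one shard at a time should need correspondingly less
temporary disk space.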


> Adam
>
> [1]: https://github.com/tilgovi/couchdb/tree/ets_ref_count
> [2]: 
> https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_file.erl#L483
