Thanks again for your help, Jan. Sorry, I thought that continuous compaction might be a feature I had overlooked. I have no problem automating a compaction process; I always envisaged needing to do that...
I think that I will revert to running far fewer updates on the CouchDB document and caching the throughput in Redis, as disc space is more of a priority than application complexity. A few more (different) questions are in the pipeline as I'm still learning Couch ;)

On Fri, Jun 3, 2011 at 3:37 PM, Jan Lehnardt <[email protected]> wrote:
>
> On 3 Jun 2011, at 16:28, muji wrote:
>
>> Thanks very much for the help.
>>
>> I could of course reduce the number of times the update is done, but
>> the service plans to bill based on throughput, so this is quite
>> critical from a billing perspective.
>
> You can still bill on throughput, as you will know exactly how much
> data has been transferred in what amount of time, but reporting is
> going to be less granular, i.e. chunks of, say, 10MB rather than 100KB or
> however big the chunks are.
>
>> A quick search for continuous compaction didn't yield anything, and I
>> don't see anything here:
>>
>> http://wiki.apache.org/couchdb/Compaction
>>
>> Could you point me in the right direction please?
>
> I made it up, and I explained how to do it. Pseudocode:
>
> while(`curl -X POST http://127.0.0.1:5984/db/_compact`);
>
>> Funny you mention caching before updating Couch, that was my
>> very first implementation! I was updating Redis with the throughput
>> and then updating the file document once the upload completed. That
>> worked very well, but I wanted to remove Redis from the stack as the
>> application is already pretty complex.
>>
>> I'm guessing my best option is to revert back to that technique?
>
> It depends on what your goals are. The initial design you mentioned
> seems fine to me if you compact often. If you are optimising for
> disk space, Redis or memcached may be a good idea. If you are
> optimising for a small stack, not having Redis or memcached is a
> good idea.
>
>> As an aside, why would my document update handler be raising
>> conflicts? My understanding was that update handlers would not raise
>> conflicts - is that correct?
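Jan's one-liner pseudocode can be sketched a bit more concretely. One detail it glosses over: `POST /db/_compact` returns `202 Accepted` immediately and compaction continues in the background, so a real loop has to poll the database info until `compact_running` goes false before starting the next pass. A minimal Python sketch, assuming a local CouchDB at `127.0.0.1:5984` and a database named `db` (both illustrative):

```python
import json
import time
import urllib.request

BASE = "http://127.0.0.1:5984"  # assumed local CouchDB; adjust as needed
DB = "db"                        # hypothetical database name

def compaction_running(db_info):
    """Return True while the database info reports an in-progress compaction."""
    return bool(db_info.get("compact_running", False))

def compact_once(base=BASE, db=DB):
    """Trigger compaction (POST /db/_compact) and block until it finishes."""
    req = urllib.request.Request(
        f"{base}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)  # 202 Accepted: compaction runs in background
    while True:
        with urllib.request.urlopen(f"{base}/{db}") as resp:
            info = json.loads(resp.read())
        if not compaction_running(info):
            return
        time.sleep(1)

def compact_continuously(base=BASE, db=DB):
    """Jan's 'continuous compaction': restart as soon as the last run ends."""
    while True:
        compact_once(base, db)
```

In production you would want a small sleep between passes and some error handling, but the shape of the loop is the point here.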
>
> That is not correct.
>
> Cheers
> Jan
> --
>
>>
>> Thanks!
>>
>> On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt <[email protected]> wrote:
>>> Hi,
>>>
>>> On 3 Jun 2011, at 15:43, muji wrote:
>>>> I'm still new to CouchDB and NoSQL, so apologies if the answer to this
>>>> is trivial.
>>>
>>> No worries, we're all new at something :)
>>>
>>>>
>>>> I'm trying to track the throughput of a file sent via a POST request
>>>> in a CouchDB document.
>>>>
>>>> My initial implementation creates a document for the file before the
>>>> POST is sent, and then I have an update handler that increments the
>>>> "uploadbytes" field for every chunk of data received from the client.
>>>
>>> Could you make that a little less frequent and interpolate between the
>>> data points? Instead of tracking bytes exactly at the chunk boundaries,
>>> just update every 10 or so MB? And have the UI adjust accordingly?
>>>
>>>
>>>> This *nearly* works, except that I get document update conflicts (which
>>>> I think is down to me not being able to throttle back the upload
>>>> while the db is updated), but the main problem is that for large files
>>>> (~2.4GB) the number of document revisions is around 40-50,000. So I
>>>> have a single document taking up between 0.7GB and 1GB. After
>>>> compaction it reduces to ~380KB, which of course is much better, but
>>>> this still seems excessive and poses problems with compacting a
>>>> write-heavy database. I understand the trick there is to replicate,
>>>> compact and replicate back to the source, please correct me if I'm
>>>> wrong...
>>>
>>> Hm, no, that won't do anything; just regular compaction is good enough.
>>>
>>>> So, I don't think this approach is viable, which makes me wonder
>>>> whether setting _revs_limit will help, although I understand that
>>>> setting this per database still requires compaction and will only save
>>>> on space after compaction.
>>>
>>> _revs_limit won't help; you will always need to compact to get rid of
>>> data.
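Jan's suggestion to update every 10 or so MB instead of on every chunk can be sketched as a small accumulator. All names here are illustrative: `flush` stands in for whatever call actually updates the CouchDB document (e.g. the update handler request), and the tiny interval in the usage example is just to keep the demo readable:

```python
class ThroughputBatcher:
    """Accumulate chunk sizes and only report every `interval` bytes."""

    def __init__(self, flush, interval=10 * 1024 * 1024):
        self.flush = flush          # callback that writes the running total
        self.interval = interval    # e.g. 10MB between document updates
        self.total = 0
        self.last_flushed = 0

    def add_chunk(self, nbytes):
        self.total += nbytes
        if self.total - self.last_flushed >= self.interval:
            self.flush(self.total)
            self.last_flushed = self.total

    def finish(self):
        # Always record the exact final byte count, e.g. for billing.
        if self.total != self.last_flushed:
            self.flush(self.total)
            self.last_flushed = self.total

# Demo with a tiny interval: 5 chunks, 17 bytes total.
updates = []
batcher = ThroughputBatcher(updates.append, interval=10)
for chunk in (4, 4, 4, 4, 1):
    batcher.add_chunk(chunk)
batcher.finish()
# updates == [12, 17]: two document writes instead of five
```

Fewer writes means proportionally fewer revisions to compact away, and since `finish()` records the exact total, billing accuracy is unaffected; only the in-flight progress reporting becomes coarser.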
>>>
>>>> I was thinking that tracking the throughput as chunks in individual
>>>> documents and then calculating the throughput with a map/reduce over all
>>>> the chunks might be a better approach. Although I'm concerned that
>>>> having lots of little documents for each data chunk will also take up
>>>> large amounts of space...
>>>
>>> Yeah, that wouldn't save any space here. That said, the numbers you quote
>>> I wouldn't call "large amounts".
>>>
>>>
>>>> Any advice and guidance on the best way to tackle this would be much
>>>> appreciated.
>>>
>>> I'd either set up continuous compaction (restart compaction right when
>>> it is done) to keep the DB size at a minimum, or use an in-memory store
>>> to keep track of the uploaded bytes.
>>>
>>> Ideally though, CouchDB would give you an endpoint to query that kind
>>> of data.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>
>>
>>
>>
>> --
>> muji.
>
>

--
muji.
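The in-memory store Jan mentions (and the Redis design described earlier in the thread) reduces to: increment a cheap, conflict-free counter per chunk outside CouchDB, then write the document once when the upload completes, so compaction only ever has one extra revision to clean up. A minimal sketch with a plain dict standing in for Redis; `incrby` mirrors Redis `INCRBY` semantics, and `save_document` is a hypothetical stand-in for the final CouchDB write:

```python
# A plain dict stands in for Redis here; in the real stack this would be
# redis.incrby(key, nbytes) per chunk, plus one CouchDB update at the end.
counters = {}

def incrby(key, nbytes):
    """Redis-style INCRBY: cheap, conflict-free per-chunk accounting."""
    counters[key] = counters.get(key, 0) + nbytes
    return counters[key]

documents = {}  # stands in for the CouchDB database

def save_document(doc_id):
    """One document write per upload: a single revision to compact away."""
    documents[doc_id] = {"uploadbytes": counters.get(doc_id, 0)}

# Three chunks arrive for a hypothetical upload "file-42".
for chunk in (1024, 2048, 512):
    incrby("file-42", chunk)
save_document("file-42")
# documents["file-42"]["uploadbytes"] == 3584
```

Live progress queries read the counter rather than the document, which is also why the update-conflict problem disappears: only one writer ever touches the CouchDB document.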
