On 3 Jun 2011, at 17:00, muji wrote:

> Thanks again for your help Jan.
>
> Sorry, I thought that continuous compaction might be a feature I had
> overlooked. I have no problems automating a compaction process, I
> always envisaged needing to do that...
>
> I think that I will revert to running far fewer updates on the couchdb
> document and caching the throughput in Redis as disc space is more of
> a priority than application complexity.
>
> A few more (different) questions in the pipeline as I'm still learning
> couch ;)
Sure, any time :)

Cheers
Jan
--

>
> On Fri, Jun 3, 2011 at 3:37 PM, Jan Lehnardt <[email protected]> wrote:
>>
>> On 3 Jun 2011, at 16:28, muji wrote:
>>
>>> Thanks very much for the help.
>>>
>>> I could of course reduce the amount of times the update is done but
>>> the service plans to bill based on throughput so this is quite
>>> critical from a billing perspective.
>>
>> You can still bill on throughput as you will know exactly how much
>> data has been transferred in what amount of time, but reporting is
>> going to be less granular, i.e. chunks of say 10MB and not 100KB or
>> however big the chunks are.
>>
>>> A quick search for continuous compaction didn't yield anything, and I
>>> don't see anything here:
>>>
>>> http://wiki.apache.org/couchdb/Compaction
>>>
>>> Could you point me in the right direction please?
>>
>> I made it up and I explained how to do it. Pseudocode:
>>
>> while(`curl http://127.0.0.1:5984/db/_compact`);
>>
>>> Funny you mention about caching before updating couch, that was my
>>> very first implementation! I was updating Redis with the throughput
>>> and then updating the file document once the upload completed. That
>>> worked very well but I wanted to remove Redis from the stack as the
>>> application is already pretty complex.
>>>
>>> I'm guessing my best option is to revert back to that technique?
>>
>> It depends on what your goals are. The initial design you mentioned
>> seems fine to me if you compact often. If you are optimising for
>> disk space, Redis or memcached may be a good idea. If you are
>> optimising for a small stack, not having Redis or memcached is a
>> good idea.
>>
>>> As an aside, why would my document update handler be raising
>>> conflicts? My understanding was that update handlers would not raise
>>> conflicts - is that correct?
>>
>> That is not correct.
>>
>> Cheers
>> Jan
>> --
>>
>>>
>>> Thanks!
>>>
>>> On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> On 3 Jun 2011, at 15:43, muji wrote:
>>>>> I'm still new to couchdb and nosql so apologies if the answer to this
>>>>> is trivial.
>>>>
>>>> No worries, we're all new at something :)
>>>>
>>>>>
>>>>> I'm trying to track the throughput of a file sent via a POST request
>>>>> in a couchdb document.
>>>>>
>>>>> My initial implementation creates a document for the file before the
>>>>> POST is sent and then I have an update handler that increments the
>>>>> "uploadbytes" for every chunk of data received from the client.
>>>>
>>>> Could you make that a little less frequent and interpolate between the
>>>> data points? Instead of tracking bytes exactly at the chunk boundaries,
>>>> just update every 10 or so MB? And have the UI adjust accordingly?
>>>>
>>>>
>>>>> This *nearly* works except that I get document update conflicts (which
>>>>> I think is to do with me not being able to throttle back the upload
>>>>> while the db is updated) but the main problem is that for large files
>>>>> (~2.4GB) the number of document revisions is around 40-50,000. So I
>>>>> have a single document taking up between 0.7GB and 1GB. After
>>>>> compaction it reduces to ~380KB, which of course is much better, but
>>>>> this still seems excessive and poses problems with compacting a
>>>>> write-heavy database. I understand the trick to that is to replicate,
>>>>> compact and replicate back to the source, please correct me if I'm
>>>>> wrong...
>>>>
>>>> Hm, no, that won't do anything; just regular compaction is good enough.
>>>>
>>>>> So, I don't think this approach is viable which makes me wonder
>>>>> whether setting the _revs_limit will help, although I understand that
>>>>> setting this per database still requires compaction and will save on
>>>>> space after compaction.
>>>>
>>>> _revs_limit won't help, you will always need to compact to get rid of
>>>> data.
>>>>
>>>>> I was thinking that tracking the throughput as chunks in individual
>>>>> documents and then calculating the throughput with a map/reduce on all
>>>>> the chunks might be a better approach. Although I'm concerned that
>>>>> having lots of little documents for each data chunk will also take up
>>>>> large amounts of space...
>>>>
>>>> Yeah, wouldn't save any space here. That said, the numbers you quote,
>>>> I wouldn't call "large amounts".
>>>>
>>>>
>>>>> Any advice and guidance on the best way to tackle this would be much
>>>>> appreciated.
>>>>
>>>> I'd either set up continuous compaction (restart compaction right when
>>>> it is done) to keep DB size at a minimum or use an in-memory store
>>>> to keep track of the uploaded bytes.
>>>>
>>>> Ideally though, CouchDB would give you an endpoint to query that kind
>>>> of data.
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>>
>>>
>>>
>>> --
>>> muji.
>>
>>
>
>
> --
> muji.
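
For reference, a rough sketch of the "continuous compaction" loop described above, written out in Python with the requests library. The database name, credentials, poll interval and function name are placeholders, not anything from the thread; only the _compact endpoint and the compact_running flag in the database info are standard CouchDB.

    # Rough sketch of a continuous compaction loop: trigger compaction,
    # wait for it to finish, then trigger it again.
    import time

    import requests

    COUCH = "http://127.0.0.1:5984"
    DB = "db"                    # placeholder database name
    AUTH = ("admin", "secret")   # placeholder admin credentials
    POLL_SECONDS = 5             # placeholder poll interval

    def compact_forever():
        while True:
            # POST /db/_compact kicks off compaction and returns right away;
            # the actual work happens in the background.
            resp = requests.post(
                "%s/%s/_compact" % (COUCH, DB),
                headers={"Content-Type": "application/json"},
                auth=AUTH,
            )
            resp.raise_for_status()

            # Poll the database info until compact_running goes back to
            # false, then loop around and start the next run.
            while requests.get("%s/%s" % (COUCH, DB),
                               auth=AUTH).json().get("compact_running"):
                time.sleep(POLL_SECONDS)

    if __name__ == "__main__":
        compact_forever()

The detail that matters is that _compact returns immediately, so the loop has to poll compact_running rather than waiting on the request itself.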

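And a sketch of the approach muji says he is going back to: count bytes per chunk in Redis and only touch the CouchDB file document every so often, along the lines of the ~10MB granularity suggested above. This assumes the redis-py and requests libraries; the key name, flush threshold, retry count and helper names (record_chunk, flush_to_couch, doc_id) are made-up placeholders, while the "uploadbytes" field comes from the thread.

    # Rough sketch: count uploaded bytes per chunk in Redis, write the
    # running total to the CouchDB document only every FLUSH_EVERY bytes.
    import redis
    import requests

    COUCH = "http://127.0.0.1:5984"
    DB = "db"                          # placeholder database name
    FLUSH_EVERY = 10 * 1024 * 1024     # flush to CouchDB roughly every 10MB

    r = redis.Redis()

    def record_chunk(doc_id, nbytes):
        # Redis absorbs the per-chunk write load; INCRBY is atomic and cheap.
        total = r.incrby("uploadbytes:%s" % doc_id, nbytes)

        # Only touch the CouchDB document when the total crosses a
        # FLUSH_EVERY boundary, so the revision count stays small
        # between compactions.
        if total % FLUSH_EVERY < nbytes:
            flush_to_couch(doc_id, total)
        return total

    def flush_to_couch(doc_id, total, retries=3):
        # Plain fetch-then-PUT with a retry on 409; as noted above, even an
        # _update handler can hit conflicts, so a retry is needed either way.
        url = "%s/%s/%s" % (COUCH, DB, doc_id)
        for _ in range(retries):
            doc = requests.get(url).json()
            doc["uploadbytes"] = total
            resp = requests.put(url, json=doc)
            if resp.status_code != 409:   # anything but a conflict: done
                resp.raise_for_status()
                return

When the upload completes you would do one final flush_to_couch with the final total, as in muji's original Redis implementation; flushing on a byte threshold rather than per chunk is what keeps the pre-compaction file size down.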