Argh. Missed where you said you had upgraded. OK, I will proceed with getting you comparison numbers.
Sent from my iPhone

> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <[email protected]> wrote:
>
> Thanks again Matthew, you've been very helpful!
>
> Maybe you can give me some kind of advice on this issue I'm having since I've
> upgraded to 1.4.8.
>
> Since I've upgraded, my anti-entropy data has been growing a lot and has only
> stabilised at very high values... Right now my cluster has 6 machines, each
> one with ~120G of anti-entropy data and 600G of leveldb data. This seems to
> be quite a lot, no? My total number of keys is ~2.5 billion.
>
> Best regards,
> Edgar
>
>> On 6 April 2014 23:30, Matthew Von-Maszewski <[email protected]> wrote:
>> Edgar,
>>
>> This is indirectly related to your key deletion discussion. I made changes
>> recently to the aggressive delete code. The second section of the following
>> (updated) web page discusses the adjustments:
>>
>> https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>
>> Matthew
>>
>>
>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <[email protected]> wrote:
>>>
>>> Matthew, thanks again for the response!
>>>
>>> That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :)
>>>
>>> Best regards
>>>
>>>
>>>> On 6 April 2014 15:02, Matthew Von-Maszewski <[email protected]> wrote:
>>>> Edgar,
>>>>
>>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>>
>>>> leveldb is a "write once" data store. New data for a given key never
>>>> physically overwrites old data for the same key. New data "hides" the old
>>>> data by being in a lower level, and therefore picked first.
>>>>
>>>> leveldb's compaction operation will remove older key/value pairs only when
>>>> the newer key/value pair is part of a compaction involving both new and
>>>> old. The new and the old key/value pairs must have migrated to adjacent
>>>> levels through normal compaction operations before leveldb will see them
>>>> in the same compaction.
>>>> The migration could take days, weeks, or even
>>>> months depending upon the size of your entire dataset and the rate of
>>>> incoming write operations.
>>>>
>>>> leveldb's "delete" object is exactly the same as your empty JSON object.
>>>> The delete object simply has one more flag set that allows it to also be
>>>> removed if and only if there is no chance for an identical key to exist on
>>>> a higher level.
>>>>
>>>> I apologize that I cannot give you a more useful answer. 2.0 is on the
>>>> horizon.
>>>>
>>>> Matthew
>>>>
>>>>
>>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <[email protected]> wrote:
>>>>>
>>>>> Hi again!
>>>>>
>>>>> Sorry to reopen this discussion, but I have another question regarding
>>>>> the former post.
>>>>>
>>>>> What if, instead of doing a mass deletion (we've already seen that it
>>>>> will not be profitable regarding disk space), I update all the values
>>>>> with an empty JSON object "{}"? Do you see any problem with this? I no
>>>>> longer need those millions of values that are living in the cluster...
>>>>>
>>>>> When version 2.0 of riak runs stable I'll do the update and only then
>>>>> delete those keys!
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>>> On 18 February 2014 16:32, Edgar Veiga <[email protected]> wrote:
>>>>>> Ok, thanks a lot Matthew.
>>>>>>
>>>>>>
>>>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <[email protected]>
>>>>>>> wrote:
>>>>>>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is
>>>>>>> within Google's original leveldb architecture. Riak 2.0 sneaks around
>>>>>>> to get the disk space freed.
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> The only/main purpose is to free disk space..
>>>>>>>>
>>>>>>>> I was a little bit concerned regarding this operation, but now with
>>>>>>>> your feedback I'm tending to do nothing. I can't risk the
>>>>>>>> growing of space...
>>>>>>>> Regarding the overhead I think that with a tight throttling system I
>>>>>>>> could control and avoid overloading the cluster.
>>>>>>>>
>>>>>>>> Mixed feelings :S
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> Edgar,
>>>>>>>>>
>>>>>>>>> The first "concern" I have is that leveldb's delete does not free
>>>>>>>>> disk space. Others have executed mass delete operations only to
>>>>>>>>> discover they are now using more disk space instead of less. Here is
>>>>>>>>> a discussion of the problem:
>>>>>>>>>
>>>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>>>
>>>>>>>>> The link also describes Riak's database operation overhead. This is
>>>>>>>>> a second "concern". You will need to carefully throttle your delete
>>>>>>>>> rate or the overhead will likely impact your production throughput.
>>>>>>>>>
>>>>>>>>> We have new code to help quicken the actual purge of deleted data in
>>>>>>>>> Riak 2.0. But that release is not quite ready for production usage.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>>>
>>>>>>>>> Matthew
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Sorry, forgot that info!
>>>>>>>>>>
>>>>>>>>>> It's leveldb.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> Which Riak backend are you using: bitcask, leveldb, multi?
>>>>>>>>>>>
>>>>>>>>>>> Matthew
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi all!
>>>>>>>>>>> >
>>>>>>>>>>> > I have a fairly trivial question regarding mass deletion on a
>>>>>>>>>>> > riak cluster, but firstly let me give you just some context.
>>>>>>>>>>> > My cluster is running riak 1.4.6 on 6 machines with a ring of
>>>>>>>>>>> > 256 nodes and 1TB SSD disks.
>>>>>>>>>>> >
>>>>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm
>>>>>>>>>>> > talking about ~1 billion keys (the average object size is ~1KB). I
>>>>>>>>>>> > will not retrieve the keys from riak because I have a file with
>>>>>>>>>>> > all of them. I'll just start a script that reads them from the
>>>>>>>>>>> > file and triggers an HTTP DELETE for each one.
>>>>>>>>>>> > The cluster will continue running in production with a quite high
>>>>>>>>>>> > load, serving all other applications, while this deletion runs.
>>>>>>>>>>> >
>>>>>>>>>>> > My question is simple: do I need to have any kind of extra
>>>>>>>>>>> > concerns regarding this action? Do you advise paying
>>>>>>>>>>> > special attention to any kind of metrics regarding riak or even
>>>>>>>>>>> > the servers where it's running?
>>>>>>>>>>> >
>>>>>>>>>>> > Best regards!
>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>> > riak-users mailing list
>>>>>>>>>>> > [email protected]
>>>>>>>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
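
For readers who still want to try the throttled delete script discussed in the oldest messages of this thread, here is a minimal sketch of the idea in Python. It is not anything posted on the list. The host and port (Riak's default HTTP port 8098), the bucket name, the key file name and the rate are all placeholder assumptions, and the pacing is a plain fixed-interval limiter chosen for illustration. The URL form /buckets/<bucket>/keys/<key> is Riak's HTTP object endpoint. Keep in mind the advice above: on 1.4 this will probably not free disk space.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def key_url(base_url, bucket, key):
    """Build the Riak HTTP URL for one object, percent-encoding bucket and key."""
    def quote(part):
        return urllib.parse.quote(part, safe="")
    return f"{base_url}/buckets/{quote(bucket)}/keys/{quote(key)}"


def http_delete(url):
    """Issue one HTTP DELETE and return the HTTP status code."""
    request = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the key is already gone


def throttled_delete(keys, base_url, bucket, max_per_second,
                     delete=http_delete, clock=time.monotonic, sleep=time.sleep):
    """Delete keys one at a time, spaced at least 1/max_per_second apart.

    `keys` is any iterable of strings (an open file works); blank lines are
    skipped. `delete`, `clock` and `sleep` are injectable so the pacing can
    be tested without a cluster. Returns the number of DELETEs issued.
    """
    interval = 1.0 / max_per_second
    next_slot = clock()
    sent = 0
    for line in keys:
        key = line.strip()
        if not key:
            continue
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        delete(key_url(base_url, bucket, key))
        sent += 1
        # Schedule from "now" if we fell behind, so a slow spell is never
        # followed by a burst of catch-up requests.
        next_slot = max(next_slot, clock()) + interval
    return sent
```

Hypothetical usage: `with open("keys.txt") as f: throttled_delete(f, "http://127.0.0.1:8098", "mybucket", max_per_second=200)`. Start with a low rate and watch cluster latency before raising it.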
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
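
Matthew's description of why a delete can use more disk before it uses less can be illustrated with a toy model. The Python below is a deliberately simplified sketch, not leveldb's actual algorithm: real leveldb works on sorted files and key ranges, while this model uses one dict per level. The class and function names are invented for the illustration. Level 0 holds the newest data, as in the "lower level" wording above.

```python
TOMBSTONE = object()  # stand-in for leveldb's delete marker


class ToyLevels:
    """Toy write-once store: levels[0] is newest, higher indexes are older."""

    def __init__(self, depth):
        self.levels = [dict() for _ in range(depth)]

    def put(self, key, value):
        # New writes always land in the lowest level; older copies of the
        # key in higher levels are left in place and merely hidden.
        self.levels[0][key] = value

    def delete(self, key):
        # A delete is just another write: a marker that hides older values.
        self.put(key, TOMBSTONE)

    def get(self, key):
        for level in self.levels:  # lowest level first, so newer data wins
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self, n):
        """Merge level n into level n+1, the only place old data is dropped."""
        source, target = self.levels[n], self.levels[n + 1]
        older_levels = self.levels[n + 2:]
        for key, value in source.items():
            target[key] = value  # replaces an older copy if one is here
            if value is TOMBSTONE and not any(key in lvl for lvl in older_levels):
                del target[key]  # nothing older can exist, marker can go
        source.clear()

    def entries(self):
        """Number of stored entries, used here as a proxy for disk usage."""
        return sum(len(level) for level in self.levels)


def mass_delete_demo():
    """Return the entry count after each step of deleting one old key."""
    store = ToyLevels(depth=3)
    store.put("k", "v1")
    store.compact(0)
    store.compact(1)          # the value has migrated to the oldest level
    counts = [store.entries()]
    store.delete("k")         # marker in level 0, old value still in level 2
    counts.append(store.entries())
    store.compact(0)          # marker reaches level 1, old value untouched
    counts.append(store.entries())
    store.compact(1)          # marker meets the old value, both are dropped
    counts.append(store.entries())
    return counts
```

In this sketch the delete first adds an entry, and the space only comes back once the marker has been compacted down to the level holding the old value, which matches the "days, weeks, or even months" migration described above.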
