Thanks a lot Matthew!

A bit more info: I've gathered a sample of the contents of the anti-entropy data on one of my machines:

- 44 folders, each named after one of the folders in the level-db dir (e.g. 393920363186844927172086927568060657641638068224/)
- Each folder has 5 files (log, current, log, etc.) and 5 sst_* folders.
- The biggest sst folder is sst_3, at 4.3G.
- Inside the sst_3 folder there are 1219 files named 00****.sst.
- Each of the 00*****.sst files is ~3.7M.
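As a rough sanity check on the figures above, the file count and per-file size can be multiplied out and compared with the reported folder size. This is only a back-of-the-envelope sketch: the sizes are the approximate values from this thread, the 1 GB = 1024 MB conversion is an assumption, and the ~120G / 600G per-machine figures come from the earlier message quoted below.

```python
# Approximate figures reported in this thread.
sst_files = 1219        # number of .sst files inside sst_3
file_size_mb = 3.7      # approximate size of each .sst file, in MB
reported_sst3_gb = 4.3  # reported size of the sst_3 folder, in GB

# Estimated size of sst_3, assuming 1 GB = 1024 MB.
sst3_estimate_gb = sst_files * file_size_mb / 1024

# Anti-entropy data relative to leveldb data, per machine.
aae_gb = 120
leveldb_gb = 600
aae_ratio = aae_gb / leveldb_gb

print(f"sst_3 estimate: {sst3_estimate_gb:.1f} GB (reported: {reported_sst3_gb} GB)")
print(f"anti-entropy data is {aae_ratio:.0%} of the leveldb data per machine")
```

The estimate lands close to the reported 4.3G, so the listed numbers are at least consistent with each other.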
Hope this info gives you some more help!

Best regards, and again, thanks a lot
Edgar

On 8 April 2014 13:24, Matthew Von-Maszewski <[email protected]> wrote:

> Argh. Missed where you said you had upgraded. Ok, I will proceed with
> getting you comparison numbers.
>
> Sent from my iPhone
>
> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <[email protected]> wrote:
>
> Thanks again Matthew, you've been very helpful!
>
> Maybe you can give me some kind of advice on this issue I'm having since
> I've upgraded to 1.4.8.
>
> Since I've upgraded, my anti-entropy data has been growing a lot and has
> only stabilised at very high values... Right now my cluster has 6 machines,
> each one with ~120G of anti-entropy data and 600G of level-db data. This
> seems to be quite a lot, no? My total number of keys is ~2.5 billion.
>
> Best regards,
> Edgar
>
> On 6 April 2014 23:30, Matthew Von-Maszewski <[email protected]> wrote:
>
>> Edgar,
>>
>> This is indirectly related to your key deletion discussion. I made
>> changes recently to the aggressive delete code. The second section of the
>> following (updated) web page discusses the adjustments:
>>
>> https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>
>> Matthew
>>
>>
>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <[email protected]> wrote:
>>
>> Matthew, thanks again for the response!
>>
>> That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :)
>>
>> Best regards
>>
>>
>> On 6 April 2014 15:02, Matthew Von-Maszewski <[email protected]> wrote:
>>
>>> Edgar,
>>>
>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>
>>> leveldb is a "write once" data store. New data for a given key never
>>> physically overwrites old data for the same key. New data "hides" the old
>>> data by being in a lower level, and therefore picked first.
>>>
>>> leveldb's compaction operation will remove older key/value pairs only
>>> when the newer key/value pair is part of a compaction involving both new
>>> and old. The new and the old key/value pairs must have migrated to
>>> adjacent levels through normal compaction operations before leveldb will
>>> see them in the same compaction. The migration could take days, weeks, or
>>> even months depending upon the size of your entire dataset and the rate of
>>> incoming write operations.
>>>
>>> leveldb's "delete" object is exactly the same as your empty JSON object.
>>> The delete object simply has one more flag set that allows it to also be
>>> removed if and only if there is no chance for an identical key to exist on
>>> a higher level.
>>>
>>> I apologize that I cannot give you a more useful answer. 2.0 is on the
>>> horizon.
>>>
>>> Matthew
>>>
>>>
>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <[email protected]> wrote:
>>>
>>> Hi again!
>>>
>>> Sorry to reopen this discussion, but I have another question regarding
>>> the former post.
>>>
>>> What if, instead of doing a mass deletion (we've already seen that it
>>> will not be profitable regarding disk space), I update all the values with
>>> an empty JSON object "{}"? Do you see any problem with this? I no longer
>>> need those millions of values that are living in the cluster...
>>>
>>> When version 2.0 of Riak runs stable I'll do the update and only
>>> then delete those keys!
>>>
>>> Best regards
>>>
>>>
>>> On 18 February 2014 16:32, Edgar Veiga <[email protected]> wrote:
>>>
>>>> Ok, thanks a lot Matthew.
>>>>
>>>>
>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <[email protected]> wrote:
>>>>
>>>>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is
>>>>> within Google's original leveldb architecture. Riak 2.0 sneaks around to
>>>>> get the disk space freed.
>>>>>
>>>>> Matthew
>>>>>
>>>>>
>>>>>
>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <[email protected]>
>>>>> wrote:
>>>>>
>>>>> The only/main purpose is to free disk space.
>>>>>
>>>>> I was a little bit concerned regarding this operation, but now with
>>>>> your feedback I'm tending towards not doing anything; I can't risk the
>>>>> disk usage growing...
>>>>> Regarding the overhead, I think that with a tight throttling system I
>>>>> could control it and avoid overloading the cluster.
>>>>>
>>>>> Mixed feelings :S
>>>>>
>>>>>
>>>>>
>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Edgar,
>>>>>>
>>>>>> The first "concern" I have is that leveldb's delete does not free
>>>>>> disk space. Others have executed mass delete operations only to discover
>>>>>> they are now using more disk space instead of less. Here is a discussion
>>>>>> of the problem:
>>>>>>
>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>
>>>>>> The link also describes Riak's database operation overhead. This is
>>>>>> a second "concern". You will need to carefully throttle your delete rate
>>>>>> or the overhead will likely impact your production throughput.
>>>>>>
>>>>>> We have new code to help quicken the actual purge of deleted data in
>>>>>> Riak 2.0. But that release is not quite ready for production usage.
>>>>>>
>>>>>>
>>>>>> What do you hope to achieve by the mass delete?
>>>>>>
>>>>>> Matthew
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Sorry, forgot that info!
>>>>>>
>>>>>> It's leveldb.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Which Riak backend are you using: bitcask, leveldb, multi?
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi all!
>>>>>>> >
>>>>>>> > I have a fairly trivial question regarding mass deletion on a Riak
>>>>>>> > cluster, but first let me give you some context. My cluster is
>>>>>>> > running Riak 1.4.6 on 6 machines with a ring of 256 nodes and 1TB
>>>>>>> > SSD disks.
>>>>>>> >
>>>>>>> > I need to execute a massive object deletion on a bucket; I'm
>>>>>>> > talking about ~1 billion keys (the average object size is ~1KB). I
>>>>>>> > will not retrieve the keys from Riak because I have a file with all
>>>>>>> > of them. I'll just start a script that reads them from the file and
>>>>>>> > triggers an HTTP DELETE for each one.
>>>>>>> > The cluster will continue running in production with a quite high
>>>>>>> > load, serving all other applications, while running this deletion.
>>>>>>> >
>>>>>>> > My question is simple: do I need to have any kind of extra
>>>>>>> > concerns regarding this action? Do you advise paying special
>>>>>>> > attention to any metrics regarding Riak or even the servers where
>>>>>>> > it's running?
>>>>>>> >
>>>>>>> > Best regards!
>>>>>>> > _______________________________________________
>>>>>>> > riak-users mailing list
>>>>>>> > [email protected]
>>>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
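The original question in this thread describes a script that reads keys from a file and issues one HTTP DELETE per key, and the reply stresses throttling the delete rate. A minimal sketch of that idea follows. It is not the script anyone in the thread actually ran: the function names, the rate limit, the file name, the bucket name and the host are all hypothetical, and the `/buckets/<bucket>/keys/<key>` URL layout is assumed to match your Riak HTTP interface.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def read_keys(path):
    """Yield one key per non-empty line of a text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            key = line.strip()
            if key:
                yield key


def delete_key(base_url, bucket, key):
    """Send one HTTP DELETE for a key and return the HTTP status code."""
    url = "{}/buckets/{}/keys/{}".format(
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    request = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses (for example a key that is already gone) are
        # returned to the caller instead of aborting the whole run.
        return err.code


def throttled_delete(keys, send, max_per_second,
                     sleep=time.sleep, clock=time.monotonic):
    """Call send(key) for every key, at most max_per_second times a second."""
    interval = 1.0 / max_per_second
    results = {}
    next_allowed = clock()
    for key in keys:
        now = clock()
        if now < next_allowed:
            # Too early for the next request: wait out the remainder.
            sleep(next_allowed - now)
        next_allowed = max(now, next_allowed) + interval
        results[key] = send(key)
    return results


# Hypothetical usage (host, bucket, file name and rate are placeholders):
#
#   statuses = throttled_delete(
#       read_keys("keys.txt"),
#       lambda key: delete_key("http://localhost:8098", "mybucket", key),
#       max_per_second=200,
#   )
```

Passing the sender, clock and sleep functions in as parameters keeps the throttle testable without a cluster, and the rate can be tuned while watching production throughput. For ~1 billion keys, collecting every status in a dictionary would be too memory-hungry, so a real run would more likely log or count failures instead.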
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
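Matthew's explanation in the thread above (newer data "hides" older data from a lower level, and a delete object is only dropped once no higher level can still hold the key) can be illustrated with a toy model. This is a deliberately simplified sketch, not leveldb's actual implementation: each level is a plain dictionary, all names are hypothetical, and real leveldb works with sorted files, sequence numbers and size-triggered compactions.

```python
TOMBSTONE = object()  # stand-in for leveldb's "delete" object


def get(levels, key):
    """Read a key: the lowest-numbered level that holds it wins."""
    for level in levels:
        if key in level:
            value = level[key]
            return None if value is TOMBSTONE else value
    return None


def compact(levels, n):
    """Merge level n into level n + 1, in place."""
    def exists_higher(key):
        return any(key in level for level in levels[n + 2:])

    merged = dict(levels[n + 1])
    merged.update(levels[n])  # newer entries replace the older ones they meet
    for key in list(merged):
        # A tombstone may only be dropped when no higher level can still
        # contain an older value for the same key.
        if merged[key] is TOMBSTONE and not exists_higher(key):
            del merged[key]
    levels[n + 1] = merged
    levels[n] = {}


def stored_entries(levels):
    """Count every physically stored entry, tombstones included."""
    return sum(len(level) for level in levels)


# An old value sits deep in level 2; the delete arrives in level 0.
levels = [{}, {}, {"k": "old"}]
levels[0]["k"] = TOMBSTONE
after_delete = stored_entries(levels)      # the delete added an entry

compact(levels, 0)                         # tombstone moves to level 1
after_first_compaction = stored_entries(levels)

compact(levels, 1)                         # tombstone finally meets "old"
after_second_compaction = stored_entries(levels)

print(after_delete, after_first_compaction, after_second_compaction)
```

In this model the delete initially increases the number of stored entries, and space is only reclaimed once the tombstone and the old value take part in the same compaction, which matches the behaviour described in the thread.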
