Matthew, thanks again for the response! That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :)
Best regards

On 6 April 2014 15:02, Matthew Von-Maszewski <[email protected]> wrote:

> Edgar,
>
> In Riak 1.4, there is no advantage to using empty values versus deleting.
>
> leveldb is a "write once" data store. New data for a given key never
> physically overwrites old data for the same key. New data "hides" the old
> data by being in a lower level, and therefore picked first.
>
> leveldb's compaction operation will remove older key/value pairs only when
> the newer key/value pair is part of a compaction involving both new and
> old. The new and the old key/value pairs must have migrated to adjacent
> levels through normal compaction operations before leveldb will see them in
> the same compaction. The migration could take days, weeks, or even months
> depending upon the size of your entire dataset and the rate of incoming
> write operations.
>
> leveldb's "delete" object is exactly the same as your empty JSON object.
> The delete object simply has one more flag set that allows it to also be
> removed if and only if there is no chance for an identical key to exist on
> a higher level.
>
> I apologize that I cannot give you a more useful answer. 2.0 is on the
> horizon.
>
> Matthew
>
>
> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <[email protected]> wrote:
>
> Hi again!
>
> Sorry to reopen this discussion, but I have another question regarding the
> former post.
>
> What if, instead of doing a mass deletion (we've already seen that it will
> not pay off in terms of disk space), I update all the values with an
> empty JSON object "{}"? Do you see any problem with this? I no longer need
> those millions of values that are living in the cluster...
>
> When version 2.0 of Riak is running stable I'll do the upgrade and only then
> delete those keys!
>
> Best regards
>
>
> On 18 February 2014 16:32, Edgar Veiga <[email protected]> wrote:
>
>> Ok, thanks a lot Matthew.
>>
>>
>> On 18 February 2014 16:18, Matthew Von-Maszewski <[email protected]> wrote:
>>
>>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is
>>> within Google's original leveldb architecture. Riak 2.0 sneaks around to
>>> get the disk space freed.
>>>
>>> Matthew
>>>
>>>
>>>
>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <[email protected]> wrote:
>>>
>>> The only/main purpose is to free disk space.
>>>
>>> I was a little bit concerned regarding this operation, but now with your
>>> feedback I'm inclined to do nothing; I can't risk the disk usage
>>> growing...
>>> Regarding the overhead, I think that with a tight throttling system I
>>> could control and avoid overloading the cluster.
>>>
>>> Mixed feelings :S
>>>
>>>
>>>
>>> On 18 February 2014 15:45, Matthew Von-Maszewski <[email protected]> wrote:
>>>
>>>> Edgar,
>>>>
>>>> The first "concern" I have is that leveldb's delete does not free disk
>>>> space. Others have executed mass delete operations only to discover they
>>>> are now using more disk space instead of less. Here is a discussion of the
>>>> problem:
>>>>
>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>
>>>> The link also describes Riak's database operation overhead. This is a
>>>> second "concern". You will need to carefully throttle your delete rate or
>>>> the overhead will likely impact your production throughput.
>>>>
>>>> We have new code to help quicken the actual purge of deleted data in
>>>> Riak 2.0. But that release is not quite ready for production usage.
>>>>
>>>>
>>>> What do you hope to achieve by the mass delete?
>>>>
>>>> Matthew
>>>>
>>>>
>>>>
>>>>
>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <[email protected]>
>>>> wrote:
>>>>
>>>> Sorry, forgot that info!
>>>>
>>>> It's leveldb.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <[email protected]> wrote:
>>>>
>>>>> Which Riak backend are you using: bitcask, leveldb, multi?
>>>>>
>>>>> Matthew
>>>>>
>>>>>
>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > Hi all!
>>>>> >
>>>>> > I have a fairly trivial question regarding mass deletion on a riak
>>>>> > cluster, but first let me give you some context. My cluster is
>>>>> > running riak 1.4.6 on 6 machines with a ring of 256 nodes and 1 TB SSD
>>>>> > disks.
>>>>> >
>>>>> > I need to execute a massive object deletion on a bucket; I'm talking
>>>>> > about ~1 billion keys (the average object size is ~1 KB). I will not
>>>>> > retrieve the keys from riak because I have a file with all of them.
>>>>> > I'll just start a script that reads them from the file and triggers an
>>>>> > HTTP DELETE for each one.
>>>>> > The cluster will continue running in production with a quite high
>>>>> > load, serving all other applications, while this deletion runs.
>>>>> >
>>>>> > My question is simple: do I need to have any extra concerns
>>>>> > regarding this action? Would you advise paying special attention to
>>>>> > any metrics regarding riak or even the servers where it's running?
>>>>> >
>>>>> > Best regards!
>>>>> > _______________________________________________
>>>>> > riak-users mailing list
>>>>> > [email protected]
>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
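
For anyone finding this thread later: the script described in the original question (read keys from a file, issue one HTTP DELETE per key, keep the rate tightly throttled) might look roughly like the sketch below. It is only an illustration under assumptions, not something posted by anyone in the thread. The base URL, bucket name, key file name and rate are hypothetical placeholders, the /buckets/<bucket>/keys/<key> path is Riak's HTTP object URL as I understand it, and the function names are invented here. Note Matthew's warning above still applies: on Riak 1.4 with leveldb, deleting this way is not expected to free disk space.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable


def riak_key_url(base_url: str, bucket: str, key: str) -> str:
    """Build the object URL, percent-encoding bucket and key."""
    def quote(part: str) -> str:
        return urllib.parse.quote(part, safe="")
    return f"{base_url.rstrip('/')}/buckets/{quote(bucket)}/keys/{quote(key)}"


def http_delete(url: str, timeout: float = 10.0) -> int:
    """Send one HTTP DELETE and return the status code (errors included)."""
    request = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def throttled_delete(
    keys: Iterable[str],
    base_url: str,
    bucket: str,
    max_per_second: float,
    delete: Callable[[str], int] = http_delete,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, int]:
    """Delete keys one by one, never faster than max_per_second.

    Returns a mapping of HTTP status code -> number of responses,
    so failures can be spotted and the rate lowered if needed.
    """
    interval = 1.0 / max_per_second
    counts: Dict[int, int] = {}
    next_slot = clock()
    for key in keys:
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        status = delete(riak_key_url(base_url, bucket, key))
        counts[status] = counts.get(status, 0) + 1
        # Never "catch up" with a burst after a slow request.
        next_slot = max(next_slot, clock()) + interval
    return counts


# Hypothetical usage: one key per line in keys.txt, 200 deletes per second.
# with open("keys.txt") as fh:
#     keys = (line.rstrip("\n") for line in fh if line.strip())
#     print(throttled_delete(keys, "http://127.0.0.1:8098", "mybucket", 200))
```

The delete, sleep and clock parameters exist only so the pacing logic can be exercised without a cluster. In production you would start with a low max_per_second and raise it while watching cluster latency, since the thread makes clear that the delete overhead competes with normal traffic.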
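
Matthew's explanation of why deletes (and empty "{}" values) do not free space can be illustrated with a toy model. This is a deliberately simplified sketch, not leveldb's real implementation: each level is a plain dict, level 0 holds the newest data, and compact(n) merges level n into level n+1. The class and method names are invented for the illustration. It shows only the two points made in the thread: a newer entry merely hides an older one until both meet in the same compaction, and a delete marker can be dropped only when no deeper level could still hold the key.

```python
TOMBSTONE = object()  # stands in for leveldb's "delete" object


class ToyLevels:
    """Toy write-once store: lower level index means newer data."""

    def __init__(self, depth=3):
        self.levels = [dict() for _ in range(depth)]

    def put(self, key, value):
        # New data always lands in level 0; older copies are untouched.
        self.levels[0][key] = value

    def delete(self, key):
        # A delete is just another write, carrying a marker value.
        self.levels[0][key] = TOMBSTONE

    def get(self, key):
        # The first (newest) level containing the key wins.
        for level in self.levels:
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self, n):
        """Merge level n into level n + 1."""
        newer, older = self.levels[n], self.levels[n + 1]
        deeper = self.levels[n + 2:]
        for key, value in newer.items():
            may_exist_deeper = any(key in level for level in deeper)
            if value is TOMBSTONE and not may_exist_deeper:
                # Old value and marker meet: both can finally go away.
                older.pop(key, None)
            else:
                # Newer entry replaces the older one in the next level.
                older[key] = value
        newer.clear()

    def stored_entries(self):
        return sum(len(level) for level in self.levels)


db = ToyLevels(depth=3)
db.put("k", "1KB of data")
db.compact(0)
db.compact(1)                    # the old value now sits in the deepest level
before = db.stored_entries()     # 1 entry stored

db.delete("k")
after_delete = db.stored_entries()   # 2 entries: old value plus delete marker

db.compact(0)                    # marker is still one level above the old value
db.compact(1)                    # marker and old value meet; both are dropped
after_compactions = db.stored_entries()
```

In this model the store holds more entries right after the delete than before it, which matches the reports of disk usage going up after a mass delete. Replacing db.delete("k") with db.put("k", "{}") behaves the same way, except that the "{}" entry is never dropped.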
