Matthew, thanks again for the response!

In that case, I'll wait for 2.0 (and maybe buy some bigger disks :)

Best regards


On 6 April 2014 15:02, Matthew Von-Maszewski <[email protected]> wrote:

> Edgar,
>
> In Riak 1.4, there is no advantage to using empty values versus deleting.
>
> leveldb is a "write once" data store.  New data for a given key never
> physically overwrites old data for the same key.  New data "hides" the old
> data by being in a lower level, and is therefore picked first.
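To make the "hides" behaviour concrete, here is a toy Python model (an illustration under simplifying assumptions, not leveldb's actual code): each level is a plain dict, level 0 holds the newest data, and a read returns the first match while scanning from level 0 downward. The names `put` and `lookup` are hypothetical.

```python
from typing import Optional

# Toy model, NOT leveldb's implementation: levels[0] is the newest level,
# and each level is a plain dict mapping key -> value.
Levels = list[dict[str, str]]


def put(levels: Levels, key: str, value: str) -> None:
    """Write-once: new data lands in level 0; copies in deeper levels stay on disk."""
    levels[0][key] = value


def lookup(levels: Levels, key: str) -> Optional[str]:
    """Return the value from the lowest-numbered level that holds the key."""
    for level in levels:  # scan newest (level 0) to oldest
        if key in level:
            return level[key]  # this copy "hides" any older copies further down
    return None
```

For example, with `levels = [{}, {"k": "old"}]`, calling `put(levels, "k", "{}")` makes `lookup` return `"{}"`, while `"old"` still sits in `levels[1]` and keeps using space.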
>
> leveldb's compaction operation will remove an older key/value pair only when
> the newer key/value pair is part of a compaction involving both new and
> old.  The new and the old key/value pairs must have migrated to adjacent
> levels through normal compaction operations before leveldb will see them in
> the same compaction.  The migration could take days, weeks, or even months
> depending upon the size of your entire dataset and the rate of incoming
> write operations.
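Continuing the same kind of toy model (dicts as levels, level 0 newest; still a simplification of leveldb), a compaction merges one level into the next. The older copy of a key disappears only when both copies are inputs to the same merge, so a stale value several levels down survives until the newer copy has been compacted down next to it. `compact_into` and `copies` are hypothetical helpers.

```python
# Toy model, NOT leveldb's implementation: levels[0] is the newest level.
Levels = list[dict[str, str]]


def compact_into(levels: Levels, n: int) -> None:
    """Merge level n into level n + 1, then empty level n.

    Only keys present in both inputs lose their older copy; copies in
    any other level are untouched.
    """
    merged = dict(levels[n + 1])  # older data first
    merged.update(levels[n])      # newer values win on shared keys
    levels[n + 1] = merged
    levels[n] = {}


def copies(levels: Levels, key: str) -> int:
    """Count how many levels still store some version of key."""
    return sum(key in level for level in levels)
```

Starting from `[{"k": "{}"}, {}, {"k": "old"}]`, the first `compact_into(levels, 0)` still leaves two copies of `"k"`; only the second merge, `compact_into(levels, 1)`, brings the two versions together and drops `"old"`.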
>
> leveldb's "delete" object is exactly the same as your empty JSON object.
>  The delete object simply has one more flag set that allows it to also be
> removed if and only if there is no chance for an identical key to exist on
> a higher (older) level.
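The same toy approach can show how a delete differs from writing "{}" by just one flag: each record carries a `deleted` flag, and during a merge a delete marker is discarded only when no deeper (older) level could still hold the key, while an ordinary "{}" record is always kept. `Record` and `compact` are hypothetical names, and real leveldb compactions track more state (sequence numbers, snapshots) than this sketch does.

```python
from typing import NamedTuple


class Record(NamedTuple):
    value: str
    deleted: bool = False  # the one extra flag a delete marker carries


def compact(
    upper: dict[str, Record],
    lower: dict[str, Record],
    deeper_levels: list[dict[str, Record]],
) -> dict[str, Record]:
    """Merge upper (newer) into lower (older) in a toy LSM model.

    A delete marker is dropped only if no deeper level might still hold
    an older copy of the key; a plain record such as "{}" is always kept.
    """
    merged = {**lower, **upper}  # newer records win on shared keys
    return {
        key: rec
        for key, rec in merged.items()
        if not rec.deleted or any(key in level for level in deeper_levels)
    }
```

Either way the marker or the "{}" record occupies space until compaction catches up; the flag only lets the delete marker itself vanish at the bottom of the tree.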
>
> I apologize that I cannot give you a more useful answer.  2.0 is on the
> horizon.
>
> Matthew
>
>
> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <[email protected]> wrote:
>
> Hi again!
>
> Sorry to reopen this discussion, but I have another question about my
> earlier post.
>
> What if, instead of doing a mass deletion (we've already seen that it won't
> help with disk space), I update all the values with an
> empty JSON object "{}"? Do you see any problem with this? I no longer need
> those millions of values that are living in the cluster...
>
> Once Riak 2.0 is stable, I'll do the update and only then
> delete those keys!
>
> Best regards
>
>
> On 18 February 2014 16:32, Edgar Veiga <[email protected]> wrote:
>
>> OK, thanks a lot, Matthew.
>>
>>
>> On 18 February 2014 16:18, Matthew Von-Maszewski <[email protected]> wrote:
>>
>>> Riak 2.0 is coming.  Hold your mass delete until then.  The "bug" is
>>> within Google's original leveldb architecture.  Riak 2.0 works around it
>>> to get the disk space freed.
>>>
>>> Matthew
>>>
>>>
>>>
>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <[email protected]> wrote:
>>>
>>> The main purpose is to free disk space.
>>>
>>> I was already a little concerned about this operation, and now, with your
>>> feedback, I'm inclined not to do it at all; I can't risk disk usage
>>> growing...
>>> As for the overhead, I think a tight throttling system would let me
>>> control the load and avoid overloading the cluster.
>>>
>>> Mixed feelings :S
>>>
>>>
>>>
>>> On 18 February 2014 15:45, Matthew Von-Maszewski <[email protected]> wrote:
>>>
>>>> Edgar,
>>>>
>>>> The first "concern" I have is that leveldb's delete does not free disk
>>>> space.  Others have executed mass delete operations only to discover they
>>>> are now using more disk space instead of less.  Here is a discussion of the
>>>> problem:
>>>>
>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>
>>>> The link also describes Riak's database operation overhead.  This is a
>>>> second "concern".  You will need to carefully throttle your delete rate or
>>>> the overhead will likely impact your production throughput.
>>>>
>>>> We have new code in Riak 2.0 to speed up the actual purge of deleted
>>>> data.  But that release is not quite ready for production use.
>>>>
>>>>
>>>> What do you hope to achieve by the mass delete?
>>>>
>>>> Matthew
>>>>
>>>>
>>>>
>>>>
>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <[email protected]>
>>>> wrote:
>>>>
>>>> Sorry, forgot that info!
>>>>
>>>> It's leveldb.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <[email protected]> wrote:
>>>>
>>>>> Which Riak backend are you using:  bitcask, leveldb, multi?
>>>>>
>>>>> Matthew
>>>>>
>>>>>
>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > Hi all!
>>>>> >
>>>>> > I have a fairly trivial question about mass deletion on a Riak
>>>>> cluster, but first let me give you some context. My cluster is
>>>>> running Riak 1.4.6 on 6 machines with a ring size of 256 partitions and
>>>>> 1 TB SSD disks.
>>>>> >
>>>>> > I need to run a massive object deletion on one bucket: ~1 billion
>>>>> keys (the average object size is ~1 KB). I will not retrieve the keys
>>>>> from Riak because I have a file with all of them. I'll just start a
>>>>> script that reads them from the file and issues an HTTP DELETE for each
>>>>> one.
>>>>> > The cluster will keep serving all other applications in production,
>>>>> under fairly high load, while this deletion runs.
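A script like the one described, with the throttling Matthew recommends built in, might look like the Python sketch below. It assumes Riak's HTTP interface at `http://127.0.0.1:8098` and the `/buckets/<bucket>/keys/<key>` object path; host, port, bucket, and rate are placeholders, and `throttled_delete` / `http_delete` are hypothetical names. Each request starts at least `1 / rate_per_sec` seconds after the previous one, and a slow request never triggers a catch-up burst.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from collections import Counter
from typing import Callable, Iterable


def delete_url(base: str, bucket: str, key: str) -> str:
    """Build the Riak HTTP object URL, percent-encoding bucket and key."""
    b = urllib.parse.quote(bucket, safe="")
    k = urllib.parse.quote(key, safe="")
    return f"{base}/buckets/{b}/keys/{k}"


def http_delete(url: str) -> int:
    """Send one HTTP DELETE and return the status code (e.g. 204 or 404)."""
    req = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a 404 just means the key was already gone


def throttled_delete(
    keys: Iterable[str],
    bucket: str,
    rate_per_sec: float,
    base: str = "http://127.0.0.1:8098",  # assumed host/port
    send: Callable[[str], int] = http_delete,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Counter:
    """Delete keys one by one, starting at most rate_per_sec requests per second.

    Returns a Counter of HTTP status codes so failures can be spotted.
    """
    interval = 1.0 / rate_per_sec
    counts: Counter = Counter()
    next_at = clock()
    for key in keys:
        now = clock()
        if now < next_at:
            sleep(next_at - now)  # wait for this request's slot
            now = next_at
        counts[send(delete_url(base, bucket, key))] += 1
        next_at = now + interval  # space slots from the actual start time
    return counts
```

Streaming the key file keeps memory flat even for a billion keys, e.g. `with open("keys.txt") as f: throttled_delete((line.strip() for line in f if line.strip()), "my_bucket", rate_per_sec=200)` (file name, bucket, and rate are hypothetical). A fixed rate is the simplest control; lowering it when production latency rises is left to the operator.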
>>>>> >
>>>>> > My question is simple: do I need to take any extra precautions with
>>>>> this operation? Would you advise me to pay special attention to any
>>>>> Riak metrics, or even to metrics on the servers where it's running?
>>>>> >
>>>>> > Best regards!
>>>>> > _______________________________________________
>>>>> > riak-users mailing list
>>>>> > [email protected]
>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>