Thanks a lot Matthew!

A little more info: I've gathered a sample of the contents of the
anti-entropy data on one of my machines:
- 44 folders with the name equal to the name of the folders in level-db dir
(i.e. 393920363186844927172086927568060657641638068224/)
- each folder has 5 files (log, current, etc.) and 5 sst_* folders.
- The biggest sst folder is sst_3 with 4.3G
- Inside the sst_3 folder there are 1219 files named 00****.sst.
- Each of the 00*****.sst files has ~3.7M
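
A quick sanity check on the sst_3 figures above, as a rough sketch. It assumes "M" and "G" are binary units (MiB and GiB), as a tool like `du -h` would print; the message does not say which tool produced the numbers.

```python
# Rough consistency check of the sst_3 figures quoted above.
# Assumption: "M" and "G" are binary units (MiB and GiB).
files_in_sst_3 = 1219   # number of 00****.sst files reported
avg_file_mib = 3.7      # approximate size of each file

estimated_gib = files_in_sst_3 * avg_file_mib / 1024
print(f"estimated sst_3 size: {estimated_gib:.1f} GiB")  # prints 4.4 GiB
```

The estimate lands close to the reported 4.3G, so the file count and per-file size are consistent with the folder size.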

Hope this info gives you some more help!

Best regards, and again, thanks a lot
Edgar


On 8 April 2014 13:24, Matthew Von-Maszewski <[email protected]> wrote:

> Argh. Missed where you said you had upgraded. Ok, I will proceed with
> getting you comparison numbers.
>
> Sent from my iPhone
>
> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <[email protected]> wrote:
>
> Thanks again Matthew, you've been very helpful!
>
> Maybe you can give me some advice on this issue I've been having since
> I upgraded to 1.4.8.
>
> Since the upgrade, my anti-entropy data has grown a lot and has only
> stabilised at very high values... Right now my cluster has 6 machines,
> each with ~120G of anti-entropy data and 600G of level-db data. This
> seems to be quite a lot, no? My total number of keys is ~2.5 billion.
>
> Best regards,
> Edgar
>
> On 6 April 2014 23:30, Matthew Von-Maszewski <[email protected]> wrote:
>
>> Edgar,
>>
>> This is indirectly related to your key deletion discussion.  I made
>> changes recently to the aggressive delete code.  The second section of the
>> following (updated) web page discusses the adjustments:
>>
>>     https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>
>> Matthew
>>
>>
>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <[email protected]> wrote:
>>
>> Matthew, thanks again for the response!
>>
>> That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :)
>>
>> Best regards
>>
>>
>> On 6 April 2014 15:02, Matthew Von-Maszewski <[email protected]> wrote:
>>
>>> Edgar,
>>>
>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>
>>> leveldb is a "write once" data store.  New data for a given key never
>>> physically overwrites old data for the same key.  New data "hides" the old
>>> data by being in a lower level, and therefore picked first.
>>>
>>> leveldb's compaction operation will remove older key/value pairs only
>>> when the newer key/value pair is part of a compaction involving both new
>>> and old.  The new and the old key/value pairs must have migrated to
>>> adjacent levels through normal compaction operations before leveldb will
>>> see them in the same compaction.  The migration could take days, weeks, or
>>> even months depending upon the size of your entire dataset and the rate of
>>> incoming write operations.
>>>
>>> leveldb's "delete" object is exactly the same as your empty JSON object.
>>>  The delete object simply has one more flag set that allows it to also be
>>> removed if and only if there is no chance for an identical key to exist on
>>> a higher level.
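
The behaviour described above can be illustrated with a toy model. This is only a sketch: the list-of-dicts layout and the names `TOMBSTONE`, `get`, and `compact` are hypothetical and are not leveldb's actual data structures or compaction code. Index 0 is the newest level, matching the "lower level is picked first" wording.

```python
# Toy model only: NOT leveldb's real data structures or compaction code.
TOMBSTONE = object()  # stands in for leveldb's delete marker


def get(levels, key):
    """Search from level 0 (newest) downward; the first hit wins."""
    for level in levels:
        if key in level:
            value = level[key]
            return None if value is TOMBSTONE else value
    return None


def compact(levels, n):
    """Merge level n into level n + 1, keeping only the newer entry per key."""
    src, dst = levels[n], levels[n + 1]
    deeper = levels[n + 2:]
    for key, value in src.items():
        if value is TOMBSTONE and not any(key in lvl for lvl in deeper):
            # No older copy can exist further down, so the tombstone and
            # whatever it hid in the target level can both be dropped.
            dst.pop(key, None)
        else:
            dst[key] = value  # newer entry replaces the older one
    src.clear()


# A delete sits in level 0 while the old value still occupies level 2.
levels = [{"k": TOMBSTONE}, {}, {"k": "old value"}]
entries_before = sum(len(lvl) for lvl in levels)

compact(levels, 0)  # tombstone moves to level 1; old value is still stored
entries_after_first = sum(len(lvl) for lvl in levels)

compact(levels, 1)  # tombstone finally meets the old value; both go away
entries_after_second = sum(len(lvl) for lvl in levels)
```

In this model the delete adds an entry at first, and space is reclaimed only once the tombstone and the old value reach adjacent levels, which is why a mass delete can temporarily use more disk rather than less.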
>>>
>>> I apologize that I cannot give you a more useful answer.  2.0 is on the
>>> horizon.
>>>
>>> Matthew
>>>
>>>
>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <[email protected]> wrote:
>>>
>>> Hi again!
>>>
>>> Sorry to reopen this discussion, but I have another question regarding
>>> the former post.
>>>
>>> What if, instead of doing a mass deletion (we've already seen that it
>>> won't pay off in terms of disk space), I update all the values with
>>> an empty JSON object "{}"? Do you see any problem with this? I no longer
>>> need those millions of values that are living in the cluster...
>>>
>>> Once version 2.0 of Riak is stable, I'll upgrade and only
>>> then delete those keys!
>>>
>>> Best regards
>>>
>>>
>>> On 18 February 2014 16:32, Edgar Veiga <[email protected]> wrote:
>>>
>>>> Ok, thanks a lot Matthew.
>>>>
>>>>
>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <[email protected]> wrote:
>>>>
>>>>> Riak 2.0 is coming.  Hold your mass delete until then.  The "bug" is
>>>>> within Google's original leveldb architecture.  Riak 2.0 sneaks around to
>>>>> get the disk space freed.
>>>>>
>>>>> Matthew
>>>>>
>>>>>
>>>>>
>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <[email protected]>
>>>>> wrote:
>>>>>
>>>>> The only/main purpose is to free disk space..
>>>>>
>>>>> I was a little bit concerned about this operation, but now with
>>>>> your feedback I'm inclined to do nothing; I can't risk the disk usage
>>>>> growing...
>>>>> Regarding the overhead, I think that with a tight throttling system I
>>>>> could control it and avoid overloading the cluster.
>>>>>
>>>>> Mixed feelings :S
>>>>>
>>>>>
>>>>>
>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <[email protected]> wrote:
>>>>>
>>>>>> Edgar,
>>>>>>
>>>>>> The first "concern" I have is that leveldb's delete does not free
>>>>>> disk space.  Others have executed mass delete operations only to discover
>>>>>> they are now using more disk space instead of less.  Here is a discussion
>>>>>> of the problem:
>>>>>>
>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>
>>>>>> The link also describes Riak's database operation overhead.  This is
>>>>>> a second "concern".  You will need to carefully throttle your delete rate
>>>>>> or the overhead will likely impact your production throughput.
>>>>>>
>>>>>> We have new code to help quicken the actual purge of deleted data in
>>>>>> Riak 2.0.  But that release is not quite ready for production usage.
>>>>>>
>>>>>>
>>>>>> What do you hope to achieve by the mass delete?
>>>>>>
>>>>>> Matthew
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Sorry, forgot that info!
>>>>>>
>>>>>> It's leveldb.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>
>>>>>>> Which Riak backend are you using:  bitcask, leveldb, multi?
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi all!
>>>>>>> >
>>>>>>> > I have a fairly trivial question regarding mass deletion on a riak
>>>>>>> > cluster, but first let me give you some context. My cluster is
>>>>>>> > running riak 1.4.6 on 6 machines with a ring of 256 nodes and 1TB
>>>>>>> > SSD disks.
>>>>>>> >
>>>>>>> > I need to execute a massive object deletion on a bucket; I'm
>>>>>>> > talking about ~1 billion keys (the average object size is ~1KB). I
>>>>>>> > will not retrieve the keys from riak because I have a file with all
>>>>>>> > of them. I'll just start a script that reads them from the file and
>>>>>>> > triggers an HTTP DELETE for each one.
>>>>>>> > The cluster will continue running in production with a quite high
>>>>>>> > load serving all other applications while this deletion runs.
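
A minimal sketch of such a script, combined with the throttling discussed elsewhere in this thread. It assumes Riak's HTTP interface at `/buckets/<bucket>/keys/<key>`; the host, port, bucket name, key file name, and rate shown in the usage comment are hypothetical placeholders, and the helper names `delete_key` and `throttled_delete` are invented for illustration.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def delete_key(base_url, bucket, key):
    """Issue one HTTP DELETE for a key and return the HTTP status code."""
    url = "{}/buckets/{}/keys/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    request = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # e.g. 404 when the key does not exist; report it instead of raising
        return err.code


def throttled_delete(keys, send, max_per_second,
                     sleep=time.sleep, clock=time.monotonic):
    """Call send(key) for each key, never faster than max_per_second."""
    interval = 1.0 / max_per_second
    deleted = 0
    for key in keys:
        started = clock()
        send(key)
        deleted += 1
        # Sleep off whatever is left of this key's time slot.
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return deleted


# Hypothetical usage (file name, URL, bucket, and rate are placeholders):
# with open("keys.txt") as fh:
#     keys = (line.rstrip("\n") for line in fh if line.strip())
#     throttled_delete(
#         keys,
#         lambda k: delete_key("http://127.0.0.1:8098", "mybucket", k),
#         max_per_second=200,
#     )
```

Passing `send`, `sleep`, and `clock` as parameters keeps the rate limiter testable without a cluster; a safe rate would have to be found by watching production latency while the script runs.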
>>>>>>> >
>>>>>>> > My question is simple: do I need to have any extra concerns
>>>>>>> > regarding this action? Do you advise paying special attention to
>>>>>>> > any metrics regarding riak or even the servers where it's running?
>>>>>>> >
>>>>>>> > Best regards!
>>>>>>> > _______________________________________________
>>>>>>> > riak-users mailing list
>>>>>>> > [email protected]
>>>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>