Yes, you can send the AAE (active anti-entropy) data to a different disk.
AAE calculates a hash each time you PUT new data to the regular database. AAE then buffers around 1,000 hashes (I forget the exact value) to write as a block to the AAE database. The AAE write is NOT in series with the user database writes. Your throughput should not be impacted. But this is not something I have personally measured/validated.

Matthew
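For reference, the change Edgar describes below amounts to pointing the AAE tree storage at a different mount point in app.config. A minimal sketch, assuming the 1.4-era riak_kv key name and an illustrative path (verify both against the app.config shipped with your release before restarting the node):

    %% app.config excerpt -- riak_kv section
    %% (key name assumed from the 1.4-era defaults; path is illustrative)
    {riak_kv, [
        %% ... other riak_kv settings unchanged ...
        {anti_entropy_data_dir, "/mnt/spindle/riak/anti_entropy"}
    ]}

After changing it, move (or remove) the existing anti_entropy directory and restart the node; if the old trees are removed rather than moved, AAE will rebuild them on the new volume.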
On Apr 10, 2014, at 7:33 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:

> Hi Matthew!
>
> I have the possibility of moving the anti-entropy directory's data to a mechanical 7200 rpm disk that exists on each of the machines. I was thinking of changing the anti_entropy data dir config in the app.config file and restarting the riak process.
>
> Is there any problem using a mechanical disk to store the anti-entropy data?
>
> Best regards!
>
>
> On 8 April 2014 23:58, Edgar Veiga <edgarmve...@gmail.com> wrote:
> I'll wait a few more days, see if the AAE maybe "stabilises" and only after that make a decision regarding this.
> The cluster expansion was on the roadmap, but not right now :)
>
> I've attached a few screenshots; you can clearly observe the evolution of one of the machines after the anti-entropy data removal and consequent restart (5th of April).
>
> https://cloudup.com/cB0a15lCMeS
>
> Best regards!
>
>
> On 8 April 2014 23:44, Matthew Von-Maszewski <matth...@basho.com> wrote:
> No. I do not see a problem with your plan. But ...
>
> I would prefer to see you add servers to your cluster. Scalability is one of Riak's fundamental characteristics. As your database needs grow, we grow with you … just add another server and migrate some of the vnodes there.
>
> I obviously cannot speak to your budgetary constraints. All of the engineers at Basho (I am just one) are focused upon providing you performance and features along with your scalability needs. This seems to be a situation where you might be sacrificing data integrity where another server or two would address the situation.
>
> And if 2.0 makes things better … sell the extra servers on Ebay.
>
> Matthew
>
>
> On Apr 8, 2014, at 6:31 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>
>> Thanks Matthew!
>>
>> Today this situation has become unsustainable. In two of the machines I have an anti-entropy dir of 250G... It just keeps growing and growing and I'm almost reaching the max size of the disks.
>>
>> Maybe I'll just turn off AAE in the cluster, remove all the data in the anti-entropy directory and wait for v2 of Riak. Do you see any problem with this?
>>
>> Best regards!
>>
>>
>> On 8 April 2014 22:11, Matthew Von-Maszewski <matth...@basho.com> wrote:
>> Edgar,
>>
>> Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here:
>>
>> https://github.com/basho/leveldb/wiki/mv-tiered-options
>>
>> This feature might give you another option in managing your storage volume.
>>
>> Matthew
>>
>>> On Apr 8, 2014, at 11:07 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>
>>>> It makes sense; I do a lot, and I really mean a LOT, of updates per key, maybe thousands a day! The cluster is experiencing far more updates per existing key than new keys being inserted.
>>>>
>>>> The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation), so I'll come back and give you some feedback (hopefully good) next Monday!
>>>>
>>>> Again, thanks a lot, you've been very helpful.
>>>> Edgar
>>>>
>>>>
>>>> On 8 April 2014 15:47, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>> Edgar,
>>>>
>>>> The test I have running currently has reached 1 Billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare with your numbers:
>>>>
>>>> You have ~2.5 Billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 Billion keys. You have six nodes, therefore tracking ~1.25 Billion keys per node.
>>>>
>>>> Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No. My data is still loading and has experienced zero key/value updates/edits.
>>>>
>>>> AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb: all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space.
>>>>
>>>> AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically.
>>>>
>>>> Matthew
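A quick sanity check of the arithmetic above, using only the figures quoted in the thread:

    # Back-of-the-envelope version of Matthew's extrapolation (figures from the thread)
    total_keys     = 2.5e9   # unique keys in Edgar's cluster
    n_val          = 3       # replicas per key (default N)
    nodes          = 6
    gb_per_billion = 42      # AAE data observed for ~1 billion tracked keys on the test node

    tracked          = total_keys * n_val    # ~7.5 billion replica entries cluster-wide
    tracked_per_node = tracked / nodes       # ~1.25 billion per node
    expected_gb      = gb_per_billion * tracked_per_node / 1e9
    print(tracked_per_node, expected_gb)     # 1250000000.0  52.5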
>>>>
>>>>
>>>> On Apr 8, 2014, at 9:31 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot Matthew!
>>>>>
>>>>> A little bit more info: I've gathered a sample of the contents of the anti-entropy data on one of my machines:
>>>>> - 44 folders with names equal to the names of the folders in the level-db dir (i.e. 393920363186844927172086927568060657641638068224/)
>>>>> - each folder has 5 files (log, current, etc.) and 5 sst_* folders
>>>>> - the biggest sst folder is sst_3 with 4.3G
>>>>> - inside the sst_3 folder there are 1219 files named 00****.sst
>>>>> - each of the 00*****.sst files is ~3.7M
>>>>>
>>>>> Hope this info gives you some more help!
>>>>>
>>>>> Best regards, and again, thanks a lot
>>>>> Edgar
>>>>>
>>>>>
>>>>> On 8 April 2014 13:24, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>> Argh. Missed where you said you had upgraded. OK, I will proceed with getting you comparison numbers.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>
>>>>>> Thanks again Matthew, you've been very helpful!
>>>>>>
>>>>>> Maybe you can give me some kind of advice on this issue I'm having since I've upgraded to 1.4.8.
>>>>>>
>>>>>> Since I've upgraded, my anti-entropy data has been growing a lot and has only stabilised at very high values... Right now my cluster has 6 machines, each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot, no? My total amount of keys is ~2.5 Billion.
>>>>>>
>>>>>> Best regards,
>>>>>> Edgar
>>>>>>
>>>>>> On 6 April 2014 23:30, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>> Edgar,
>>>>>>
>>>>>> This is indirectly related to your key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments:
>>>>>>
>>>>>> https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>>>>>
>>>>>> Matthew
>>>>>>
>>>>>>
>>>>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>
>>>>>>> Matthew, thanks again for the response!
>>>>>>>
>>>>>>> That said, I'll wait again for the 2.0 (and maybe buy some bigger disks :)
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>> On 6 April 2014 15:02, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>> Edgar,
>>>>>>>
>>>>>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>>>>>
>>>>>>> leveldb is a "write once" data store. New data for a given key never physically overwrites old data for the same key. New data "hides" the old data by being in a lower level, and therefore picked first.
>>>>>>>
>>>>>>> leveldb's compaction operation will remove older key/value pairs only when the newer key/value pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations.
>>>>>>>
>>>>>>> leveldb's "delete" object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed if and only if there is no chance for an identical key to exist on a higher level.
>>>>>>>
>>>>>>> I apologize that I cannot give you a more useful answer. 2.0 is on the horizon.
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi again!
>>>>>>>>
>>>>>>>> Sorry to reopen this discussion, but I have another question regarding the former post.
>>>>>>>>
>>>>>>>> What if, instead of doing a mass deletion (we've already seen that it would not pay off in terms of disk space), I update all the values with an empty JSON object "{}"? Do you see any problem with this? I no longer need those millions of values that are living in the cluster...
>>>>>>>>
>>>>>>>> When version 2.0 of Riak runs stable I'll do the update and only then delete those keys!
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 February 2014 16:32, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>> Ok, thanks a lot Matthew.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is within Google's original leveldb architecture. Riak 2.0 sneaks around it to get the disk space freed.
>>>>>>>>
>>>>>>>> Matthew
>>>>>>>>
>>>>>>>>
>>>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The only/main purpose is to free disk space...
>>>>>>>>>
>>>>>>>>> I was a little bit concerned regarding this operation, but now with your feedback I'm leaning towards not doing anything; I can't risk the space growing...
>>>>>>>>> Regarding the overhead, I think that with a tight throttling system I could control it and avoid overloading the cluster.
>>>>>>>>>
>>>>>>>>> Mixed feelings :S
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>>> Edgar,
>>>>>>>>>
>>>>>>>>> The first "concern" I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem:
>>>>>>>>>
>>>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>>>
>>>>>>>>> The link also describes Riak's database operation overhead. This is a second "concern". You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput.
>>>>>>>>>
>>>>>>>>> We have new code to help quicken the actual purge of deleted data in Riak 2.0. But that release is not quite ready for production usage.
>>>>>>>>>
>>>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>>>
>>>>>>>>> Matthew
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry, forgot that info!
>>>>>>>>>>
>>>>>>>>>> It's leveldb.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>>>> Which Riak backend are you using: bitcask, leveldb, multi?
>>>>>>>>>>
>>>>>>>>>> Matthew
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> > Hi all!
>>>>>>>>>> >
>>>>>>>>>> > I have a fairly trivial question regarding mass deletion on a Riak cluster, but first let me give you some context. My cluster is running Riak 1.4.6 on 6 machines with a ring of 256 partitions and 1TB SSD disks.
>>>>>>>>>> >
>>>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm talking about ~1 billion keys (the average object size is ~1KB). I will not retrieve the keys from Riak because I have a file with all of them. I'll just start a script that reads them from the file and triggers an HTTP DELETE for each one.
>>>>>>>>>> > The cluster will continue running in production with a quite high load, serving all other applications, while this deletion is running.
>>>>>>>>>> >
>>>>>>>>>> > My question is simple: do I need to have any kind of extra concerns regarding this action? Do you advise me to pay special attention to any kind of metrics regarding Riak or even the servers where it's running?
>>>>>>>>>> >
>>>>>>>>>> > Best regards!
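A minimal sketch of the throttled delete loop described in the original post above. The bucket name, key file, and rate are hypothetical, and it assumes the stock Riak HTTP interface on port 8098 plus the third-party Python requests library; treat it as an illustration, not a vetted tool:

    #!/usr/bin/env python
    # Throttled mass-delete sketch -- illustrative only; tune the rate against
    # your own cluster metrics before running anything like this in production.
    import time
    import requests  # third-party HTTP client

    RIAK_URL = "http://127.0.0.1:8098"   # assumes the default HTTP listener
    BUCKET   = "my_bucket"               # hypothetical bucket name
    KEY_FILE = "keys_to_delete.txt"      # one key per line, as described above
    RATE     = 200                       # deletes per second (crude budget)

    delay = 1.0 / RATE
    with open(KEY_FILE) as fh:
        for line in fh:
            key = line.strip()
            if not key:
                continue
            # Riak 1.4 HTTP interface: DELETE /buckets/<bucket>/keys/<key>
            # (keys assumed URL-safe here)
            r = requests.delete("%s/buckets/%s/keys/%s" % (RIAK_URL, BUCKET, key))
            if r.status_code not in (204, 404):
                print("unexpected status %d for key %s" % (r.status_code, key))
            time.sleep(delay)  # simple throttle; a token bucket would smooth bursts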
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com