Matt,

Using the merge command can be a little tricky. Imagine your splits look like this for whatever table 1 is, using

  scan -t accumulo.metadata -b 1; -e 2; -c ~tab
  1;a ~tab:~pr []    \x00
  1;b ~tab:~pr []    \x01a
  1;c ~tab:~pr []    \x01b
  1;d ~tab:~pr []    \x01c
  1;e ~tab:~pr []    \x01d
  1;f ~tab:~pr []    \x01e
  1;g ~tab:~pr []    \x01f
  1;h ~tab:~pr []    \x01g
  1;i ~tab:~pr []    \x01h
  1;j ~tab:~pr []    \x01i
  1;k ~tab:~pr []    \x01j
  1< ~tab:~pr []    \x01k

Merging will take the entire range you specify and put all the splits into the last split. Say you want to merge splits c and d into e. You might try this command, but it would be wrong:

  merge -t test1 -b c -e e

because -b is exclusive; that command would merge d into e but skip c. A correct command would be something like

  merge -t test1 -b b\x00 -e e

where the begin range is a value just after the ~tab:~pr value of the range you want to start with. After this you would end up with the following splits:

  1;a ~tab:~pr []    \x00
  1;b ~tab:~pr []    \x01a
  1;e ~tab:~pr []    \x01b
  1;f ~tab:~pr []    \x01e
  1;g ~tab:~pr []    \x01f
  1;h ~tab:~pr []    \x01g
  1;i ~tab:~pr []    \x01h
  1;j ~tab:~pr []    \x01i
  1;k ~tab:~pr []    \x01j
  1< ~tab:~pr []    \x01k

Also, merges can be really slow. Much of this comes from having to look at every rfile and do a special "chop" compaction to pull out what doesn't belong in the merged range. This can be avoided if you do a full major compaction on the ranges you are going to merge. That gets each tablet down to one rfile, and no "chop" compaction is done.

I suggest you start slow, merging one split into another, to get a feel for how it goes and how long it takes.

Good luck.
Mike

On Mon, Jan 16, 2017 at 10:57 PM, Mike Drob <md...@mdrob.com> wrote:

> I can't find this in the docs, but IIRC the merge command can take a
> start/end range for what to merge. So the best option might be to try it on
> a smaller slice and see what happens. At a guess, queries won't block but
> indexing will.
>
> Mike
>
> On Mon, Jan 16, 2017 at 5:23 PM, Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
>> *UNOFFICIAL*
>>
>> That looks like a great option.
>> Before using it, what's the cost/impact of running this on a massive
>> table in a system with other large bulk ingests/queries running? In the
>> past when I used it (which was in 2013, so things may have changed), all
>> ingests were blocked and it took days to complete.
>>
>> With 1.07T tablets to work on, this may take some time?
>>
>> ------------------------------
>> *From:* Mike Drob [mailto:md...@mdrob.com]
>> *Sent:* Tuesday, 17 January 2017 09:37
>> *To:* user@accumulo.apache.org
>> *Subject:* Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
>>
>> http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets
>>
>> In order to merge small tablets, you can ask Accumulo to merge sections
>> of a table smaller than a given size.
>>
>>   root@myinstance> merge -t myTable -s 100M
>>
>> On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <
>> matt.dick...@defence.gov.au> wrote:
>>
>>> *UNOFFICIAL*
>>>
>>> I have a table that has evolved to have 1.07T tablets and I'm fairly
>>> confident a large portion of these are now empty or very small. I'd
>>> like to merge smaller tablets and delete empty tablets; is there a
>>> smart way to do this?
>>>
>>> My thought was to query the metadata table for all tablets under a
>>> certain size for the table and then merge those tablets.
>>>
>>> Is the first number in the value the size of the tablet, i.e.
>>>
>>>   > scan -b 1xk -e 1xk\xff -c file
>>>   1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf
>>>   [] *213134*,234234
>>>
>>> Also, are there any side effects that I need to be aware of when doing
>>> this on a massive table?
>>>
>>> Thanks in advance,
>>> Matt
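
On Matt's last question: yes, the first number in the file column value is the rfile size in bytes (the second is the entry count), so summing it per tablet row gives a size estimate. A minimal sketch of that bookkeeping, assuming lines shaped like the scan output above (the helper name and the sample paths are illustrative, not from the thread):

```python
# Hypothetical helper: estimate per-tablet sizes from 'scan ... -c file'
# output of the metadata table. Assumed line shape (verify on your version):
#   <tablet-row> file:<rfile-path> [] <size-bytes>,<num-entries>
from collections import defaultdict

def tablet_sizes(scan_lines):
    """Sum rfile sizes (first number in the value) per tablet row."""
    sizes = defaultdict(int)
    for line in scan_lines:
        row, rest = line.split(" file:", 1)        # tablet row is before the column
        value = rest.rsplit("[] ", 1)[1]           # value comes after the visibility brackets
        size_bytes = int(value.split(",")[0])      # first number = rfile size in bytes
        sizes[row] += size_bytes                   # a tablet may have several rfiles
    return dict(sizes)

sample = [
    "1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf [] 213134,234234",
    "1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/A000001.rf [] 1000,50",
]
print(tablet_sizes(sample))  # {'1xk;34234': 214134}
```

Tablets whose rows never appear in the file column scan hold no rfiles at all, which is one way to spot the empty ones before picking merge ranges.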