Matt,

Using the merge command can be a little tricky. Imagine your splits look like this for whatever table 1 is, using

  scan -t accumulo.metadata -b 1; -e 2; -c ~tab
  1;a ~tab:~pr []    \x00
  1;b ~tab:~pr []    \x01a
  1;c ~tab:~pr []    \x01b
  1;d ~tab:~pr []    \x01c
  1;e ~tab:~pr []    \x01d
  1;f ~tab:~pr []    \x01e
  1;g ~tab:~pr []    \x01f
  1;h ~tab:~pr []    \x01g
  1;i ~tab:~pr []    \x01h
  1;j ~tab:~pr []    \x01i
  1;k ~tab:~pr []    \x01j
  1< ~tab:~pr []    \x01k

Merging will take the entire range you specify and put all the splits into the last split. Say you want to merge splits c and d into e. You might try this command, but it would be wrong:

  merge -t test1 -b c -e e

because -b is exclusive; that command would merge d into e but skip c. A correct command would be something like

  merge -t test1 -b b\x00 -e e

where the begin range is a value just after the ~tab:~pr value of the range you want to start with. After this you would end up with the following splits:

  1;a ~tab:~pr []    \x00
  1;b ~tab:~pr []    \x01a
  1;e ~tab:~pr []    \x01b
  1;f ~tab:~pr []    \x01e
  1;g ~tab:~pr []    \x01f
  1;h ~tab:~pr []    \x01g
  1;i ~tab:~pr []    \x01h
  1;j ~tab:~pr []    \x01i
  1;k ~tab:~pr []    \x01j
  1< ~tab:~pr []    \x01k

Also, merges can be really slow. Much of this comes from having to look at every rfile and do a special "chop" compaction to pull out what doesn't belong in the merged range. This can be avoided if you do a full major compaction on the ranges you are going to merge. That gets each tablet down to one rfile, and no "chop" compaction is done.

I suggest you start slow, merging one split into another, to get a feel for how it goes and how long it takes.

Good luck.
Mike

On Mon, Jan 16, 2017 at 10:57 PM, Mike Drob <md...@mdrob.com> wrote:

> I can't find this in the docs, but IIRC the merge command can take a
> start/end range for what to merge. So the best option might be to try it on
> a smaller slice and see what happens. At a guess, queries won't block but
> indexing will.
>
> Mike
>
> On Mon, Jan 16, 2017 at 5:23 PM, Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
>> *UNOFFICIAL*
>>
>> That looks like a great option.
>> Before using it, what's the cost/impact of running this on a massive
>> table in a system with other large bulk ingests/queries running? In the
>> past when I used it (which was in 2013, so things may have changed), all
>> ingests were blocked and it took days to complete.
>>
>> With 1.07T tablets to work on, this may take some time?
>>
>> ------------------------------
>> *From:* Mike Drob [mailto:md...@mdrob.com]
>> *Sent:* Tuesday, 17 January 2017 09:37
>> *To:* user@accumulo.apache.org
>> *Subject:* Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
>>
>> http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets
>>
>> In order to merge small tablets, you can ask Accumulo to merge sections
>> of a table smaller than a given size.
>>
>>   root@myinstance> merge -t myTable -s 100M
>>
>> On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <
>> matt.dick...@defence.gov.au> wrote:
>>
>>> *UNOFFICIAL*
>>>
>>> I have a table that has evolved to have 1.07T tablets and I'm fairly
>>> confident a large portion of these are now empty or very small. I'd
>>> like to merge smaller tablets and delete empty tablets; is there a
>>> smart way to do this?
>>>
>>> My thought was to query the metadata table for all tablets under a
>>> certain size for the table and then merge those tablets.
>>>
>>> Is the first number in the value the size of the tablet, i.e.
>>>
>>>   > scan -b 1xk -e 1xk\xff -c file
>>>   1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf
>>>   [] *213134*,234234
>>>
>>> Also, are there any side effects that I need to be aware of when doing
>>> this on a massive table?
>>>
>>> Thanks in advance,
>>> Matt
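
On Matt's last question: yes, the first number in the file column value is the rfile size in bytes (the second is the entry count), so summing it per tablet row gives a size estimate. A minimal sketch of that bookkeeping, assuming lines shaped like the scan output above (the helper name and the sample paths are illustrative, not from the thread):

```python
# Hypothetical helper: estimate per-tablet sizes from 'scan ... -c file'
# output of the metadata table. Assumed line shape (verify on your version):
#   <tablet-row> file:<rfile-path> [] <size-bytes>,<num-entries>
from collections import defaultdict

def tablet_sizes(scan_lines):
    """Sum rfile sizes (first number in the value) per tablet row."""
    sizes = defaultdict(int)
    for line in scan_lines:
        row, rest = line.split(" file:", 1)        # tablet row is before the column
        value = rest.rsplit("[] ", 1)[1]           # value comes after the visibility brackets
        size_bytes = int(value.split(",")[0])      # first number = rfile size in bytes
        sizes[row] += size_bytes                   # a tablet may have several rfiles
    return dict(sizes)

sample = [
    "1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf [] 213134,234234",
    "1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/A000001.rf [] 1000,50",
]
print(tablet_sizes(sample))  # {'1xk;34234': 214134}
```

Tablets whose rows never appear in the file column scan hold no rfiles at all, which is one way to spot the empty ones before picking merge ranges.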