UNOFFICIAL

To summarise the outcome of this:

In the past 24 hours the queued compactions have reduced from 58K to 6K, primarily due to increasing tserver.compaction.major.concurrent.max to 18 (which requires all tservers to be restarted). The default is 3, so initially I had only pushed it up to 10 to leave headroom for other processes, but that did not produce a major increase in performance. Due to disk space constraints I left the compression type (table.file.compress.type) as gz. The other settings altered were tserver.cache.data.size, tserver.memory.maps.max and tserver.walog.max.size. We increased all of these to maximise memory usage, but we aren't confident they had a big impact on compaction progress. A rough sketch of how these settings can be changed is included below.

For completeness, the reason I had to run a table compact was that there were tablets on the table receiving no new data, so the ageoff filter was never applied because no compactions were ever triggered on those tablets. I posted a question to the user group, "Identify tablets with no new data loaded", on 30/4/14, trying to find a clean way to identify these tablets via timestamps in the metadata table. The goal was to compact only the necessary ranges rather than the whole table, which is quite heavy handed (also sketched below). I'm still keen to pursue an idea based on inspecting the files in HDFS to get the time of the last compaction (a rough sketch of that is below too), but that's a topic for another post. Any suggestions are welcome.
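For reference, here is a minimal, untested sketch of how these settings can be changed through the Java client API rather than the shell. The instance name, ZooKeeper hosts, credentials, table name and values are placeholders, not our real ones; as noted above, tserver.compaction.major.concurrent.max only took effect for us after restarting the tservers, and switching the codec to snappy (per Josh's suggestion in the thread below) assumes Snappy is actually available on the tservers.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class TuneCompactionSettings {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details - substitute your instance, ZooKeepers and credentials.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181")
        .getConnector("admin", new PasswordToken("secret"));

    // System-wide tserver settings (stored in ZooKeeper).
    // tserver.compaction.major.concurrent.max only took effect for us after a tserver restart.
    conn.instanceOperations().setProperty("tserver.compaction.major.concurrent.max", "18");
    conn.instanceOperations().setProperty("tserver.walog.max.size", "2G");
    conn.instanceOperations().setProperty("tserver.cache.data.size", "1G");

    // Per-table compression codec; the default is "gz". "snappy" assumes the
    // Snappy codec is installed on the tservers.
    conn.tableOperations().setProperty("mytable", "table.file.compress.type", "snappy");
  }
}

The same properties can also be set from the shell with config -s; the API calls are just easier to script across environments.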
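For the range-only approach, this is the sort of thing I have in mind - again only a sketch, with made-up row keys and table name - using TableOperations.compact over a start/end row so the ageoff filter runs on just the stale tablets rather than queueing a compaction for every tablet:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.io.Text;

public class CompactStaleRanges {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("admin", new PasswordToken("secret"));

    // Compact only the tablets covering [startRow, endRow] so the ageoff filter
    // is applied there, instead of compacting the whole table.
    Text startRow = new Text("20130101");  // hypothetical row keys bounding the stale tablets
    Text endRow = new Text("20130630");
    boolean flush = true;   // flush in-memory data before compacting
    boolean wait = false;   // queue the compactions and return immediately
    conn.tableOperations().compact("mytable", startRow, endRow, flush, wait);
  }
}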
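And for the HDFS idea, something along these lines is what I want to try. It is only a sketch: it assumes the single-volume layout of /accumulo/tables/<tableId>/<tabletDir>/*.rf, a made-up table id (the real one can be looked up with TableOperations.tableIdMap()), and it only looks at file modification times, which is a rough proxy for the last write or compaction on a tablet:

import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TabletFileAges {
  public static void main(String[] args) throws Exception {
    String tableId = "3x";  // hypothetical table id
    FileSystem fs = FileSystem.get(new Configuration());

    // One directory per tablet; the newest RFile modification time gives a rough
    // indication of when the tablet last had data written or compacted.
    for (FileStatus tabletDir : fs.listStatus(new Path("/accumulo/tables/" + tableId))) {
      long newest = 0L;
      for (FileStatus f : fs.listStatus(tabletDir.getPath())) {
        newest = Math.max(newest, f.getModificationTime());
      }
      if (newest > 0) {
        System.out.println(tabletDir.getPath().getName() + " -> newest file: " + new Date(newest));
      }
    }
  }
}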
Thanks again to everyone for all the feedback.

________________________________
From: David Medinets [mailto:[email protected]]
Sent: Wednesday, 7 May 2014 01:52
To: accumulo-user
Subject: Re: How to speed up table compaction [SEC=UNOFFICIAL]

Can that property be changed on the fly? And Snappy needs to .

On Tue, May 6, 2014 at 12:43 AM, Josh Elser <[email protected]> wrote:

Depending on the CPU/IO ratio for your system, switching to a different compression codec might help. Snappy tends to be a bit quicker writing out data than gzip, at the cost of being larger on disk. The increase in final size on disk might be prohibitive depending on your requirements, though.

I forget the table property off hand, but, if you haven't changed it already, it will be the property with a default value of 'gz' :)

On May 5, 2014 6:43 PM, "Dickson, Matt MR" <[email protected]> wrote:

UNOFFICIAL

I'm trying to compact a table and have had a queue of 60,000 compactions with only 800 running at a time. This has now been running for 4 days and has only decreased the queue by 8,000. I've increased tserver.compaction.major.concurrent.max=12 and stopped all ingest, but have not seen a change in progress. Are there other Accumulo settings I can alter to improve this? I also saw tserver.compaction.major.thread.files.open.max=10; should this be increased?

Thanks in advance,
Matt
