Paul,

Here are a few suggestions:

1. Reduce the number of concurrent compaction threads
(tserver.compaction.major.concurrent.max and
tserver.compaction.minor.concurrent.max). You probably want to lean towards
twice as many major compaction threads as minor, but that somewhat depends
on how bursty your ingest rate is. The total number of threads should leave
plenty of cores for query processing.

2. Look into using a different compression codec. Snappy or LZ4 can support
a much higher throughput than the default of gzip, although the compression
ratio will not be as good.

3. Consider a key choice that limits the number of actively ingesting
tablets. Writing across all ~100k tablets means they will all be actively
compacting, but if you can arrange your keys such that only ~1k tablets are
being actively written to, then you can significantly cut your expected
write amplification (i.e. the number of major compactions needed). This is
because minor compactions will be larger and you'll spend proportionally
more time writing into smaller tablets.

Cheers,
Adam

On Thu, Sep 11, 2014 at 12:06 PM, pdread <[email protected]> wrote:

>
> We have 100+ tablet servers, approx 860 tablets/server, ingest approx 300K+
> docs/day, the problem recently started that queries during a minor or major
> compaction are taking about 100+ seconds as opposed to about 2 seconds when
> no compaction. Everyone on the cluster is affected, mapreduce jobs and batch
> scanners.
>
> One table has as many as 65K tablets.
>
> In the hopes of reducing the compactions yesterday we changed on 2 tables
> that appeared to cause most of the compactions:
>
> compaction.ratio from 3 to 5
> table.file.max from 15 to 45
> split.threshold from 725M to 2G.
>
> tservers are set to 3G, top shows 6G res and 7G virt for the one I checked.
>
> The odd thing is we expected the number of tablets to change and they did
> not. The only thing that happened was the number of compactions went up but
> the duration of the compactions went down by about half. Queries in off
> times did not seem to change.
>
> One more thing, we only store docs < 64M in accumulo, otherwise they are
> written directly to hdfs.
>
> The question would be, is there a way to reduce the compaction frequency
> and/or duration?
>
> Thanks in advance.
>
> Paul
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Compaction-slowing-queries-tp11278.html
> Sent from the Users mailing list archive at Nabble.com.
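
To make suggestion 3 above more concrete, here is a minimal sketch of one possible row-key layout that concentrates a day's writes into a bounded set of key ranges. It is only an illustration, not something taken from the thread: the YYYYMMDD_SSS_docid layout, the NUM_SHARDS value, and the helper names are all hypothetical, and the count of distinct day+shard prefixes is just a rough proxy for how many tablets would be receiving writes.

```python
import hashlib
from datetime import date

# Hypothetical value: pick it so that the number of shards per day roughly
# matches the number of tablets you want to be actively ingesting.
NUM_SHARDS = 16


def shard_of(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Return a stable shard number derived from the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def row_key(doc_id: str, ingest_day: date, num_shards: int = NUM_SHARDS) -> str:
    """Build a row of the form YYYYMMDD_SSS_docid (sorts by day, then shard)."""
    return f"{ingest_day:%Y%m%d}_{shard_of(doc_id, num_shards):03d}_{doc_id}"


def active_prefixes(keys):
    """Distinct day+shard prefixes: a rough proxy for tablets receiving writes."""
    return {k[:12] for k in keys}


if __name__ == "__main__":
    day = date(2014, 9, 11)
    keys = [row_key(f"doc{i}", day) for i in range(10000)]
    # All of the day's writes fall into at most NUM_SHARDS key ranges.
    print(len(active_prefixes(keys)))
```

The trade-off with a layout like this is that a lookup by document id alone has to know (or scan across) the ingest day, so whether it fits depends on your query patterns.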
