On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs <[email protected]> wrote:
> MIke, > > What version of Accumulo are you using, how many tablets do you have, and > how many threads are you using for minor and major compaction pools? Also, > how big are the keys and values that you are using? > > 1.4.5 6 threads each for min and major compaction Keys and values are not that large, there may be a few outliers but I would estimate that most of them are < 1k > Here are a few settings that may help you: > 1. WAL replication factor (tserver.wal.replication). This defaults to 3 > replicas (the HDFS default), but if you set it to 2 it will give you a > performance boost without a huge hit to reliability. > 2. Ingest buffer size (tserver.memory.maps.max), also known as the > in-memory map size. Increasing this generally improves the efficiency of > minor compactions and reduces the number of major compactions that will be > required down the line. 4-8 GB is not unreasonable. > 3. Make sure your WAL settings are such that the size of a log > (tserver.walog.max.size) multiplied by the number of active logs > (table.compaction.minor.logs.threshold) is greater than the in-memory map > size. You probably want to accomplish this by bumping up the number of > active logs. > 4. Increase the buffer size on the BatchWriter that the clients use. This > can be done with the setBatchWriterOptions method on the > AccumuloOutputFormat. > > Thanks for the tips, I try these out > Cheers, > Adam > > > > On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <[email protected]> wrote: > >> Hello, >> >> We have an ingest process that operates via Map Reduce, processing a >> large set of XML files and inserting mutations based on that data into a >> set of tables. >> >> On a 5 node cluster (each node has 64G ram, 20 cores, and ~600GB SSD) I >> get 400k inserts per second with 20 mapper tasks running concurrently. >> Increasing the number of concurrent mapper tasks to 40 doesn't have any >> effect (besides causing a little more backup in compactions). >> >> I've increased the table.compaction.major.ratio and increased the number >> of concurrent allowed compactions for both min and max compaction but each >> of those only had negligible impact on ingest rates. >> >> Any advice on other settings I can tweak to get things to move more >> quickly? Or is 400k/second a reasonable ingest rate? Are we at a point >> where we should consider generating r files like the bulk ingest example? >> >> Thanks in advance for any advice. >> >> Mike >> > >
