Currently using the default AccumuloOutputFormat settings which I believe is 2 threads
On Tue, Apr 8, 2014 at 5:36 PM, <[email protected]> wrote: > > > How many threads are you using in the AccumuloOutputFormat? What is your > latency set to? > > > > *From:* Adam Fuchs [mailto:[email protected]] > *Sent:* Tuesday, April 08, 2014 5:36 PM > *To:* [email protected] > *Subject:* Re: Advice on increasing ingest rate > > > > MIke, > > > > What version of Accumulo are you using, how many tablets do you have, and > how many threads are you using for minor and major compaction pools? Also, > how big are the keys and values that you are using? > > > > Here are a few settings that may help you: > > 1. WAL replication factor (tserver.wal.replication). This defaults to 3 > replicas (the HDFS default), but if you set it to 2 it will give you a > performance boost without a huge hit to reliability. > > 2. Ingest buffer size (tserver.memory.maps.max), also known as the > in-memory map size. Increasing this generally improves the efficiency of > minor compactions and reduces the number of major compactions that will be > required down the line. 4-8 GB is not unreasonable. > > 3. Make sure your WAL settings are such that the size of a log > (tserver.walog.max.size) multiplied by the number of active logs > (table.compaction.minor.logs.threshold) is greater than the in-memory map > size. You probably want to accomplish this by bumping up the number of > active logs. > > 4. Increase the buffer size on the BatchWriter that the clients use. This > can be done with the setBatchWriterOptions method on the > AccumuloOutputFormat. > > > > Cheers, > > Adam > > > > > > On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <[email protected]> wrote: > > Hello, > > > > We have an ingest process that operates via Map Reduce, processing a large > set of XML files and inserting mutations based on that data into a set of > tables. > > > > On a 5 node cluster (each node has 64G ram, 20 cores, and ~600GB SSD) I > get 400k inserts per second with 20 mapper tasks running concurrently. > Increasing the number of concurrent mapper tasks to 40 doesn't have any > effect (besides causing a little more backup in compactions). > > > > I've increased the table.compaction.major.ratio and increased the number > of concurrent allowed compactions for both min and max compaction but each > of those only had negligible impact on ingest rates. > > > > Any advice on other settings I can tweak to get things to move more > quickly? Or is 400k/second a reasonable ingest rate? Are we at a point > where we should consider generating r files like the bulk ingest example? > > > > Thanks in advance for any advice. > > > > Mike > > >
