What version of Accumulo are you running? Make sure you don't have any hotspots. For example, if your data is ordered by time, one tablet may end up much busier than the others.
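One common way to remove a time-ordered hotspot is to salt the row. You prefix each row with a shard number derived from a stable hash, so rows with nearby timestamps generally land in different tablets. The Java sketch below is a minimal illustration under assumptions. The class and method names, the zero-padded "NNNN_" prefix layout, and hashing the natural row with String.hashCode are hypothetical choices, not part of Accumulo's API. The sketch also computes split points on the shard boundaries. It sizes them at 40 tablets per tserver for the 33-node cluster in the original post, which falls inside the 20-80 range suggested here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch: spread time-ordered rows across tablets by prefixing each
 * row with a hash-derived shard number, and compute split points on the
 * shard boundaries. Names and the "NNNN_" prefix layout are hypothetical.
 */
public final class ShardedRows {

    private ShardedRows() {}

    /** Digits needed to print shard numbers 0..numShards-1 with zero padding. */
    public static int width(int numShards) {
        return String.valueOf(numShards - 1).length();
    }

    /**
     * Prefix a natural (e.g. timestamp-first) row with a shard number.
     * String.hashCode is deterministic, so the same row always maps to the
     * same shard, while rows with nearby timestamps generally do not share one.
     */
    public static String shardedRow(String naturalRow, int numShards) {
        int shard = Math.floorMod(naturalRow.hashCode(), numShards);
        return String.format("%0" + width(numShards) + "d_%s", shard, naturalRow);
    }

    /**
     * One split point per shard boundary (shards 1..numShards-1, zero-padded),
     * so every "NNNN_..." shard sorts into its own tablet.
     */
    public static List<String> splitPoints(int numShards) {
        List<String> splits = new ArrayList<>();
        String fmt = "%0" + width(numShards) + "d";
        for (int shard = 1; shard < numShards; shard++) {
            splits.add(String.format(fmt, shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        int tservers = 33;         // cluster size from the original post
        int tabletsPerServer = 40; // assumption: inside the suggested 20-80 range
        int numShards = tservers * tabletsPerServer;

        List<String> splits = splitPoints(numShards);
        System.out.println(numShards + " shards, " + splits.size() + " split points");
        System.out.println(shardedRow("20151014125600_tweet42", numShards));
    }
}
```

Running main prints "1320 shards, 1319 split points" on its first line. In a real client, the split points would be wrapped in Text objects, collected into a SortedSet, and passed to TableOperations.addSplits before ingest starts. The trade-off is that a scan over a time range must now read every shard, which is the usual cost of salting.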
Pre-split your table(s) so that you have 20-80 tablets per tserver that you will be ingesting into. Use multiple writers per node, or increase the number of BatchWriter threads.

If you are using Accumulo 1.7, lower the durability setting (table.durability) on your table; how far you can go depends on your needs, of course. Likewise, you can reduce WAL replication to 2, if you are comfortable with that.

If you have multiple updates for the same row, make sure the column updates are in the same mutation.

For small updates, you should see about 100K updates ingested per node, per second, sustained.

-Eric

On Wed, Oct 14, 2015 at 12:56 PM, Andrew Hulbert <[email protected]> wrote:
> Hi all,
>
> I've been attempting to improve a streaming ingest client into Accumulo
> and have been playing with a few of the following settings:
>
> tserver.memory.maps.max (and, in tandem,
> table.compaction.minor.logs.threshold and tserver.wal.blocksize)
> tserver.mutation.queue.max
>
> In one set of tests I stood up ~200 batch writers and wrote approximately
> 250M tweets into a couple of different index schemas. What I've noticed is
> that increasing tserver.memory.maps.max from 1G to 2G or 4G actually slows
> down my ingest rate. Cutting it to 512M forced lots of compactions and high
> server load, but gave faster ingest.
>
> I attached a screenshot of the two ingests (the
> tserver.mutation.queue.max=4G in green) (33 nodes, -Xmx26G, 8 CPU, 4 SSDs).
>
> My question is whether anyone has done any performance tweaking for
> non-bulk ingest on a cluster and understands why that'd be the case. I've
> read through all the docs/etc. but haven't found a consistent methodology
> for tweaking params, so I was wondering if anyone else had attempted to
> tune a cluster like this.
>
> Thanks for any ideas!
>
> Andrew
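The advice to put all column updates for one row into the same mutation amounts to grouping updates by row on the client before writing. The Java sketch below shows only that grouping step, under assumptions. ColumnUpdate is a hypothetical stand-in, not an Accumulo class. The Accumulo calls that would consume each group (one Mutation per row, one put per column, then BatchWriter.addMutation) appear only in a comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch: collapse per-column updates into one group per row, so each
 * row can be written as a single mutation. ColumnUpdate is a hypothetical
 * stand-in, not an Accumulo class.
 */
public final class RowGrouping {

    private RowGrouping() {}

    /** Hypothetical single-column update. */
    public static final class ColumnUpdate {
        public final String row;
        public final String family;
        public final String qualifier;
        public final String value;

        public ColumnUpdate(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Group updates by row, keeping first-seen row order and per-row update order. */
    public static Map<String, List<ColumnUpdate>> groupByRow(List<ColumnUpdate> updates) {
        Map<String, List<ColumnUpdate>> byRow = new LinkedHashMap<>();
        for (ColumnUpdate u : updates) {
            byRow.computeIfAbsent(u.row, r -> new ArrayList<>()).add(u);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<ColumnUpdate> updates = new ArrayList<>();
        updates.add(new ColumnUpdate("tweet42", "meta", "user", "alice"));
        updates.add(new ColumnUpdate("tweet43", "meta", "user", "bob"));
        updates.add(new ColumnUpdate("tweet42", "meta", "lang", "en"));
        updates.add(new ColumnUpdate("tweet42", "text", "", "hello"));

        // With the Accumulo client, each group would become one Mutation with
        // one put(...) per ColumnUpdate, handed to the BatchWriter once.
        for (Map.Entry<String, List<ColumnUpdate>> e : groupByRow(updates).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " column update(s)");
        }
    }
}
```

Running main prints "tweet42: 3 column update(s)" followed by "tweet43: 1 column update(s)". In this example, four separate writes become two. Accumulo applies the updates inside a single Mutation to its row together, so grouping also keeps a row's columns consistent with each other.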
