Re: Tweaking non-bulk Ingest Performance

Eric Newton Wed, 14 Oct 2015 10:48:43 -0700

What version of Accumulo?

Make sure you don't have any hotspots. For example, if you have data
ordered by time, that may cause one tablet to be much busier than the
others.
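
One common way to avoid a time-ordered hotspot is to prefix each row ID
with a small bucket number derived from a hash of some other field. The
sketch below is only an illustration under assumptions: the class name,
the row layout (bucket_timestamp_entityId) and the choice of CRC32 are all
hypothetical, and none of it is part of the Accumulo API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.CRC32;

// Hypothetical helper (not an Accumulo class): spreads time-ordered rows
// across a fixed number of buckets so that consecutive timestamps do not
// all land in the same tablet.
final class SaltedRowId {
    private SaltedRowId() {}

    /**
     * Builds a row ID of the form "BBB_TTTTTTTTTTTTT_entityId", where BBB is
     * a zero-padded bucket in [0, buckets) computed from a hash of entityId.
     * Assumes buckets <= 1000 so the prefix always fits in three digits.
     */
    static String rowId(String entityId, long timestampMillis, int buckets) {
        CRC32 crc = new CRC32();
        crc.update(entityId.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the modulo is safe.
        int bucket = (int) (crc.getValue() % buckets);
        return String.format(Locale.ROOT, "%03d_%013d_%s",
                bucket, timestampMillis, entityId);
    }
}
```

The trade-off is on the read side: a query over a time range then has to
scan every bucket rather than one contiguous range.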

Pre-split your table(s) so that you have 20-80 tablets per tserver that you
will be ingesting into.
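
As a rough sketch of how the split points might be computed, the code
below assumes (hypothetically) that row IDs begin with a zero-padded
three-digit bucket prefix in the range 000-999, and produces evenly spaced
boundaries for a target tablet count. The class and method names are
made up for illustration. The resulting strings would still need to be
converted to Text and passed to TableOperations.addSplits in the client
API.

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper (not an Accumulo class): computes evenly spaced
// split points over a numeric, zero-padded row prefix.
final class SplitPoints {
    private SplitPoints() {}

    /** Target tablet count, e.g. 33 tservers * 20 tablets each = 660. */
    static int targetTablets(int tservers, int tabletsPerServer) {
        return tservers * tabletsPerServer;
    }

    /**
     * Returns up to (tablets - 1) split points over prefixes [0, buckets).
     * If tablets exceeds buckets, duplicate boundaries collapse in the set.
     */
    static SortedSet<String> forBuckets(int buckets, int tablets) {
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            int boundary = (int) ((long) i * buckets / tablets);
            if (boundary > 0) {
                splits.add(String.format(Locale.ROOT, "%03d", boundary));
            }
        }
        return splits;
    }
}
```

For example, SplitPoints.forBuckets(1000, SplitPoints.targetTablets(33, 20))
yields 659 split points, which gives 660 tablets, the low end of the
suggested range for a 33-node cluster.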

Use multiple writers per node (or increase the number of batchwriter
threads).

If you are using 1.7, relax the durability setting (table.durability) on
your table, for example from sync to flush. Whether that is acceptable
depends on your needs, of course.
Likewise, you can lower the WAL replication (tserver.wal.replication) to 2,
if you are comfortable with that.

If you have multiple updates for the same row, make sure the column updates
are in the same mutation.
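
A minimal sketch of that grouping step, using only the JDK: the Update
type and the RowGrouping class are hypothetical stand-ins for whatever the
ingest client already holds. Each map entry would then become a single
Mutation for that row, with one put per column update, before it is handed
to the BatchWriter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an Accumulo class): batches column updates by
// row so that each row is written as one mutation instead of several.
final class RowGrouping {
    private RowGrouping() {}

    /** One pending column update; a stand-in type for illustration. */
    static final class Update {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Update(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Groups updates by row ID, preserving first-seen row order. */
    static Map<String, List<Update>> byRow(List<Update> updates) {
        Map<String, List<Update>> grouped = new LinkedHashMap<>();
        for (Update u : updates) {
            grouped.computeIfAbsent(u.row, r -> new ArrayList<>()).add(u);
        }
        return grouped;
    }
}
```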

You should see about 100K ingest (for small updates) per node, per second,
sustained.

-Eric

On Wed, Oct 14, 2015 at 12:56 PM, Andrew Hulbert <[email protected]> wrote:

> Hi all,
>
> I've been attempting to improve a streaming ingest client into Accumulo
> and have been playing with a few of the following settings:
>
> tserver.memory.maps.max (and in tandem
> table.compaction.minor.logs.threshold and tserver.wal.blocksize)
> tserver.mutation.queue.max
>
> In one set of tests I stood up ~200 batch writers and wrote approx 250M
> tweets into a couple of different index schemas. What I've noticed is that
> increasing the tserver.memory.maps.max from 1G to 2G or 4G actually slows
> down my ingest rate. Cutting it to 512M forced lots of compactions and high
> server load but a faster ingest.
>
> I attached a screen shot of the two ingests (the
> tserver.mutation.queue.max=4G in green) (33 nodes, -Xmx26G, 8 CPU, 4 SSDs)
>
> My question is whether anyone has done any performance tweaking for
> non-bulk ingest on a cluster and understands why that'd be the case? I've
> read through all the docs/etc but haven't found a consistent methodology
> for tweaking params...so I was wondering if anyone else had attempted to
> tune a cluster like this.
>
> Thanks for any ideas!
>
> Andrew
>
