Keith, I tried tserver.mutation.queue.max=4M and it improved, but by nowhere near a significant amount. In my app, records get turned into multiple Accumulo rows.
So, in terms of my record write rate:

wal=true & mutation.queue.max=256K | ~8K records/s
wal=true & mutation.queue.max=4M | ~14K records/s
wal=false | ~25K records/s

Adam, it's one box so replication is off; good thought, tnx.

BTW, I've been playing around with ZFS compression vs Accumulo Snappy, and what I've found was quite interesting. The idea was that with ZFS dedup on and ZFS in charge of compression I'd get a boost later on when blocks merge. What I've found is that after a while with ZFS LZ4 the CPU and disk activity both tail off, as though timeouts are elapsing somewhere, whereas SNAPPY maintains an average of ~20k+.

Anyway, tnx, and if I get a chance I may try the 1.7 branch for the fix.

On Wednesday, 4 December 2013, 14:56, Adam Fuchs <[email protected]> wrote:

One thing you can do is reduce the replication factor for the WAL. We have found that makes a pretty significant difference in write performance. That can be modified with the tserver.wal.replication property. Setting it to 2 instead of the default (probably 3) should give you some performance improvement, of course at some cost to durability.

Adam

On Wed, Dec 4, 2013 at 5:14 AM, Peter Tillotson <[email protected]> wrote:

>I've been trying to get the most out of streaming data into Accumulo 1.5
>(Hadoop Cloudera CDH4). Having tried a number of settings, rewriting client
>code etc., I finally switched off the Write Ahead Log
>(table.walog.enabled=false) and saw a huge leap in ingest performance.
>
>Ingest with table.walog.enabled=true: ~6 MB/s
>Ingest with table.walog.enabled=false: ~28 MB/s
>
>That is a speed improvement of a factor of about 4.67.
>
>Now my use case could probably live without, or work around not having, a WAL,
>but I wondered if this was a known issue (I didn't see anything in JIRA)?
>The WAL seems to be a significant rate limiter; this is either endemic to
>Accumulo or an HDFS / setup issue. Though given everything is in HDFS these
>days and otherwise IO flies, it looks like the Accumulo WAL is the most likely
>culprit.
>
>I don't believe this to be an IO issue on the box: with the WAL off there is
>significantly more IO (up to 80M/s reported by dstat) than with the WAL on
>(up to 12M/s reported by dstat). Testing the box with FIO, sequential write
>is 160M/s.
>
>Further info:
>Hadoop 2.00 (Cloudera cdh4)
>Accumulo (1.5.0)
>Zookeeper ( with Netty, minor improvement of <1MB/s )
>Filesystem ( HDFS is ZFS, compression=on, dedup=on, otherwise ext4 )
>
>With large imports from scratch I now start off CPU bound, and as more
>shuffling is needed this becomes disk bound later in the import, as expected.
>So I know pre-splitting would probably sort it.
>
>Tnx
>
>P
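
As a quick sanity check on the figures quoted in this thread, the ratios can be worked out with a small Python sketch. The numbers are the ones reported in the messages above; the variable and function names are illustrative only and do not come from Accumulo or from the original posts.

```python
# Rates as reported in the thread (names are hypothetical, for illustration).
ingest_mb_s = {"wal_on": 6.0, "wal_off": 28.0}   # original post, MB/s
records_k_s = {                                    # follow-up, K records/s
    "wal_on_queue_256K": 8.0,
    "wal_on_queue_4M": 14.0,
    "wal_off": 25.0,
}


def speedup(new, old):
    """Return how many times faster `new` is than `old`."""
    return new / old


# WAL off versus WAL on, from the original post.
raw_speedup = speedup(ingest_mb_s["wal_off"], ingest_mb_s["wal_on"])

# Effect of raising tserver.mutation.queue.max from 256K to 4M with the WAL on.
queue_speedup = speedup(
    records_k_s["wal_on_queue_4M"], records_k_s["wal_on_queue_256K"]
)

# Gap that remains between WAL off and WAL on with the larger queue.
remaining_gap = speedup(
    records_k_s["wal_off"], records_k_s["wal_on_queue_4M"]
)

print(f"WAL off vs WAL on (MB/s):        x{raw_speedup:.2f}")
print(f"queue 4M vs 256K (records/s):    x{queue_speedup:.2f}")
print(f"WAL off vs WAL on with queue 4M: x{remaining_gap:.2f}")
```

On these figures the larger mutation queue gives roughly a 1.75x gain, but the WAL-on rate still sits about 1.8x below the WAL-off rate, which is consistent with the remark that the change helped without closing the gap.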
