Also, the native map is a Map<row, Map<column, val>>. When applying the updates for a mutation, it looks up the Map<column, val> for the row once and reuses it for every update in that mutation. This can be much faster than a Map<Key, Value>, because with Map<Key, Value> each insert may have to traverse a deeper tree than an insert into Map<column, val>.
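
To make the nested-map point concrete, here is a rough plain-Java analogy. It is not Accumulo's code: the class name NestedMapSketch, the String keys and the use of TreeMap are all assumptions made for illustration. It only shows the shape of the idea, one outer lookup per mutation followed by inserts into a small per-row tree.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative analogy only; names and types here are hypothetical.
class NestedMapSketch {
    // row -> (column -> value), with both levels kept sorted
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Apply every column update of one mutation to a single row.
    void applyMutation(String row, Map<String, String> columnUpdates) {
        // One lookup in the outer map for the whole mutation...
        TreeMap<String, String> columns = rows.computeIfAbsent(row, r -> new TreeMap<>());
        // ...then each insert only walks the per-row tree, which is
        // usually much shallower than one tree over all keys.
        for (Map.Entry<String, String> update : columnUpdates.entrySet()) {
            columns.put(update.getKey(), update.getValue());
        }
    }

    // Returns the stored value, or null if the row or column is absent.
    String get(String row, String column) {
        TreeMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        NestedMapSketch map = new NestedMapSketch();
        Map<String, String> updates = new LinkedHashMap<>();
        updates.put("cf:name", "alice");
        updates.put("cf:city", "seattle");
        map.applyMutation("row1", updates);
        System.out.println(map.get("row1", "cf:city"));
    }
}
```

With a flat Map<Key, Value>, each of those two inserts would instead search one tree holding every key in the map.
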
On Wed, Nov 30, 2016 at 11:47 PM, Dylan Hutchison <dhutc...@cs.washington.edu> wrote:
> Hi folks,
>
> I'd like to share a tip that ~doubled BatchWriter ingest performance in my
> application.
>
> When inserting multiple entries to the same Accumulo row, put them into the
> same Mutation object. Add that one large Mutation to a BatchWriter rather
> than an individual Mutation for each entry. The result reduces the amount of
> data transferred.
>
> The tip seems obvious enough, but hey, I used Accumulo for a couple years
> without realizing it, so I thought y'all might benefit too.
>
> Enjoy!
> Dylan
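
For anyone who wants to try the tip from the quoted message, the grouping step might look roughly like the sketch below. It uses only java.util so it runs without Accumulo on the classpath; the Entry class and the groupByRow helper are hypothetical names invented for this example. The comments mark where the Accumulo calls (new Mutation(row), Mutation.put and BatchWriter.addMutation) would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the grouping step only; Entry and groupByRow are hypothetical.
class GroupByRowSketch {
    static final class Entry {
        final String row, family, qualifier, value;

        Entry(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Collect the entries for each row so that one Mutation per row can be built.
    static Map<String, List<Entry>> groupByRow(List<Entry> entries) {
        Map<String, List<Entry>> byRow = new LinkedHashMap<>();
        for (Entry e : entries) {
            byRow.computeIfAbsent(e.row, r -> new ArrayList<>()).add(e);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("row1", "cf", "a", "1"));
        entries.add(new Entry("row1", "cf", "b", "2"));
        entries.add(new Entry("row2", "cf", "a", "3"));

        Map<String, List<Entry>> byRow = groupByRow(entries);
        for (Map.Entry<String, List<Entry>> group : byRow.entrySet()) {
            // With Accumulo available, this is where one Mutation per row is built:
            //   Mutation m = new Mutation(group.getKey());
            //   for each entry e: m.put(e.family, e.qualifier, e.value);
            //   writer.addMutation(m);
            System.out.println(group.getKey() + ": " + group.getValue().size() + " entries in one mutation");
        }
    }
}
```

Whether the grouping pays off depends on how many entries share a row; if every entry has a distinct row, the result is the same number of mutations as before.
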