It's possible it could run faster. There are two things that could enable this.
First, if you are using the native map, then it's structured as map<row, map<col, val>>. For a mutation w/ multiple columns, it will look up the row once to get map<col, val>. After that it will do inserts into the column map directly. I am not sure this will help much in your case, since you probably have a shallow row tree and deep column trees. Second, the row is only sent once over the wire and only written once to the walog. You may see some benefit here.

There is a simple test that ships w/ Accumulo in test/system/test_ingest.sh. This test writes 5 million mutations w/ one column. I ran the test varying the number of rows and columns, keeping rows*columns == 5M. I used 1.6.0 w/ tserver.mutation.queue.max=4M. I ran these tests on my workstation, so the walog was not written across a network.

  5,000,000 mutations, 1 col per mutation:      ~26 secs
  500,000 mutations,   10 cols per mutation:    ~16 secs
  500 mutations,       10,000 cols per mutation: ~13 secs

It might be worth experimenting with.

Keith

On Thu, May 15, 2014 at 10:53 AM, Slater, David M. <[email protected]> wrote:
> Hi, quick question,
>
> I’m attempting to optimize the ingest rates for a document-partitioned
> table. I am currently presplitting the tables and have an even spread of
> data across tablet servers. However, I was wondering if changing the size
> of mutations would have a major impact on the ingest rates. Currently, I’m
> batchwriting with one mutation per document (fairly small documents, e.g.
> tweets), but since everything is organized by bins, I could create much
> larger mutations. Would there be a benefit on the ingest side to doing so,
> such as reducing tablet contention? Or will that push the complexity to
> the ingestors?
>
> Best,
>
> David
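In case a concrete example helps, below is a minimal sketch of packing many columns into a single Mutation with the BatchWriter API. The table name "docs", the bin row id, and the column family/qualifier/value contents are made up for illustration; only the API calls themselves are standard Accumulo client code.

```java
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class BinnedIngest {
    // Write many columns under a single row (one mutation per bin), so the
    // tserver looks the row up once in its in-memory map and the row is
    // sent once over the wire and written once to the walog.
    public static void ingest(Connector conn) throws Exception {
        BatchWriter bw = conn.createBatchWriter("docs", new BatchWriterConfig());
        try {
            Mutation m = new Mutation("bin_0042"); // hypothetical bin row id
            for (int i = 0; i < 10000; i++) {
                // hypothetical column family, qualifier, and value
                m.put("doc", "term_" + i, new Value(("v" + i).getBytes()));
            }
            bw.addMutation(m); // one mutation carries all 10,000 columns
        } finally {
            bw.close(); // flushes any buffered mutations
        }
    }
}
```

The one-mutation-per-document version would instead create a new Mutation inside the loop, which is what the numbers above suggest costs extra row lookups and walog writes.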
