The addMutations method blocks when the client-side buffer fills up, so you may see a lot of time spent in that method because of a bottleneck downstream. There are a number of things you could try to speed that up. Here are a few:

1. Increase the BatchWriter's buffer size. This can smooth out the network utilization and increase efficiency.
2. Increase the number of threads that the BatchWriter uses to process mutations. This is particularly useful if you have more tablet servers than ingest clients.
3. Use a more efficient encoding. The more data you put through the BatchWriter, the longer it will take, even if that data compresses well at rest.
4. If you are seeing hold time show up on your tablet servers (displayed through the monitor page), you can increase tserver.memory.maps.max to make minor compactions more efficient.
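To make point 3 concrete, here is a minimal sketch in plain Java (no Accumulo classes involved) that compares how many bytes the same number takes as decimal text versus as a fixed-width binary value. The class name EncodingSize, the two helper methods, and the sample timestamp are made up for illustration; your own keys and values will differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodingSize {
    // Encode a long as its decimal text form, e.g. "1379556480000".
    static byte[] encodeAsText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Encode a long as 8 fixed-width bytes (ByteBuffer defaults to big-endian).
    static byte[] encodeAsBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long timestampMillis = 1379556480000L; // hypothetical sample value

        byte[] text = encodeAsText(timestampMillis);
        byte[] binary = encodeAsBinary(timestampMillis);

        System.out.println("text bytes:   " + text.length);
        System.out.println("binary bytes: " + binary.length);

        // The binary form decodes back to the original value.
        System.out.println("round trip ok: "
                + (ByteBuffer.wrap(binary).getLong() == timestampMillis));
    }
}
```

For this sample value the text form is 13 bytes and the binary form is 8, and that difference applies to every mutation that carries such a field. Whether a binary encoding is appropriate depends on how the data is read back and on any sort order you rely on, so treat this as a starting point rather than a recommendation for your schema.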
Cheers,
Adam

On Sep 18, 2013 10:08 PM, "Slater, David M." <[email protected]> wrote:

> Hi,
>
> I’m running a single-threaded ingestion program that takes data from
> an input source, parses it into mutations, and then writes those mutations
> (sequentially) to four different BatchWriters (all on different tables).
> Most of the time (95%) taken is on adding mutations, e.g.
> batchWriter.addMutations(mutations); I am wondering how to reduce the time
> taken by these methods.
>
> 1) For the method batchWriter.addMutations(Iterable<Mutation>), does it
> matter for performance whether the mutations returned by the iterator are
> sorted in lexicographic order?
>
> 2) If the Iterable<Mutation> that I pass to the BatchWriter is very large,
> will I need to wait for a number of Batches to be written and flushed
> before it will finish iterating, or does it transfer the elements of the
> Iterable to a different intermediate list?
>
> 3) If that is the case, would it then make sense to spawn off short
> threads for each time I make use of addMutations?
>
> At a high level, my code looks like this:
>
> BatchWriter bw1 = connector.createBatchWriter(…)
> BatchWriter bw2 = …
> …
> while(true) {
>     String[] data = input.getData();
>     List<Mutation> mutations1 = parseData1(data);
>     List<Mutation> mutations2 = parseData2(data);
>     …
>     bw1.addMutations(mutations1);
>     bw2.addMutations(mutations2);
>     …
> }
>
> Thanks,
> David
