Thanks Keith, I'm looking at it now. It looks like what I want. As for 
the proper usage...

Would I create one using the Connector,
then call .getBatchWriter() for each of the tables I'm interested in,
add data to each of the BatchWriters returned,
and then call flush() when I want all of that to get written?

Would the individual batch writers spawned by the MultiTableBatchWriter still 
have synchronized addMutations() methods, so that I would still have to worry 
about blocking, or would that all happen in the flush() method?

From: Keith Turner [mailto:[email protected]]
Sent: Thursday, September 19, 2013 12:39 PM
To: [email protected]
Subject: Re: BatchWriter performance on 1.4

Are you aware of the multi table batch writer?  I am not sure if it would be 
useful, but wanted to make sure you knew about it.   It will use the same 
thread pool to process mutations for multiple tables.  Also it will batch 
mutations for multiple tablets into the same rpc calls.

On Wed, Sep 18, 2013 at 5:07 PM, Slater, David M. 
<[email protected]> wrote:
Hi, I'm running a single-threaded ingestion program that takes data from an 
input source, parses it into mutations, and then writes those mutations 
(sequentially) to four different BatchWriters (all on different tables). Most 
of the time taken (95%) is spent adding mutations, e.g. 
batchWriter.addMutations(mutations). I am wondering how to reduce the time 
taken by these methods.

1) For the method batchWriter.addMutations(Iterable<Mutation>), does it matter 
for performance whether the mutations returned by the iterator are sorted in 
lexicographic order?

2) If the Iterable<Mutation> that I pass to the BatchWriter is very large, will 
I need to wait for a number of batches to be written and flushed before it 
finishes iterating, or does it transfer the elements of the Iterable to a 
different intermediate list?

3) If that is the case, would it then make sense to spawn a short-lived thread 
each time I call addMutations?
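
A minimal sketch of the idea in question 3, assuming addMutations can block 
the calling thread. MutationSink and AsyncSink are hypothetical names that 
stand in for a BatchWriter; they are not Accumulo classes. The sketch also 
swaps the short-lived thread per call for one long-lived worker thread per 
writer, so batches reach each writer in the order they were submitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncAddSketch {

    /** Hypothetical stand-in for a writer's addMutations(Iterable) call. */
    interface MutationSink<M> {
        void addMutations(Iterable<M> mutations) throws Exception;
    }

    /** Hands batches to a single worker thread so the caller can keep parsing. */
    static final class AsyncSink<M> implements AutoCloseable {
        private final MutationSink<M> sink;
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final List<Future<?>> pending = new ArrayList<>();

        AsyncSink(MutationSink<M> sink) {
            this.sink = sink;
        }

        /** Queues a copy of the batch and returns without waiting for the write. */
        void submit(List<M> batch) {
            final List<M> copy = new ArrayList<>(batch);
            pending.add(worker.submit(() -> {
                sink.addMutations(copy); // may block, but only the worker thread
                return null;
            }));
        }

        /** Waits for every queued batch; rethrows the first failure, if any. */
        void await() throws Exception {
            for (Future<?> f : pending) {
                f.get();
            }
            pending.clear();
        }

        @Override
        public void close() throws Exception {
            try {
                await();
            } finally {
                worker.shutdown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> table1 = new ArrayList<>();
        List<String> table2 = new ArrayList<>();
        // Two in-memory sinks play the role of two BatchWriters.
        try (AsyncSink<String> w1 = new AsyncSink<String>(ms -> ms.forEach(table1::add));
             AsyncSink<String> w2 = new AsyncSink<String>(ms -> ms.forEach(table2::add))) {
            for (int i = 0; i < 3; i++) {
                w1.submit(Arrays.asList("row" + i));
                w2.submit(Arrays.asList("idx" + i));
            }
        }
        System.out.println(table1 + " " + table2);
    }
}
```

The worker's queue here is unbounded, so if parsing outpaces writing for long 
the pending batches will accumulate in memory; a bounded queue would be needed 
to push back on the parser.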

At a high level, my code looks like this:

BatchWriter bw1 = connector.createBatchWriter(...)
BatchWriter bw2 = ...
...
while (true) {
    String[] data = input.getData();
    List<Mutation> mutations1 = parseData1(data);
    List<Mutation> mutations2 = parseData2(data);
    ...
    bw1.addMutations(mutations1);
    bw2.addMutations(mutations2);
    ...
}
Thanks,
David
