If you don't want it to wait a long time before writing, then set the maxLatency lower. That is the entire reason for that setting.
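In the 1.4 client API, maxLatency is the millisecond argument to Connector.createBatchWriter(tableName, maxMemory, maxLatency, maxWriteThreads). Roughly, it is the longest the writer holds buffered mutations before sending them, even when its buffer is not full.

The stdlib-only sketch below is a toy model of that rule, not Accumulo's code. LatencyBoundedBuffer, tick(), the injected clock, and the use of string length as "size" are all hypothetical. It shows that a small maxLatency bounds how long data sits unsent, with no flush() after every group of mutations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Accumulo's implementation: buffered entries are sent when the
// buffer reaches maxMemory or when the oldest entry has waited maxLatencyMs.
// Time is passed in explicitly so the behavior is deterministic.
class LatencyBoundedBuffer {
    private final long maxMemory;    // approximate bytes to buffer before sending
    private final long maxLatencyMs; // longest an entry may sit unsent
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long oldestAddMs = -1;   // -1 while the buffer is empty
    final List<List<String>> sentBatches = new ArrayList<>();

    LatencyBoundedBuffer(long maxMemory, long maxLatencyMs) {
        this.maxMemory = maxMemory;
        this.maxLatencyMs = maxLatencyMs;
    }

    // Adds one entry at time nowMs; size is approximated by string length.
    void add(String entry, long nowMs) {
        tick(nowMs);
        if (buffer.isEmpty()) {
            oldestAddMs = nowMs;
        }
        buffer.add(entry);
        bufferedBytes += entry.length();
        if (bufferedBytes >= maxMemory) {
            send();
        }
    }

    // Stands in for a background timer: sends once the oldest entry is too old.
    void tick(long nowMs) {
        if (!buffer.isEmpty() && nowMs - oldestAddMs >= maxLatencyMs) {
            send();
        }
    }

    // Stands in for the network send; here it just records the batch.
    private void send() {
        sentBatches.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
        oldestAddMs = -1;
    }
}
```

Calling flush() after each group gives a similar bound, but it forces a send per group. As Keith notes further down the thread, that adds RPC overhead and lowers throughput.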
On Fri, Sep 20, 2013 at 12:47 PM, Slater, David M. <[email protected]> wrote:

> I was using flush() after sending a bunch of mutations to the BatchWriters
> to limit their latency. I thought it would normally flush the buffer to
> ensure that the maxLatency is not violated. If the maxLatency is quite
> large, how do I ensure that it doesn’t wait a long time before writing?
>
> If the returned batch writers are all thread safe, then I’m still going to
> have the bottleneck of their synchronized addMutations method, correct?
>
> I’m looking for “org.apache.accumulo.client.impl” in log4j.properties,
> generic_logger.xml, and the other config files, but can’t locate it. Do I
> need to create a new entry for it there?
>
> Thanks,
> David
>
> From: Keith Turner [mailto:[email protected]]
> Sent: Thursday, September 19, 2013 7:01 PM
> To: [email protected]
> Subject: Re: BatchWriter performance on 1.4
>
> On Thu, Sep 19, 2013 at 5:08 PM, Slater, David M. <[email protected]>
> wrote:
>
> > Thanks Keith, I’m looking at it now. It looks like what I would want. As
> > for the proper usage:
> >
> > Would I create one using the Connector,
> > then call getBatchWriter() for each of the tables I’m interested in,
> > add data to each of the BatchWriters returned,
>
> Yes.
>
> > and then call flush() when I want all of that to get written?
>
> Why are you calling flush()? Doing this frequently will increase RPC
> overhead and lower throughput.
>
> > Would the individual batch writers spawned by the MultiTableBatchWriter
> > still have synchronized addMutations() methods, so I would still have to
> > worry about blocking, or would that all happen at the flush() method?
>
> The returned batch writers are thread safe. They all add to the same
> queue/buffer in a synchronized manner. Calling flush() on any of the batch
> writers returned from getBatchWriter() will block the others.
>
> If you set the log4j log level to TRACE for org.apache.accumulo.client.impl,
> you can see output like the following. Binning is the process of taking
> each mutation and deciding which tablet and tablet server it goes to.
>
>     2013-09-19 18:43:37,261 [impl.ThriftTransportPool] TRACE: Using existing connection to 127.0.0.1:9997
>     2013-09-19 18:43:37,393 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13 Binning 80909 mutations for table 3
>     2013-09-19 18:43:37,402 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13 Binned 80909 mutations for table 3 to 1 tservers in 0.009 secs
>     2013-09-19 18:43:37,402 [impl.TabletServerBatchWriter] TRACE: Started sending 80,909 mutations to 1 tablet servers
>     2013-09-19 18:43:37,656 [impl.ThriftTransportPool] TRACE: Returned connection 127.0.0.1:9997 (120000) ioCount : 1459116
>     2013-09-19 18:43:37,657 [impl.TabletServerBatchWriter] TRACE: sent 80,909 mutations to 127.0.0.1:9997 in 0.40 secs (204,832.91 mutations/sec) with 0 failures
>
> When you close the batch writer, it will log some summary stats like the
> following.
>
>     2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE:
>     2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: TABLET SERVER BATCH WRITER STATISTICS
>     2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Added : 1,000,000 mutations
>     2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Sent : 1,000,000 mutations
>     2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Resent percentage : 0.00%
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Overall time : 5.94 secs
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Overall send rate : 168,406.87 mutations/sec
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Send efficiency : 86.91%
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE:
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: BACKGROUND WRITER PROCESS STATISTICS
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Total send time : 5.16 secs 86.91%
>     2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Average send rate : 193,760.90 mutations/sec
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: Total bin time : 0.46 secs 7.81%
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: Average bin rate : 2,155,172.41 mutations/sec
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: tservers per batch : 1.00 avg 1 min 1 max
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: tablets per batch : 1.00 avg 1 min 1 max
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE:
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: SYSTEM STATISTICS
>     2013-09-19 18:43:39,151 [impl.TabletServerBatchWriter] TRACE: JVM GC Time : 0.53 secs
>     2013-09-19 18:43:39,152 [impl.TabletServerBatchWriter] TRACE: JVM Compile Time : 1.60 secs
>     2013-09-19 18:43:39,152 [impl.TabletServerBatchWriter] TRACE: System load average : initial= 0.22 final= 0.20
>
> What do these numbers look like for you?
>
> Keith
>
> > From: Keith Turner [mailto:[email protected]]
> > Sent: Thursday, September 19, 2013 12:39 PM
> > To: [email protected]
> > Subject: Re: BatchWriter performance on 1.4
> >
> > Are you aware of the multi table batch writer? I am not sure if it would
> > be useful, but wanted to make sure you knew about it. It will use the
> > same thread pool to process mutations for multiple tables. Also it will
> > batch mutations for multiple tablets into the same RPC calls.
> >
> > On Wed, Sep 18, 2013 at 5:07 PM, Slater, David M. <[email protected]>
> > wrote:
> >
> > > Hi, I’m running a single-threaded ingestion program that takes data from
> > > an input source, parses it into mutations, and then writes those
> > > mutations (sequentially) to four different BatchWriters (all on
> > > different tables). Most of the time (95%) is spent adding mutations,
> > > e.g. batchWriter.addMutations(mutations). I am wondering how to reduce
> > > the time taken by these calls.
> > >
> > > 1) For the method batchWriter.addMutations(Iterable<Mutation>), does it
> > > matter for performance whether the mutations returned by the iterator
> > > are sorted in lexicographic order?
> > >
> > > 2) If the Iterable<Mutation> that I pass to the BatchWriter is very
> > > large, will I need to wait for a number of batches to be written and
> > > flushed before it will finish iterating, or does it transfer the
> > > elements of the Iterable to a different intermediate list?
> > >
> > > 3) If that is the case, would it then make sense to spawn off
> > > short-lived threads each time I call addMutations?
> > >
> > > At a high level, my code looks like this:
> > >
> > >     BatchWriter bw1 = connector.createBatchWriter(…);
> > >     BatchWriter bw2 = …;
> > >     …
> > >     while (true) {
> > >         String[] data = input.getData();
> > >         List<Mutation> mutations1 = parseData1(data);
> > >         List<Mutation> mutations2 = parseData2(data);
> > >         …
> > >         bw1.addMutations(mutations1);
> > >         bw2.addMutations(mutations2);
> > >         …
> > >     }
> > >
> > > Thanks,
> > > David
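Keith's description of the multi-table writer can be illustrated with a stdlib-only toy model. He says the writers returned by getBatchWriter() all add to one shared buffer in a synchronized manner, and that flush() through any of them blocks the others. The same model can address David's third question, about handing addMutations off to extra threads.

This is a sketch under stated assumptions, not Accumulo's implementation:

- SharedBufferWriter, TableView, ParallelIngest and parse() are hypothetical, and a "mutation" is just a String.
- Parsing is assumed to be the expensive step.
- David's idea of one short-lived thread per call is swapped for a fixed pool of reused threads, so parsing runs in parallel and only the append contends for the shared lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model, NOT Accumulo code: per-table views share one buffer and one lock.
class SharedBufferWriter {
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();

    // Hypothetical stand-in for a per-table BatchWriter view.
    class TableView {
        private final String table;

        TableView(String table) {
            this.table = table;
        }

        // Every view appends under the same lock, so concurrent adds are serialized.
        void addMutations(Iterable<String> mutations) {
            synchronized (SharedBufferWriter.this) {
                List<String> perTable = buffer.computeIfAbsent(table, t -> new ArrayList<>());
                for (String m : mutations) {
                    perTable.add(m);
                }
            }
        }

        // Drains the buffered data for every table, not just this one.
        Map<String, List<String>> flush() {
            return flushAll();
        }
    }

    TableView getBatchWriter(String table) {
        return new TableView(table);
    }

    // Holds the shared lock while draining, so adds through other views wait.
    synchronized Map<String, List<String>> flushAll() {
        Map<String, List<String>> drained = new LinkedHashMap<>(buffer);
        buffer.clear();
        return drained;
    }

    synchronized int bufferedCount() {
        int total = 0;
        for (List<String> perTable : buffer.values()) {
            total += perTable.size();
        }
        return total;
    }
}

class ParallelIngest {
    // Hypothetical parser standing in for parseData1/parseData2.
    static List<String> parse(String line, String table) {
        List<String> mutations = new ArrayList<>();
        mutations.add(table + ":" + line);
        return mutations;
    }

    // Parses on a fixed pool of reused threads; parse() runs before the lock is taken.
    static SharedBufferWriter run(List<String> lines, int threads) {
        SharedBufferWriter writer = new SharedBufferWriter();
        SharedBufferWriter.TableView t1 = writer.getBatchWriter("table1");
        SharedBufferWriter.TableView t2 = writer.getBatchWriter("table2");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.execute(() -> {
                t1.addMutations(parse(line, "table1"));
                t2.addMutations(parse(line, "table2"));
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("ingest did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer;
    }
}
```

If addMutations is instead mostly blocked waiting for a full buffer to be sent to the tablet servers, more producer threads are unlikely to help. In that case, the writer's maxMemory and maxWriteThreads settings would be the more likely things to experiment with.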
