Going through the proxy will always add an extra RPC hop compared with using a Java client directly. Eliminating that hop, I think, would net the most benefit.

On Mon, Aug 11, 2014 at 12:16 AM, John R. Frank <[email protected]> wrote:

> Josh,
>
> Following up on this earlier post about the proxy:
>
> http://www.mail-archive.com/user%40accumulo.apache.org/msg03445.html
>
> On 4/14/14, 1:38 PM, Josh Elser wrote:
>
>> If you care about maximizing your throughput, ingest is probably not
>> desirable through the proxy (you can probably get ~10x faster using the
>> Java BatchWriter API).
>>
>> Hrm. 10x may have been overstating too. 5x is probably more accurate.
>> YMMV :)
>
> Is there something more than the extra network hop that makes the proxy
> slow? The proxy exposes a BatchWriter interface:
>
> https://github.com/accumulo/pyaccumulo/blob/master/README.md#writing-mutations-with-a-batchwriter-batched-and-optimized-for-throughput
>
> So, we can batch up multiple requests through the proxy. Is there
> something else that is only available (only possible?) by going direct
> instead of through the proxy?
>
> For example, is there a logical difference between what can be done with
> the Java BatchWriter API and this kind of batching loop running through the
> thrift proxy:
>
> https://github.com/diffeo/kvlayer/blob/master/kvlayer/_accumulo.py#L149
>
> (Note the crude handling of the max thrift message size.)
>
> If there is a logical difference, perhaps it would be worthwhile to
> translate the Java BatchWriter into C so there can be native support for
> C/C++/Python applications doing high-speed bulk ingest?
>
> Thanks for your thoughts on this.
>
> Regards,
> John
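
For anyone following along, the kind of size-capped batching loop John describes might look roughly like the sketch below. This is not the kvlayer or pyaccumulo code. `Cell`, `batches_under_limit`, `write_all`, and the `send_batch` callback are hypothetical names made up for illustration, and the 1 MiB default is an arbitrary placeholder rather than a real Thrift limit. It also counts only key and value bytes, so a real client would want headroom for Thrift framing overhead.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a mutation: just a (key, value) pair of bytes.
Cell = Tuple[bytes, bytes]


def batches_under_limit(cells: Iterable[Cell], max_bytes: int) -> Iterator[List[Cell]]:
    """Yield lists of cells whose combined key+value size stays within max_bytes.

    A single cell larger than max_bytes is yielded alone rather than dropped.
    """
    batch: List[Cell] = []
    size = 0
    for key, value in cells:
        cell_size = len(key) + len(value)
        # Flush the current batch before it would exceed the budget.
        if batch and size + cell_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += cell_size
    if batch:
        yield batch


def write_all(
    cells: Iterable[Cell],
    send_batch: Callable[[List[Cell]], None],
    max_bytes: int = 1 << 20,  # assumed budget, not a documented Thrift value
) -> int:
    """Send every batch through send_batch and return the number of round trips."""
    round_trips = 0
    for batch in batches_under_limit(cells, max_bytes):
        send_batch(batch)  # one RPC to the proxy per batch, in this sketch
        round_trips += 1
    return round_trips
```

Here `send_batch` would wrap whatever call the proxy client actually exposes. Each batch is still one client-to-proxy round trip before the proxy talks to the tablet servers, which is the extra hop mentioned above.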
