Josh,
Following up on this earlier post about the proxy:
http://www.mail-archive.com/user%40accumulo.apache.org/msg03445.html
On 4/14/14, 1:38 PM, Josh Elser wrote:
> If you care about maximizing your throughput, ingest is probably not
> desirable through the proxy (you can probably get ~10x faster using the
> Java BatchWriter API).
> Hrm. 10x may have been overstating too. 5x is probably more accurate.
> YMMV :)
Is there something more than the extra network hop that makes the proxy
slow? The proxy exposes a BatchWriter interface:
https://github.com/accumulo/pyaccumulo/blob/master/README.md#writing-mutations-with-a-batchwriter-batched-and-optimized-for-throughput
So, we can batch up multiple requests through the proxy. Is there
something else that is only available (only possible?) by going direct
instead of through the proxy?
For example, is there a logical difference between what can be done with
the Java BatchWriter API and this kind of batching loop running through
the Thrift proxy:
https://github.com/diffeo/kvlayer/blob/master/kvlayer/_accumulo.py#L149
(Note the crude handling of the max Thrift message size.)
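To make the question concrete, here is a minimal sketch of the kind of size-bounded batching loop I mean. It is not the kvlayer code: `batch_by_size`, `write_in_batches`, and the `send` callable are hypothetical names, where `send` stands in for whatever proxy call ships one batch of mutations, and the size of each mutation is approximated by `len()` of its serialized bytes, which is an assumption.

```python
def batch_by_size(items, max_bytes, size_of=len):
    """Yield lists of items whose combined size stays within max_bytes.

    size_of is a hypothetical size estimator; len() of the serialized
    bytes is only an approximation of the real message size.
    """
    batch, batch_bytes = [], 0
    for item in items:
        item_bytes = size_of(item)
        if item_bytes > max_bytes:
            # A single oversized item can never fit in any batch.
            raise ValueError("single item exceeds max_bytes")
        if batch and batch_bytes + item_bytes > max_bytes:
            # Adding this item would overflow the budget: flush first.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        # Flush whatever is left after the last item.
        yield batch


def write_in_batches(items, send, max_bytes):
    """Call send(batch) once per batch; return the number of calls made.

    send is a placeholder for one round trip through the proxy.
    """
    calls = 0
    for batch in batch_by_size(items, max_bytes):
        send(batch)
        calls += 1
    return calls
```

Each call to `send` is one round trip, so the byte budget trades fewer round trips against the risk of exceeding the message size limit.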
If there is a logical difference, perhaps it would be worthwhile to
translate the Java BatchWriter into C so there can be native support for
C/C++/Python applications doing high-speed bulk ingest?
Thanks for your thoughts on this.
Regards,
John