Josh,

Following up on this earlier post about the proxy:

http://www.mail-archive.com/user%40accumulo.apache.org/msg03445.html



On 4/14/14, 1:38 PM, Josh Elser wrote:

If you care about maximizing your throughput, ingest is probably not desirable through the proxy (you can probably get ~10x faster using the Java BatchWriter API).

Hrm. 10x may have been overstating too. 5x is probably more accurate. YMMV :)



Is there something more than the extra network hop that makes the proxy slow? The proxy exposes a BatchWriter interface:

https://github.com/accumulo/pyaccumulo/blob/master/README.md#writing-mutations-with-a-batchwriter-batched-and-optimized-for-throughput

So we can already batch up multiple requests through the proxy. Is there anything else that is only available (or only possible) by going direct instead of through the proxy?

For example, is there a logical difference between what can be done with the Java BatchWriter API and this kind of batching loop running through the thrift proxy:

https://github.com/diffeo/kvlayer/blob/master/kvlayer/_accumulo.py#L149

(Note the crude handling of the max thrift message size.)
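To make the question concrete, here is a minimal sketch of the kind of size-bounded batching loop I mean. It is not the kvlayer code itself: `batch_by_size`, `ingest`, and `send_batch` are hypothetical names, and summing key and value lengths is only a rough stand-in for the serialized thrift message size (it ignores thrift framing overhead).

```python
def batch_by_size(records, max_bytes):
    """Group (key, value) byte pairs into lists whose estimated size
    stays at or below max_bytes.

    A single record larger than max_bytes is yielded in a batch of its
    own rather than dropped; the caller decides how to handle it.
    """
    batch, batch_bytes = [], 0
    for key, value in records:
        size = len(key) + len(value)  # rough estimate, not the wire size
        # Flush before adding a record that would push us over the limit.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append((key, value))
        batch_bytes += size
    if batch:
        yield batch


def ingest(records, send_batch, max_bytes=1 << 20):
    """Send records in size-bounded batches and return the batch count.

    send_batch is a hypothetical callable standing in for whatever
    actually writes one batch through the proxy.
    """
    sent = 0
    for batch in batch_by_size(records, max_bytes):
        send_batch(batch)
        sent += 1
    return sent
```

As far as I can tell, a loop like this is all the client side can control: how many mutations go into each thrift message. Everything after that happens inside the proxy.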

If there is a logical difference, perhaps it would be worthwhile to translate the Java BatchWriter into C so there can be native support for C/C++/Python applications doing high-speed bulk ingest?


Thanks for your thoughts on this.


Regards,
John
