There is also https://github.com/phrocker/apeirogon, which was a start at a C++ client for Accumulo. I'm not sure what state it is in.

Thanks!


In short, it's possible, but like Eric said, the Java client does quite a bit more than just RPC to other processes.

Yes, understood.

Looks like the client also has the necessary information for figuring out which tablets are near the compute worker, which would enable bringing the computational task to the data.
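As a rough illustration of bringing the task to the data, here is a minimal sketch that assigns row ranges to workers based on where the tablets live. The input format (a list of `(start_row, end_row, host)` tuples) and the function name `assign_ranges` are hypothetical; how that location information would actually be obtained from the client is not shown here.

```python
from collections import defaultdict
from itertools import cycle


def assign_ranges(tablet_locations, worker_hosts):
    """Assign each row range to a worker, preferring the tablet's own host.

    tablet_locations: iterable of (start_row, end_row, host) tuples
                      (hypothetical format, assumed for this sketch).
    worker_hosts:     set of hostnames that run compute workers.
    """
    if not worker_hosts:
        raise ValueError("at least one worker host is required")

    # Ranges with no co-located worker are spread round-robin.
    fallback = cycle(sorted(worker_hosts))
    assignment = defaultdict(list)
    for start_row, end_row, host in tablet_locations:
        target = host if host in worker_hosts else next(fallback)
        assignment[target].append((start_row, end_row))
    return dict(assignment)


if __name__ == "__main__":
    locations = [(None, "g", "node1"), ("g", "p", "node2"), ("p", None, "node9")]
    print(assign_ranges(locations, {"node1", "node2"}))
```

Ranges whose tablet server has no local worker fall back to round-robin, so every range still gets processed, just without the locality benefit.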



Regarding python+proxy, the extra RPC is definitely a concern, but I'm not sure how much of the performance decrease is due to the use of an interpreted/dynamic language and how much is due to the Proxy. I haven't ever benchmarked the two to get a good understanding of where the extra time is really spent.
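One way to start separating the two costs is to time the fetch phase and the processing phase independently. The sketch below is only an assumption about how such a measurement could be set up: `fetch_batch` is a hypothetical stand-in for a round trip to the proxy (simulated with a sleep), and `process_batch` stands in for the pure-Python work on the results.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
timings = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the elapsed time of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


def fetch_batch():
    """Hypothetical stand-in for one round trip to the proxy."""
    time.sleep(0.01)  # simulated network and proxy latency
    return [("row%d" % i, str(i)) for i in range(1000)]


def process_batch(batch):
    """Stand-in for the Python-side work done on each batch."""
    return sum(int(value) for _, value in batch)


def run(batches=3):
    total = 0
    for _ in range(batches):
        with timed("rpc"):
            batch = fetch_batch()
        with timed("python"):
            total += process_batch(batch)
    return total


if __name__ == "__main__":
    run()
    for phase, seconds in sorted(timings.items()):
        print("%-8s %.4f s" % (phase, seconds))
```

In a real client the "rpc" phase would still include deserializing the response in Python, so this split likely gives an upper bound on the proxy's share rather than an exact figure.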

Yes, it's easy to waste a lot of cycles on trivial things like Python object creation. After figuring things out, it's often fruitful to profile a system on real data and migrate select pieces to native implementations in C/C++. This makes Python the easily refactored glue between optimized components.
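A minimal sketch of that profiling step, using the standard library's `cProfile` and `pstats`. The function `parse_cells` is a hypothetical example of object-heavy Python code; in practice the profiled function would be whatever processes real scan results.

```python
import cProfile
import io
import pstats


def parse_cells(raw_cells):
    """Hypothetical hot spot: builds one dict per cell."""
    return [{"row": row, "value": value} for row, value in raw_cells]


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the functions with the largest cumulative time first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    cells = [("row%06d" % i, i) for i in range(100000)]
    _, report = profile_call(parse_cells, cells)
    print(report)
```

Functions that dominate the cumulative-time column on real data are the natural candidates for a native implementation.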

For now, we're experimenting with running more proxies closer to the compute workers. We are interested in a C++ client for Accumulo, which could be made to expose python interfaces.


jrf
