There is also https://github.com/phrocker/apeirogon, which was a start at
a C++ client to Accumulo. I'm not sure what state it's in.
Thanks!
In short, it's possible, but like Eric said, the Java client does quite
a bit more than just RPC to other processes.
Yes, understood.
Looks like the client also has the necessary information for figuring out
which tablets are near the compute worker, which would enable bringing the
computational task to the data.
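
To make the locality idea concrete, here is a rough sketch in plain Python. It does not call any Accumulo API: the tablet-to-host and worker-to-host mappings are assumed to come from somewhere else (e.g. whatever the client knows about tablet locations), and the names assign_by_locality, tablet_hosts and worker_hosts are made up for illustration.

```python
from collections import defaultdict


def assign_by_locality(tablet_hosts, worker_hosts):
    """Assign each tablet to a worker, preferring a worker on the same host.

    tablet_hosts: {tablet_id: host} (hypothetical input, not an Accumulo call)
    worker_hosts: {worker_id: host} (hypothetical input)
    Returns {worker_id: [tablet_id, ...]}.
    """
    # Index the workers by the host they run on.
    workers_on_host = defaultdict(list)
    for worker, host in sorted(worker_hosts.items()):
        workers_on_host[host].append(worker)

    all_workers = sorted(worker_hosts)
    assignment = defaultdict(list)
    for i, (tablet, host) in enumerate(sorted(tablet_hosts.items())):
        # Prefer workers co-located with the tablet; otherwise fall back
        # to spreading the tablet across all workers.
        local = workers_on_host.get(host)
        candidates = local if local else all_workers
        worker = candidates[i % len(candidates)]
        assignment[worker].append(tablet)
    return dict(assignment)


if __name__ == "__main__":
    tablets = {"t1": "hostA", "t2": "hostB", "t3": "hostC"}
    workers = {"w1": "hostA", "w2": "hostB"}
    print(assign_by_locality(tablets, workers))
```

A real scheduler would also need to cope with tablets migrating between servers, which this sketch ignores.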
Regarding python+proxy, extra RPC is definitely a concern, but I'm not
sure how much of the performance decrease is due to the use of an
interpreted/dynamic language and how much is due to the Proxy. I haven't
ever benchmarked the two to get a good understanding of where the extra
time is really spent.
Yes, it's easy to waste a lot of cycles with trivial things like python
object creation. After figuring things out, it's often fruitful to
profile a system on real data and migrate select pieces to native
implementations in C/C++. This makes python the easily refactored glue
between optimized components.
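
As a small illustration of that profiling step, here is a sketch using the standard library's cProfile and pstats. The Cell class and the parse_cells / sum_values functions are hypothetical stand-ins for real per-cell processing, not anything from Accumulo or the Proxy; the point is only to show how per-object overhead shows up in a profile.

```python
import cProfile
import io
import pstats


class Cell:
    """Hypothetical per-cell wrapper; one object is created per cell."""

    def __init__(self, row, value):
        self.row = row
        self.value = value


def parse_cells(n):
    # Object creation for every cell: a typical place for cycles to go.
    return [Cell("row%d" % i, i) for i in range(n)]


def sum_values(cells):
    return sum(c.value for c in cells)


def run(n):
    return sum_values(parse_cells(n))


def profile_report(n=200000, top=5):
    """Profile run(n); return its result and the text of the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    total = run(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return total, out.getvalue()


if __name__ == "__main__":
    total, report = profile_report()
    print(total)
    print(report)
```

Functions that dominate the cumulative time on real data are the candidates for a native implementation.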
For now, we're experimenting with running more proxies closer to the
compute workers. We are interested in a C++ client for Accumulo, which
could be made to expose python interfaces.
jrf