On Thu, Apr 21, 2011 at 11:25 PM, Dmitriy Lyubimov <[email protected]> wrote:
> I certainly would. Even more, i already read the code there just a
> bit although not enough to understand where the efficiency comes from.
> Do you actually implement another version of RPC on non-blocking
> sockets there?

asynchbase implements the HBase RPC protocol in a different way: it's
written from scratch. It uses Netty and is fully asynchronous and
non-blocking. That's where the efficiency comes from. At StumbleUpon
I've used it to push 200,000 edits/s to just 3 RegionServers. I never
got even close to this with HTable.

> Unfortunately i can't do much more than just 'try' any time soon as
> the codebase is rather tightly coupled with current client. Migrating
> code and tests would not be a one day effort. However, i certainly can
> try with one test.

Right, asynchbase has a different API than HTable, so it's not a
drop-in replacement. I assumed that since you were talking about a
short 40-row scan, you could write a little Java program with a small
main() that reproduces the issue. It would then be much easier to try
with asynchbase, since you'd only need to change about 5-10 lines of
code.

> But if only i could solve the mysterious tcp lag in the datacenter, i
> suspect my needs would be quite well covered with the standard client
> as well.

I doubt you have a "mysterious TCP lag". Please provide a trace with
tcpdump if you still believe you do.

On Thu, Apr 21, 2011 at 11:42 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Exactly. that's why i said 'for short scans and gets' and perhaps a
> combo. As soon as it exceeds a frame, we'd rather not to mess with
> reassembly. But I agree it is most likely not worth it. Most likely
> reason for my latencies is not this.

It's not that simple. Even if you get just one row, that row might be
bigger than 1460 bytes; you don't know. The client cannot predict how
much data even a simple "get" might return. You then potentially need
to handle packet re-ordering. And either way you would still need to
detect and handle packet loss. I'm not even talking about
checksumming.

Modern TCP implementations are very efficient.
You really do not want to use UDP to talk to a database, unless you've
spent a very significant amount of engineering time building a
reliable protocol on top of it, as Ted mentioned. And even then, in
99% of cases you're better off with TCP anyway, because your reliable
protocol is almost certainly not going to be as reliable and efficient
as, say, Linux's TCP implementation.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
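
PS: To put a number on the 1460-byte remark above, here is a rough
sketch of how many full-size packets a response of a given size
would span. It assumes a standard 1500-byte Ethernet MTU with 20-byte
IPv4 and 20-byte TCP headers and no options, which is where 1460
comes from. The class and method names are made up for illustration,
and the sketch ignores HBase's own RPC framing, so the real room left
for row data is smaller still.

```java
// Illustrative only: SegmentMath and segmentsNeeded are hypothetical names,
// not part of HBase or asynchbase.
final class SegmentMath {
    // Assumed link parameters: Ethernet MTU, IPv4 and TCP headers without options.
    static final int MTU = 1500;
    static final int IPV4_HEADER = 20;
    static final int TCP_HEADER = 20;
    // Largest payload that fits in one packet under these assumptions (1460).
    static final int MAX_PAYLOAD = MTU - IPV4_HEADER - TCP_HEADER;

    /** Returns how many packets are needed to carry payloadBytes of data. */
    static int segmentsNeeded(int payloadBytes) {
        if (payloadBytes < 0) {
            throw new IllegalArgumentException("payloadBytes must be >= 0");
        }
        // Ceiling division: a partially filled last packet still counts as one.
        return (payloadBytes + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
    }

    public static void main(String[] args) {
        // The client cannot know which of these sizes a "get" will return.
        int[] responseSizes = {100, 1460, 1461, 64 * 1024};
        for (int size : responseSizes) {
            System.out.println(size + " bytes -> "
                    + segmentsNeeded(size) + " packet(s)");
        }
    }
}
```

A single byte over the limit already means two packets, at which point
a UDP-based client has to deal with ordering and loss itself.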
