From: "tsuna" <[email protected]>
Sent: Friday, September 10, 2010 12:41 AM

Having more TCP connections makes the code more complicated (since you
need to manage them all, implement a scheme to try to use them in a
round-robin fashion, etc.).  It can also put more strain on some
network gear or OS components.  For instance, we had a problem recently
at StumbleUpon where we realized that some of our webservers had
iptables connection tracking enabled (even though it wasn't doing
anything and there was no custom iptables rule).  When we added some
memcache instances, iptables was having a hard time keeping track of
the tens of thousands of sockets the OS was dealing with, and was
significantly slowing down the machine.  We had to disable and rmmod
it (we didn't need it anyway).
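
To make that concrete, a rough Java sketch of that kind of bookkeeping
might look like the following.  The host, port, and pool size are made
up for illustration, and this is not how asynchbase or HBase's client
actually does it; it just shows the extra state and selection policy
you end up carrying once you hold several sockets to the same server.

    import java.io.IOException;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class RoundRobinPool {
      private final List<Socket> sockets = new ArrayList<Socket>();
      private final AtomicInteger next = new AtomicInteger();

      public RoundRobinPool(String host, int port, int size) throws IOException {
        for (int i = 0; i < size; i++) {
          sockets.add(new Socket(host, port));  // one TCP connection per slot
        }
      }

      // Pick the next socket in round-robin order.
      public Socket pick() {
        // Mask the sign bit so the index stays valid after counter overflow.
        int i = (next.getAndIncrement() & 0x7fffffff) % sockets.size();
        return sockets.get(i);
      }
    }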

Further, re-using the same TCP connection over and over again has the
advantage of letting TCP quickly increase the receive window on both
sides of the connection.  This definitely helps get more throughput,
given the slow-start nature of TCP (doubly so if you use the default
TCP settings on Linux, which aren't optimized for high-speed, reliable
gigabit networks).
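
As a hedged illustration of the tuning point, one knob an application
can turn itself is the socket receive buffer, requested before
connecting so the kernel can negotiate window scaling and advertise a
bigger window early on.  The 4 MB figure below is an arbitrary example
rather than a recommendation, the OS will still clamp it to
net.core.rmem_max, and nothing here is specific to HBase or asynchbase.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class BigWindowSocket {
      public static Socket connect(String host, int port) throws IOException {
        Socket s = new Socket();
        // Request a larger receive buffer *before* connecting so the kernel
        // can pick a suitable window scale factor for this connection.
        // 4 MB is only an example; the OS may cap it (net.core.rmem_max).
        s.setReceiveBufferSize(4 * 1024 * 1024);
        s.connect(new InetSocketAddress(host, port));
        return s;
      }
    }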

In my recent loadtests on my HBase-heavy application (be it with
HBase's traditional client or with asynchbase), I've always been
CPU bound (except that sometimes HBase's traditional client incurs
too much lock contention to really max out the CPU cores, but this
is entirely unrelated to the code you're quoting above).

Thank you for sharing your valuable experience and knowledge. I understand now, and I'm relieved to know that many threads in one HBase client process can max out the CPUs in most cases. I'm sorry to have interrupted the discussion.

I'm more and more attracted to HBase. Reading the code is fun, as it is clean and relatively easy to understand. I hope the solution to this problem will be found in the application code.

Regards,
Maumau
