From: "tsuna" <[email protected]> Sent: Friday, September 10, 2010 12:41 AM
Having more TCP connections makes the code more complicated (since you need to manage them all, implement a scheme to use them in a round-robin fashion, etc.). It can also put more strain on some network gear or OS components.

For instance, we had a problem recently at StumbleUpon where we realized that some of our webservers had iptables connection tracking enabled (even though it wasn't doing anything and there was no custom iptables rule). When we added some memcache instances, iptables had a hard time keeping track of the tens of thousands of sockets the OS was dealing with, and it was significantly slowing down the machines. We had to disable it and rmmod the module (we didn't need it anyway).
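Roughly, the extra bookkeeping looks something like the minimal sketch below (the generic connection type is hypothetical; a real client also has to handle reconnects, timeouts, and per-connection state):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Minimal round-robin pool over an arbitrary connection type C.
    public final class RoundRobinPool<C> {
      private final List<C> connections;
      private final AtomicInteger index = new AtomicInteger();

      public RoundRobinPool(List<C> connections) {
        if (connections.isEmpty()) {
          throw new IllegalArgumentException("need at least one connection");
        }
        this.connections =
            Collections.unmodifiableList(new ArrayList<C>(connections));
      }

      // Hands out connections in round-robin order; safe for concurrent callers.
      public C next() {
        int i = Math.abs(index.getAndIncrement() % connections.size());
        return connections.get(i);
      }
    }

None of this is hard, but it's one more moving part (plus more sockets for the kernel and any conntrack-style middleware to track) that a single shared connection avoids entirely.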
Further, re-using the same TCP connection over and over again has the advantage of letting TCP quickly grow the receive window on both sides of the connection. This definitely helps you get more throughput, because of TCP's slow start (doubly so if you use the default TCP settings on Linux, which aren't tuned for high-speed, reliable gigabit networks).
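For illustration, the kind of per-socket tuning this implies looks roughly like the sketch below (the 1 MB buffer sizes, the class name, and the host/port are made up, and the kernel only honors such requests up to limits like net.core.rmem_max / net.core.wmem_max):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public final class TunedSocket {
      public static Socket open(String host, int port) throws IOException {
        Socket s = new Socket();
        // Set before connect() so the TCP window scale option negotiated in
        // the handshake is large enough for the requested buffer.
        s.setReceiveBufferSize(1 << 20);
        s.setSendBufferSize(1 << 20);
        s.setTcpNoDelay(true);  // RPC-style request/response traffic.
        s.connect(new InetSocketAddress(host, port));
        return s;
      }
    }

Even with this kind of tuning, a long-lived connection whose window has already ramped up will beat a freshly opened one for the first several round trips.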
In my recent load tests on my HBase-heavy application (whether with HBase's traditional client or with asynchbase), I've always been CPU bound (except that sometimes HBase's traditional client incurs too much lock contention to really max out the CPU cores, but that is entirely unrelated to the code you're quoting above).
Thank you for sharing your valuable experience and knowledge. I understand now, and I'm relieved to know that many threads in one HBase client process can max out the CPUs in most cases. I'm sorry to have interrupted the discussion.
I'm more and more attracted to HBase. Reading the code is fun, as it is clean and relatively easy to understand. I hope the solution to this problem will be found in the application code.
Regards, Maumau
