On Thu, Sep 9, 2010 at 5:10 AM, MauMau <[email protected]> wrote:
> ----- Original Message ----- From: "tsuna" <[email protected]>
> Sent: Thursday, September 09, 2010 7:52 AM
>> If the server side latency consistently remains around 5-15ms but the
>> client side latency shoots up through the roof, you may be
>> experiencing lock contention or some other problem potentially even
>> unrelated to HBase in your application.  Maybe your application is
>> generating too much garbage and the GC has to run too frequently.
>> Maybe you have so many threads that you're trashing the caches of the
>> CPUs you're running on.
>>
>> Adding more threads only makes things faster up to a certain point.
>> Past that point, things actually become slower.
>
> I'm curious about this problem. I apologize in advance if I say something wrong.
>
> Isn't it possible that the latency is due to the client-side serialized
> send/receive? Only one TCP connection is established to each region server
> in a client process regardless of how many HTable objects are created. That

That's not entirely true.  It depends on how you create your HTable
objects.  If they share the same Configuration object, then yes you'd
end up with the same TableServers instances (see
HConnectionManager#getConnection).
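
For instance (quick sketch; the table name is made up and the exact
constructors vary a bit between versions):

  HBaseConfiguration conf = new HBaseConfiguration();

  // Same Configuration instance: HConnectionManager#getConnection hands
  // both HTables the same TableServers, i.e. one shared set of region
  // server connections.
  HTable t1 = new HTable(conf, "mytable");
  HTable t2 = new HTable(conf, "mytable");

  // A separate Configuration instance gets its own connection (and its
  // own sockets).
  HBaseConfiguration otherConf = new HBaseConfiguration();
  HTable t3 = new HTable(otherConf, "mytable");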

> is, many threads share one connection to each region server even if they use
> different HTable instances. The sends from different threads are processed
> one by one, and so are the receives. For example, the
> sends are serialized in HBaseClient.java as follows:
>
>   protected void sendParam(Call call) {
> ...
>       synchronized (this.out) { // FindBugs IS2_INCONSISTENT_SYNC
>       // send data via TCP connection
> ...
>
> So, I guess the latency is due to the HBase client library implementation,
> not due to the application. If my understanding is correct, I hope this
> can be improved.

I'm not sure about that.  But we could easily instrument the code to
measure the time taken to send the data out.
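
Something along these lines inside HBaseClient#sendParam would do (sketch
only; the timing and log line are mine, not in the current code):

  protected void sendParam(Call call) {
    long start = System.nanoTime();
    synchronized (this.out) {
      long lockAcquired = System.nanoTime();
      // ... existing code that writes the call out over the connection ...
      long done = System.nanoTime();
      LOG.info("sendParam: waited " + (lockAcquired - start) / 1000
               + "us for the lock, spent " + (done - lockAcquired) / 1000
               + "us writing");
    }
  }

That would tell us whether threads mostly wait on the lock on this.out or
are actually busy writing.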

> I think the solution is to allocate, for each region server, as many TCP
> connections as there are processor cores. For example, if the client machine
> has 8 cores and there are three region servers, each application process
> will have at most 24 TCP connections. The application threads use those 8
> connections per server in a round-robin fashion.

In my experience, sending or receiving the data has almost never been
a source of performance issues with distributed systems like HBase.
We all use networks with gigabit Ethernet and have sub-millisecond
latency between any two machines.  Gigabit links are hardly ever maxed
out.

Having more TCP connections makes the code more complicated (since you
need to manage them all, implement a scheme to try to use them in a
round-robin fashion, etc).  It can also put more strain on some
network gear or OS components.  For instance we had a problem recently
at StumbleUpon where we realized that some of our webservers had
iptables connection tracking enabled (even though it wasn't doing
anything and there was no custom iptables rule).  When we added some
memcache instances, iptables was having a hard time keeping track of
the tens of thousands of sockets the OS was dealing with, and
significantly slowing down the machine.  We had to disable and rmmod
it (we didn't need it anyway).
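
Just to make the bookkeeping concrete, MauMau's scheme would need
something like this per region server (purely hypothetical sketch,
nothing like it exists in the client):

  import java.net.InetSocketAddress;
  import java.net.Socket;
  import java.util.concurrent.atomic.AtomicInteger;

  class RoundRobinConnections {
    private final Socket[] sockets;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinConnections(InetSocketAddress regionServer, int cores)
        throws java.io.IOException {
      sockets = new Socket[cores];
      for (int i = 0; i < cores; i++) {
        sockets[i] = new Socket(regionServer.getAddress(),
                                regionServer.getPort());
      }
    }

    Socket nextSocket() {
      // Hand the sockets out round-robin.  Real code would also need a
      // write lock per socket, a reader thread per socket, reconnection
      // logic, etc.
      int i = (next.getAndIncrement() & 0x7fffffff) % sockets.length;
      return sockets[i];
    }
  }

And that's before handling timeouts, dead region servers, or the extra
conntrack entries mentioned above.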

Further, re-using the same TCP connection over and over again has the
advantage of letting TCP quickly ramp up the congestion and receive
windows on both sides of the connection.  This definitely helps get more
throughput due to the slow-start nature of TCP (doubly so if you use
the default TCP settings on Linux, which aren't optimized for
high-speed reliable gigabit networks).

In my recent loadtests on my HBase-heavy application (be it with
HBase's traditional client or with asynchbase) I've always been CPU
bound (except that sometimes HBase's traditional client incurs too much
lock contention to really max out the CPU cores, but this is entirely
unrelated to the code you're quoting above).

My advice is really to look at the GC logs of the client app, and if
this doesn't reveal anything interesting, profile the app to see where
time is being spent.  Also, trying to have >100 active worker threads
doesn't make sense, unless you have >32 cores in your servers (most
people don't at this time).  This is just going to trash the CPU
caches, force the OS to constantly move threads around (no CPU
affinity + each thread has a tiny CPU time slice before getting
pre-empted) and cause way too many context switches.
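
For the GC logs, if you aren't collecting them already, something like
this on the client JVM's command line works with HotSpot:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log

Then look for frequent or long pauses that line up with the client-side
latency spikes.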

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
