[
https://issues.apache.org/jira/browse/HBASE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436951#comment-13436951
]
nicu marasoiu commented on HBASE-2939:
--
Hi,
I am not sure why multiplexing does not pay off as well across multiple TCP
connections as it does on a single one.
With a single TCP connection, multiplexing pays off up to a peak of 16 threads
sharing that connection in my particular tests (remote client, 4 KB batching of
2k puts).
With 16 TCP connections, however, the multiplexing benefit peaks at only 2
threads sharing each connection.
The relative benefit of sharing/multiplexing is about the same when 2 HTables
share one socket. But with hbase.client.ipc.pool.size=16, the benefit degrades
rapidly once the multiplexing factor exceeds 2, and falls below the baseline
once it exceeds 4.
I would like to understand the underlying reason. Perhaps there are locking and
contention mechanisms that prevent us from loading multiple connections the
same way we can multiplex a single one.
Here are my times with batched puts, n HTable instances (threads), m
connections in the round-robin pool; each figure is records written in a fixed
timeframe, and "16 HTables" means 16 threads using an HTablePool:
1 HTable: 1400 records
2 HTables sharing one TCP socket: 1566 records
8 HTables sharing one TCP socket: 6500 records
16 HTables sharing one TCP socket: 9200 records
32 HTables sharing one TCP socket: 6500 records
256 HTables sharing one TCP socket: 1340 records
16 HTables on 16 connections: 16753 records
32 HTables on 16 connections: 18661 records
64 HTables on 16 connections: 16800 records
128 HTables on 16 connections: 4300 records
256 HTables on 16 connections: 2434 records
As you can see, multiplexing performance appears to degrade much faster when
using multiple connections than when using just one (the default, without
pooling).
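To make the contention hypothesis concrete, here is a simplified, self-contained model of the benchmark above — not HBase code, and all names in it are hypothetical. Each "connection" is just a lock that a thread must hold while its batch is "on the wire", and n worker threads are assigned to m connections round-robin. It only illustrates how calls serialize as the multiplexing factor n/m grows; it does not reproduce the real client's queueing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: a "connection" is a lock held while a batch is sent.
class MultiplexModel {
    // Run nThreads sharing mConnections for the given timeframe; return the
    // total number of "batches" sent. Threads are assigned round-robin.
    static long runModel(int nThreads, int mConnections, long millis)
            throws InterruptedException {
        final Object[] connections = new Object[mConnections];
        for (int i = 0; i < mConnections; i++) connections[i] = new Object();
        final AtomicLong sent = new AtomicLong();
        final long deadline = System.currentTimeMillis() + millis;
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final Object conn = connections[t % mConnections]; // round-robin
            workers[t] = new Thread(() -> {
                while (System.currentTimeMillis() < deadline) {
                    synchronized (conn) { // one batch per socket at a time
                        busyWork();       // stand-in for serializing a batch
                    }
                    sent.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return sent.get();
    }

    // Tiny CPU-bound stand-in for serializing ~4 KB of puts.
    static void busyWork() {
        long x = 0;
        for (int i = 0; i < 10_000; i++) x += i;
        if (x == -1) System.out.print(""); // keep the JIT from eliding the loop
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("16 threads / 1 conn:  " + runModel(16, 1, 200));
        System.out.println("16 threads / 16 conn: " + runModel(16, 16, 200));
    }
}
```

Under this model, raising the multiplexing factor only helps while threads can overlap useful work with another thread's time on the lock; past that point they simply queue, which is consistent with the drop-off in the numbers above.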
Allow Client-Side Connection Pooling
Key: HBASE-2939
URL: https://issues.apache.org/jira/browse/HBASE-2939
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Critical
Fix For: 0.92.0
Attachments: HBASE-2939-0.20.6.patch, HBASE-2939-LATEST.patch,
HBASE-2939.patch, HBASE-2939.patch, HBASE-2939.patch, HBASE-2939-V6.patch,
HBaseClient.java
By design, the HBase RPC client multiplexes calls to a given region server
(or the master for that matter) over a single socket, access to which is
managed by a connection thread defined in the HBaseClient class. While this
approach may suffice for most cases, it tends to break down in the context of
a real-time, multi-threaded server, where latencies need to be lower and
throughputs higher.
In brief, the problem is that we dedicate one thread to handle all
client-side reads and writes for a given server, which in turn forces them to
share the same socket. As load increases, this is bound to serialize calls on
the client-side. In particular, when the rate at which calls are submitted to
the connection thread is greater than that at which the server responds, then
some of those calls will inevitably end up sitting idle, just waiting their
turn to go over the wire.
In general, sharing sockets across multiple client threads is a good idea,
but limiting the number of such sockets to one may be overly restrictive for
certain cases. Here, we propose a way of defining multiple sockets per server
endpoint, access to which may be managed through either a load-balancing or
thread-local pool. To that end, we define the notion of a SharedMap, which
maps a key to a resource pool, and supports both of those pool types.
Specifically, we will apply that map in the HBaseClient, to associate
multiple connection threads with each server endpoint (denoted by a
connection id).
Currently, the SharedMap supports the following types of pools:
* A ThreadLocalPool, which represents a pool that builds on the
ThreadLocal class. It essentially binds the resource to the thread from which
it is accessed.
* A ReusablePool, which represents a pool that builds on the LinkedList
class. It essentially allows resources to be checked out, at which point they
are (temporarily) removed from the pool. When a resource is no longer
required, it should be returned to the pool so it can be reused.
* A RoundRobinPool, which represents a pool that stores its resources in
an ArrayList. It load-balances access to its resources by returning a
different resource every time a given key is looked up.
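The three pool types can be sketched as follows. This is a minimal illustration of the semantics described above, not the actual patch code — the interface, constructors, and factory parameter here are assumptions for the sketch:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical common interface for the sketch (not from the patch).
interface Pool<R> {
    R get();        // obtain a resource
    void put(R r);  // return a resource (a no-op for some pool types)
}

// Binds one resource to each calling thread via ThreadLocal.
class ThreadLocalPool<R> implements Pool<R> {
    private final ThreadLocal<R> local;
    ThreadLocalPool(Supplier<R> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }
    public R get() { return local.get(); }
    public void put(R r) { /* resource stays bound to its thread */ }
}

// Check-out / check-in semantics backed by a LinkedList.
class ReusablePool<R> implements Pool<R> {
    private final LinkedList<R> free = new LinkedList<>();
    private final Supplier<R> factory;
    ReusablePool(Supplier<R> factory) { this.factory = factory; }
    public synchronized R get() {
        // Checked-out resources are (temporarily) removed from the pool.
        return free.isEmpty() ? factory.get() : free.removeFirst();
    }
    public synchronized void put(R r) { free.addLast(r); } // return for reuse
}

// Load-balances by cycling through an ArrayList of shared resources.
class RoundRobinPool<R> implements Pool<R> {
    private final List<R> resources = new ArrayList<>();
    private final Supplier<R> factory;
    private final int maxSize; // e.g. hbase.client.ipc.pool.size
    private int next = 0;
    RoundRobinPool(int maxSize, Supplier<R> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }
    public synchronized R get() {
        if (resources.size() < maxSize) { // grow lazily up to maxSize
            R r = factory.get();
            resources.add(r);
            return r;
        }
        R r = resources.get(next); // then hand out a different one each call
        next = (next + 1) % maxSize;
        return r;
    }
    public synchronized void put(R r) { /* resources stay shared */ }
}
```

In the HBaseClient, the pooled resource would be a connection thread, so a ThreadLocalPool gives each client thread its own socket, while a RoundRobinPool spreads client threads over a bounded set of sockets.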
To control the type and size of the connection pools, we give the user a
couple of parameters (viz. hbase.client.ipc.pool.type and
hbase.client.ipc.pool.size). In