Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Sriram Ramachandrasekaran Sun, 03 Nov 2013 21:45:23 -0800

Hey Michael - Per API documentation, closing the HTable Instance would
close the underlying resources too. Hope you are aware of it.



On Mon, Nov 4, 2013 at 11:06 AM, <michael.grund...@high5games.com> wrote:

> Hi Lars, at application startup the pool is created with X number of
> connections using the first method you indicated:
> HConnectionManager.createConnection(conf). We store each connection in the
> pool automatically and serve it up to threads as they request it. When a
> thread is done using the connection, they return it back to the pool. The
> connections are not be created and closed per thread, but only once for the
> entire application. We are using the GenericObjectPool from Apache Commons
> Pooling as the foundation of our connection pooling approach. Our entire
> pool implementation really consists of just a couple overridden methods to
> specify how to create a new connection and close it. The GenericObjectPool
> class does all the rest. See here for details:
> http://commons.apache.org/proper/commons-pool/
>
> Each thread is getting a HTableInstance as needed and then closing it when
> done. The only thing we are not doing is using the createConnection method
> that takes in an ExecutorService as that wouldn't work in our model. Our
> app is like a web application - the thread pool is managed outside the
> scope of our application code so we can't assume the service is available
> at connection creation time. Thanks!
>
> -Mike
>
>
> -----Original Message-----
> From: lars hofhansl [mailto:la...@apache.org]
> Sent: Sunday, November 03, 2013 11:27 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hi Micheal,
>
> can you try to create a single HConnection in your client:
> HConnectionManager.createConnection(Configuration conf) or
> HConnectionManager.createConnection(Configuration conf, ExecutorService
> pool)
>
> Then use HConnection.getTable(...) each time you need to do an operation.
>
> I.e.
> Configuration conf = ...;
> ExecutorService pool = ...;
> // create a single HConnection for you vm.
> HConnection con = HConnectionManager.createConnection(Configuration conf,
> ExecutorService pool); // reuse the connection for many tables, even in
> different threads HTableInterface table = con.getTable(...); // use table
> even for only a few operation.
> table.close();
> ...
> HTableInterface table = con.getTable(...); // use table even for only a
> few operation.
> table.close();
> ...
> // at the end close the connection
> con.close();
>
> -- Lars
>
>
>
> ________________________________
>  From: "michael.grund...@high5games.com" <michael.grund...@high5games.com>
> To: user@hbase.apache.org
> Sent: Sunday, November 3, 2013 7:46 PM
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
>
>
> Hi all; I posted this as a question on StackOverflow as well but realized
> I should have gone straight ot the horses-mouth with my question. Sorry for
> the double post!
>
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from a RDBMS to HBase. We are running 15 nodes with 5
> zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are running a
> direct look-up; no searching or scanning. Since the HTablePool is now
> frowned upon, we are using the Apache commons pool and a simple connection
> factory to create a pool of connections and use them in our threads. Each
> thread creates an HTableInstance as needed and closes it when done. There
> are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls sequentially,
> the performance is quite good. Everything works great until we start trying
> to scale the performance. As we add more threads and try and get more work
> done in a single VM, we start seeing performance degrade quickly. The
> client code is simply attempting to run either one of several gets or a
> single put at a given frequency. It then waits until the next time to run,
> we use this to simulate the workload from external clients. With a single
> thread, we will see call times in the 2-3 milliseconds which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What
> gets strange is if we add more VMs, the times hold steady across them all
> so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster very
> easily - it just has to use a lot of VMs on the client side to do it. We
> know the contention isn't in the connection pool as we see the problem even
> when we have more connections than threads. Unfortunately, the times are
> spiraling out of control very quickly. We need it to support at least 128
> threads in practice, but most important I want to support 500 updates/sec
> and 250 gets/sec. In theory, this should be a piece of cake for the cluster
> as we can do FAR more work than that with a few VMs, but we don't even get
> close to this with a single VM.
>
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be a surprisingly little amount
> of info about this on the web considering the popularity. Is there some
> client setting we need to use that makes it perform better in a threaded
> environment? We are going to try to cache HTable instances next but that's
> a total guess. There are solutions to offloading work to other VMs but we
> really want to avoid this as clearly the cluster can handle the load and it
> will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
>



-- 
It's just about how deep your longing is!

Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Reply via email to