Our current usage is how I would do this in a typical database app, with the table acting like a statement. It looks like this:
Connection connection = null;
HTableInterface table = null;
try {
    connection = pool.acquire();
    table = connection.getTable(tableName);
    // Do work
} finally {
    table.close();
    pool.release(connection);
}

Is this incorrect? The API docs say close() "Releases any resources held or pending changes in internal buffers." I didn't interpret that as having it close the underlying connection.

Thanks!
-Mike

-----Original Message-----
From: Sriram Ramachandrasekaran [mailto:sri.ram...@gmail.com]
Sent: Sunday, November 03, 2013 11:43 PM
To: user@hbase.apache.org
Cc: la...@apache.org
Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine

Hey Michael - Per the API documentation, closing the HTable instance would close the underlying resources too. Hope you are aware of it.

On Mon, Nov 4, 2013 at 11:06 AM, <michael.grund...@high5games.com> wrote:
> Hi Lars, at application startup the pool is created with X number of
> connections using the first method you indicated:
> HConnectionManager.createConnection(conf). We store each connection in
> the pool automatically and serve it up to threads as they request it.
> When a thread is done using the connection, they return it to the
> pool. The connections are not created and closed per thread, but
> only once for the entire application. We are using the
> GenericObjectPool from Apache Commons Pool as the foundation of our
> connection pooling approach. Our entire pool implementation really
> consists of just a couple of overridden methods to specify how to
> create a new connection and close it. The GenericObjectPool class does
> all the rest. See here for details:
> http://commons.apache.org/proper/commons-pool/
>
> Each thread is getting an HTable instance as needed and then closing
> it when done. The only thing we are not doing is using the
> createConnection method that takes in an ExecutorService, as that
> wouldn't work in our model. Our app is like a web application - the
> thread pool is managed outside the scope of our application code, so
> we can't assume the service is available at connection creation time.
>
> Thanks!
> -Mike
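(For reference, a minimal sketch of the pooling approach Mike describes above, assuming Commons Pool 1.6 and the 0.94-era HBase client; the class name and method names are illustrative, not the actual implementation from the thread.)

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

// Illustrative pool wrapper: only makeObject/destroyObject are overridden;
// GenericObjectPool handles borrowing, returning, and sizing.
public class HConnectionPool {

    private final GenericObjectPool<HConnection> pool;

    public HConnectionPool(final Configuration conf, int maxConnections) {
        this.pool = new GenericObjectPool<HConnection>(
            new BasePoolableObjectFactory<HConnection>() {
                @Override
                public HConnection makeObject() throws Exception {
                    // one real HBase connection per pooled object
                    return HConnectionManager.createConnection(conf);
                }

                @Override
                public void destroyObject(HConnection connection) throws Exception {
                    connection.close();
                }
            });
        this.pool.setMaxActive(maxConnections);
    }

    public HConnection acquire() throws Exception {
        return pool.borrowObject();
    }

    public void release(HConnection connection) throws Exception {
        pool.returnObject(connection);
    }
}

With something like this in place, acquire()/release() bracket each unit of work, which matches the try/finally usage at the top of the thread.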
> -----Original Message-----
> From: lars hofhansl [mailto:la...@apache.org]
> Sent: Sunday, November 03, 2013 11:27 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine
>
> Hi Michael,
>
> can you try to create a single HConnection in your client:
> HConnectionManager.createConnection(Configuration conf) or
> HConnectionManager.createConnection(Configuration conf, ExecutorService pool)
>
> Then use HConnection.getTable(...) each time you need to do an operation.
>
> I.e.
> Configuration conf = ...;
> ExecutorService pool = ...;
> // create a single HConnection for your vm.
> HConnection con = HConnectionManager.createConnection(conf, pool);
> // reuse the connection for many tables, even in different threads
> HTableInterface table = con.getTable(...);
> // use table even for only a few operations.
> table.close();
> ...
> table = con.getTable(...);
> // use table even for only a few operations.
> table.close();
> ...
> // at the end close the connection
> con.close();
>
> -- Lars
>
>
> ________________________________
> From: "michael.grund...@high5games.com" <michael.grund...@high5games.com>
> To: user@hbase.apache.org
> Sent: Sunday, November 3, 2013 7:46 PM
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
>
>
> Hi all; I posted this as a question on StackOverflow as well but
> realized I should have gone straight to the horse's mouth with my
> question. Sorry for the double post!
>
> We are running a series of HBase tests to see if we can migrate one of
> our existing datasets from an RDBMS to HBase. We are running 15 nodes
> with 5 zookeepers and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are
> running a direct look-up; no searching or scanning. Since HTablePool
> is now frowned upon, we are using the Apache Commons pool and a simple
> connection factory to create a pool of connections and use them in our
> threads. Each thread creates an HTable instance as needed and closes
> it when done. There are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls
> sequentially, the performance is quite good. Everything works great
> until we start trying to scale the performance. As we add more threads
> and try to get more work done in a single VM, we start seeing
> performance degrade quickly. The client code is simply attempting to
> run either one of several gets or a single put at a given frequency.
> It then waits until the next time to run; we use this to simulate the
> workload from external clients. With a single thread, we see call
> times in the 2-3 millisecond range, which is acceptable.
>
> As we add more threads, this call time starts increasing quickly. What
> is strange is that if we add more VMs, the times hold steady across
> them all, so clearly it's a bottleneck in the running instance and not
> the cluster. We can get a huge amount of processing happening across
> the cluster very easily - it just has to use a lot of VMs on the
> client side to do it. We know the contention isn't in the connection
> pool, as we see the problem even when we have more connections than
> threads. Unfortunately, the times are spiraling out of control very
> quickly. We need it to support at least 128 threads in practice, but
> most importantly I want to support 500 updates/sec and 250 gets/sec.
> In theory, this should be a piece of cake for the cluster, as we can
> do FAR more work than that with a few VMs, but we don't even get close
> to this with a single VM.
>
> So my question: how do people building high-performance apps with
> HBase get around this? What approach are others using for connection
> pooling in a multi-threaded environment? There seems to be
> surprisingly little info about this on the web considering the
> popularity. Is there some client setting we need to use that makes
> it perform better in a threaded environment? We are going to try to
> cache HTable instances next, but that's a total guess. There are
> solutions for offloading work to other VMs, but we really want to
> avoid this as clearly the cluster can handle the load, and it would
> dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike

--
It's just about how deep your longing is!
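(For comparison, a minimal, self-contained sketch of the single shared HConnection pattern Lars suggests, assuming the 0.94-era client API discussed in this thread; the table name, row key, and thread count are placeholders.)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // One HConnection for the whole VM, shared by every worker thread.
        final HConnection connection = HConnectionManager.createConnection(conf);
        ExecutorService workers = Executors.newFixedThreadPool(128);

        for (int i = 0; i < 128; i++) {
            workers.submit(new Runnable() {
                public void run() {
                    try {
                        // HTableInterface is lightweight: take one per operation
                        // (or per small batch) and close it when done.
                        HTableInterface table = connection.getTable("my_table");
                        try {
                            Result result = table.get(new Get(Bytes.toBytes("some-row-key")));
                            // ... use result ...
                        } finally {
                            table.close();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        // Close the shared connection only when the application shuts down.
        connection.close();
    }
}

The point of the pattern is that all threads share one HConnection (and its underlying ZooKeeper/RPC resources), while the short-lived HTableInterface instances are created and closed freely per operation.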