If you have used con.getTable(), the close() on the HTable won't close the underlying connection.
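To make the distinction concrete, here is a minimal sketch of that lifecycle. It is a sketch only, written against the 0.94-era client API that Lars describes below; the table name and row key are placeholders, not anything from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ConnectionLifecycle {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // One heavyweight connection (ZK session, RPC machinery) per VM.
            HConnection con = HConnectionManager.createConnection(conf);
            try {
                // Lightweight, per-use table handle obtained from the connection.
                HTableInterface table = con.getTable("mytable"); // placeholder name
                try {
                    table.get(new Get(Bytes.toBytes("row1"))); // placeholder key
                } finally {
                    // Flushes the table's buffers; the connection stays open.
                    table.close();
                }
            } finally {
                // Only this releases the underlying shared resources.
                con.close();
            }
        }
    }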
-Anoop-

On Mon, Nov 4, 2013 at 11:21 AM, <michael.grund...@high5games.com> wrote:

> Our current usage is how I would do this in a typical database app, with
> the table acting like a statement. It looks like this:
>
> Connection connection = null;
> HTableInterface table = null;
> try {
>     connection = pool.acquire();
>     table = connection.getTable(tableName);
>     // Do work
> } finally {
>     table.close();
>     pool.release(connection);
> }
>
> Is this incorrect? The API docs say close "Releases any resources held
> or pending changes in internal buffers." I didn't interpret that as
> having it close the underlying connection. Thanks!
>
> -Mike
>
> -----Original Message-----
> From: Sriram Ramachandrasekaran [mailto:sri.ram...@gmail.com]
> Sent: Sunday, November 03, 2013 11:43 PM
> To: user@hbase.apache.org
> Cc: la...@apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine
>
> Hey Michael - Per the API documentation, closing the HTable instance
> would close the underlying resources too. Hope you are aware of it.
>
>
> On Mon, Nov 4, 2013 at 11:06 AM, <michael.grund...@high5games.com> wrote:
>
> > Hi Lars, at application startup the pool is created with X number of
> > connections using the first method you indicated:
> > HConnectionManager.createConnection(conf). We store each connection in
> > the pool automatically and serve it up to threads as they request it.
> > When a thread is done using a connection, it returns it to the pool.
> > The connections are not created and closed per thread, but only once
> > for the entire application. We are using the GenericObjectPool from
> > Apache Commons Pool as the foundation of our connection pooling
> > approach. Our entire pool implementation really consists of just a
> > couple of overridden methods that specify how to create a new
> > connection and how to close it. The GenericObjectPool class does all
> > the rest. See here for details:
> > http://commons.apache.org/proper/commons-pool/
> >
> > Each thread gets an HTable instance as needed and then closes it when
> > done. The only thing we are not doing is using the createConnection
> > method that takes an ExecutorService, as that wouldn't work in our
> > model. Our app is like a web application - the thread pool is managed
> > outside the scope of our application code, so we can't assume the
> > service is available at connection creation time. Thanks!
> >
> > -Mike
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:la...@apache.org]
> > Sent: Sunday, November 03, 2013 11:27 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual Machine
> >
> > Hi Michael,
> >
> > can you try to create a single HConnection in your client:
> > HConnectionManager.createConnection(Configuration conf) or
> > HConnectionManager.createConnection(Configuration conf, ExecutorService pool)
> >
> > Then use HConnection.getTable(...) each time you need to do an operation.
> >
> > I.e.
> > Configuration conf = ...;
> > ExecutorService pool = ...;
> > // create a single HConnection for your VM.
> > HConnection con = HConnectionManager.createConnection(conf, pool);
> > // reuse the connection for many tables, even in different threads
> > HTableInterface table = con.getTable(...);
> > // use the table, even for only a few operations.
> > table.close();
> > ...
> > table = con.getTable(...);
> > // use the table, even for only a few operations.
> > table.close();
> > ...
> > // at the end, close the connection
> > con.close();
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: "michael.grund...@high5games.com" <michael.grund...@high5games.com>
> > To: user@hbase.apache.org
> > Sent: Sunday, November 3, 2013 7:46 PM
> > Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
> >
> >
> > Hi all; I posted this as a question on StackOverflow as well, but
> > realized I should have gone straight to the horse's mouth with my
> > question. Sorry for the double post!
> >
> > We are running a series of HBase tests to see if we can migrate one
> > of our existing datasets from an RDBMS to HBase. We are running 15
> > nodes with 5 ZooKeepers and HBase 0.94.12 for this test.
> >
> > We have a single table with three column families and a key that is
> > distributing very well across the cluster. All of our queries run a
> > direct look-up; no searching or scanning. Since HTablePool is now
> > frowned upon, we are using the Apache Commons pool and a simple
> > connection factory to create a pool of connections and use them in
> > our threads. Each thread creates an HTable instance as needed and
> > closes it when done. There are no leaks we can identify.
> >
> > If we run a single thread and just do lots of random calls
> > sequentially, the performance is quite good. Everything works great
> > until we start trying to scale the performance. As we add more
> > threads and try to get more work done in a single VM, we start seeing
> > performance degrade quickly. The client code is simply attempting to
> > run either one of several gets or a single put at a given frequency.
> > It then waits until the next time to run; we use this to simulate the
> > workload from external clients. With a single thread, we see call
> > times in the 2-3 millisecond range, which is acceptable.
> >
> > As we add more threads, this call time starts increasing quickly.
> > What is strange is that if we add more VMs, the times hold steady
> > across them all, so clearly it's a bottleneck in the running instance
> > and not the cluster. We can get a huge amount of processing happening
> > across the cluster very easily - it just takes a lot of VMs on the
> > client side to do it. We know the contention isn't in the connection
> > pool, as we see the problem even when we have more connections than
> > threads. Unfortunately, the times are spiraling out of control very
> > quickly. We need it to support at least 128 threads in practice, but
> > most importantly I want to support 500 updates/sec and 250 gets/sec.
> > In theory, this should be a piece of cake for the cluster, as we can
> > do FAR more work than that with a few VMs, but we don't even get
> > close to this with a single VM.
> >
> > So my question: how do people building high-performance apps with
> > HBase get around this? What approach are others using for connection
> > pooling in a multi-threaded environment? There seems to be
> > surprisingly little info about this on the web, considering HBase's
> > popularity. Is there some client setting we need to use that makes it
> > perform better in a threaded environment? We are going to try caching
> > HTable instances next, but that's a total guess. There are solutions
> > for offloading work to other VMs, but we really want to avoid this,
> > as clearly the cluster can handle the load, and it would dramatically
> > decrease the application performance in critical areas.
> >
> > Any help is greatly appreciated! Thanks!
> > -Mike
>
>
> --
> It's just about how deep your longing is!
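For readers following the thread, here is a rough sketch of the kind of commons-pool-backed connection pool Mike describes above ("just a couple overridden methods"). It assumes the commons-pool 1.6 generic API; the class name, pool size, and table name are illustrative guesses, not his actual code:

    import org.apache.commons.pool.BasePoolableObjectFactory;
    import org.apache.commons.pool.impl.GenericObjectPool;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public class PooledConnectionExample {

        // The "couple overridden methods": how to create a connection and
        // how to close it. GenericObjectPool does all the rest.
        static class HConnectionFactory extends BasePoolableObjectFactory<HConnection> {
            private final Configuration conf;

            HConnectionFactory(Configuration conf) {
                this.conf = conf;
            }

            @Override
            public HConnection makeObject() throws Exception {
                return HConnectionManager.createConnection(conf);
            }

            @Override
            public void destroyObject(HConnection con) throws Exception {
                con.close();
            }
        }

        public static void main(String[] args) throws Exception {
            GenericObjectPool<HConnection> pool = new GenericObjectPool<HConnection>(
                    new HConnectionFactory(HBaseConfiguration.create()));
            pool.setMaxActive(16); // illustrative pool size

            // Per-thread usage pattern: borrow a connection, use a short-lived
            // table handle, return the connection to the pool still open.
            HConnection con = pool.borrowObject();
            try {
                HTableInterface table = con.getTable("mytable"); // placeholder name
                try {
                    // do gets/puts here
                } finally {
                    table.close(); // releases the table handle only
                }
            } finally {
                pool.returnObject(con);
            }

            // At application shutdown: destroys (closes) all pooled connections.
            pool.close();
        }
    }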