You might try asynchbase, Michael.
St.Ack
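A minimal sketch of why an asynchronous client can help with this symptom. It assumes the goal is to have many calls in flight without one blocked thread per call. It does not use asynchbase itself (asynchbase returns its own Deferred objects). FakeAsyncClient is a hypothetical class built on the JDK's CompletableFuture that only simulates network latency with a timer thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for a non-blocking HBase client. It does no I/O; a single
 * timer thread completes each future after a fixed simulated latency.
 */
final class FakeAsyncClient implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long latencyMillis;

    FakeAsyncClient(long latencyMillis) {
        this.latencyMillis = latencyMillis;
    }

    /** Returns at once; the future completes after the simulated round trip. */
    CompletableFuture<String> get(String rowKey) {
        CompletableFuture<String> result = new CompletableFuture<>();
        timer.schedule(() -> {
            result.complete("value-for-" + rowKey);
        }, latencyMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    /** Issues n gets from the calling thread, then waits once for all of them. */
    List<String> fetchAll(int n) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pending.add(get("row-" + i)); // no thread is parked per outstanding request
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        List<String> values = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            values.add(f.join()); // already complete, so this does not block
        }
        return values;
    }

    @Override
    public void close() {
        timer.shutdown();
    }
}
```

Used inside a try-with-resources block, for example new FakeAsyncClient(50) followed by fetchAll(100), the 100 simulated gets are issued from one thread and should all finish after roughly one 50 ms delay rather than 100 of them in sequence.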
On Mon, Nov 4, 2013 at 11:00 AM, <michael.grund...@high5games.com> wrote:

> Not yet; this is just a load test client. It literally does nothing but
> create threads to talk to HBase and run 4 different calls. Nothing else is
> done in the app at all.
>
> To eliminate even more of our code from the loop, we just tried removing
> our connection pool entirely and using a single connection per thread:
> no improvement. Then we tried creating the HTableInterface (all calls are
> against the same table) at the time of connection creation. This means
> thread, connection, and table interface were all 1 to 1 and not being
> passed around. No performance improvement.
>
> Long story short, running a single thread, it's fast. Start multithreading
> and it starts slowing down. CPU usage, memory usage, etc. are all negligible.
> The performance isn't terrible; it's probably good enough for the vast
> majority of users, but it's not good enough for our app. With one thread,
> a call might take 5 milliseconds. With 10 threads all spinning more quickly (40
> milliseconds delay), the call time increases to 15-30 milliseconds. The
> problem is that at our throughput rates, that's a serious concern.
>
> We are going to fire up a profiler next to see what we can find.
>
> -Mike
>
>
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vrodio...@carrieriq.com]
> Sent: Monday, November 04, 2013 12:50 PM
> To: user@hbase.apache.org
> Subject: RE: HBase Client Performance Bottleneck in a Single Virtual Machine
>
> Michael, have you tried jstack on your client application?
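jstack is the quickest way to see where client threads are stuck. As a minimal in-process complement (an assumption about what is worth checking, not a procedure from this thread), the JDK's ThreadMXBean can list threads that are BLOCKED on a monitor, together with the lock and the thread that owns it. ContentionReport is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: a jstack-like check for monitor contention from inside the JVM.
 * Only threads BLOCKED on a synchronized monitor are reported; threads parked on
 * java.util.concurrent locks show up as WAITING and are not included.
 */
final class ContentionReport {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Snapshot of all live threads; the lock each thread waits on is reported
        // even without requesting locked-monitor details (false, false).
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }
}
```

Sampling this a few times while the multi-threaded load runs, and seeing the same lock name recur, would suggest a shared lock in the client. Contention on java.util.concurrent locks appears as WAITING rather than BLOCKED, so it will not show up in this report; a full jstack dump still covers that case.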
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: michael.grund...@high5games.com [michael.grund...@high5games.com]
> Sent: Sunday, November 03, 2013 7:46 PM
> To: user@hbase.apache.org
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
>
> Hi all, I posted this as a question on StackOverflow as well but realized
> I should have gone straight to the horse's mouth with my question. Sorry for
> the double post!
>
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from an RDBMS to HBase. We are running 15 nodes with 5
> ZooKeeper nodes and HBase 0.94.12 for this test.
>
> We have a single table with three column families and a key that is
> distributing very well across the cluster. All of our queries are running a
> direct look-up; no searching or scanning. Since the HTablePool is now
> frowned upon, we are using Apache Commons Pool and a simple connection
> factory to create a pool of connections and use them in our threads. Each
> thread creates an HTable instance as needed and closes it when done. There
> are no leaks we can identify.
>
> If we run a single thread and just do lots of random calls sequentially,
> the performance is quite good. Everything works great until we start trying
> to scale the performance. As we add more threads and try to get more work
> done in a single VM, we start seeing performance degrade quickly. The
> client code is simply attempting to run either one of several gets or a
> single put at a given frequency. It then waits until the next time to run;
> we use this to simulate the workload from external clients. With a single
> thread, we see call times in the 2-3 millisecond range, which is acceptable.
>
> As we add more threads, this call time starts increasing quickly.
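A minimal sketch of a fixed-rate load client like the one described here, assuming each worker runs one operation per interval and records how long it took. FixedRateLoadTest and its parameters are hypothetical names, and the Runnable passed in stands in for one of the HBase gets or puts; nothing in the sketch touches HBase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-rate load generator that records per-call latency. */
final class FixedRateLoadTest {

    /**
     * Starts `threads` workers; each makes `callsPerThread` calls spaced `intervalMillis` apart.
     * Returns every call's latency in microseconds, sorted ascending.
     */
    static List<Long> run(int threads, int callsPerThread, long intervalMillis, Runnable operation) {
        List<Long> latencies = Collections.synchronizedList(new ArrayList<Long>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    long nextSlot = System.nanoTime();
                    for (int i = 0; i < callsPerThread; i++) {
                        long start = System.nanoTime();
                        operation.run(); // hypothetical stand-in for one HBase get or put
                        latencies.add((System.nanoTime() - start) / 1_000);
                        // Advance along a fixed timeline; if a call overran its slot, don't sleep.
                        nextSlot += intervalNanos;
                        long sleepNanos = nextSlot - System.nanoTime();
                        if (sleepNanos > 0) {
                            TimeUnit.NANOSECONDS.sleep(sleepNanos);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        return sorted;
    }

    /** Nearest-rank percentile (0 < p <= 100) of a non-empty, ascending-sorted list. */
    static long percentile(List<Long> sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(0, rank - 1));
    }
}
```

Running it with threads = 1 and then threads = 10 against the same operation, and comparing percentile(latencies, 50) with percentile(latencies, 99), gives the single-thread versus multi-thread comparison described above with no pooling code in the loop. Scheduling against a fixed timeline, rather than sleeping a fixed amount after each call, keeps the request rate from drifting by the call's own duration.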
> What gets strange is that if we add more VMs, the times hold steady across
> them all, so clearly it's a bottleneck in the running instance and not the cluster.
> We can get a huge amount of processing happening across the cluster very
> easily; it just has to use a lot of VMs on the client side to do it. We
> know the contention isn't in the connection pool, as we see the problem even
> when we have more connections than threads. Unfortunately, the times are
> spiraling out of control very quickly. We need it to support at least 128
> threads in practice, but most importantly I want to support 500 updates/sec
> and 250 gets/sec. In theory, this should be a piece of cake for the cluster,
> as we can do FAR more work than that with a few VMs, but we don't even get
> close to this with a single VM.
>
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be surprisingly little info
> about this on the web considering the popularity. Is there some
> client setting we need to use that makes it perform better in a threaded
> environment? We are going to try to cache HTable instances next, but that's
> a total guess. There are solutions for offloading work to other VMs, but we
> really want to avoid this, as clearly the cluster can handle the load and it
> will dramatically decrease the application performance in critical areas.
>
> Any help is greatly appreciated! Thanks!
> -Mike
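As a rough capacity check on the 500 updates/sec and 250 gets/sec target, Little's law (average calls in flight L equals arrival rate lambda times average latency W) can be applied, assuming every call is synchronous and the client is in steady state. The latencies used are the ones quoted in the thread: about 3 ms single-threaded and up to 30 ms under load.

```latex
\begin{aligned}
\lambda &= 500 + 250 = 750\ \text{calls/s} \\
L &= \lambda W \\
W = 0.003\ \text{s} &\;\Rightarrow\; L = 750 \times 0.003 = 2.25 \\
W = 0.030\ \text{s} &\;\Rightarrow\; L = 750 \times 0.030 = 22.5
\end{aligned}
```

On average, then, only a few concurrent calls (about 23 even at the degraded latency) are needed to sustain 750 calls/s, well under the 128 threads mentioned. This suggests the thread count alone is not the limit; what the thread describes is per-call latency rising as threads are added, which is what the jstack and profiler suggestions are aimed at.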