Hey Jeremy:

Those stack traces are from a server waiting for work; all threads are RUNNABLE, waiting for something to do. Could they have been taken at some time other than the lock-up? (There is no mention of HTablePool in them.)
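
One rough way to check a saved dump is to count the thread states and list any thread whose stack mentions HTablePool. The sketch below assumes the dump is in the usual jstack text layout, where thread entries are separated by blank lines and each carries a `java.lang.Thread.State:` line. The function name `summarize_dump` and the sample text are made up for illustration and are not taken from the gist.

```python
import re
from collections import Counter

# Matches the state line jstack prints under each Java thread header.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_dump(dump_text, needle="HTablePool"):
    """Return (Counter of thread states, names of threads whose stack mentions needle)."""
    states = Counter()
    hits = []
    # Thread entries are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith('"'):
            continue  # skip the dump header and any non-thread sections
        match = STATE_RE.search(block)
        if match:
            states[match.group(1)] += 1
        if needle in block:
            # The thread name is the quoted string at the start of the entry.
            hits.append(lines[0].split('"')[1])
    return states, hits


# Hypothetical two-thread dump, invented purely to exercise the function.
SAMPLE = '''"qtp-1" prio=10 tid=0x01 nid=0x1 runnable [0x0]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

"qtp-2" prio=10 tid=0x02 nid=0x2 waiting for monitor entry [0x0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java)
'''

if __name__ == "__main__":
    states, hits = summarize_dump(SAMPLE)
    print(dict(states), hits)
```

If a dump taken during the hang shows threads that are BLOCKED or WAITING with HTablePool in the stack, that would point at the pool; if everything is idle, the dump was probably taken at the wrong moment.
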
Good on you,
St.Ack

On Tue, Oct 19, 2010 at 10:18 PM, Jeremy Hinegardner <[email protected]> wrote:
> I didn't make any changes to heap, it is running with -Xmx1000m and when it
> hangs, its resident memory is 122m, so I kind of doubt the GC is at fault,
> but I'm happy to be wrong.
>
> Here is a thread dump of lock-up time: http://gist.github.com/635805
>
> Nothing in there jumped out at me.
>
> thanks,
>
> -jeremy
>
>
> On Wed, Oct 20, 2010 at 01:41:09AM +0000, Jonathan Gray wrote:
>> That kind of complete pause/resume behavior most often happens from GC
>> issues. How much heap are you giving to the stargate server? Also, try
>> doing a thread dump during the lock-up time, maybe something telling there.
>>
>> > -----Original Message-----
>> > From: Andrew Purtell [mailto:[email protected]]
>> > Sent: Tuesday, October 19, 2010 6:29 PM
>> > To: [email protected]; [email protected]
>> > Subject: Re: concurrency issue with Stargate/HTablePool
>> >
>> > Jeremy,
>> >
>> > Have you given any thought to trying out the latest 0.89 release?
>> >
>> > The Stargate package has been moved into org.apache.hadoop.hbase.rest
>> > but otherwise it is the same.
>> >
>> > If this is a concurrency problem with the HBase client library it would
>> > be better to try and deal with it on what is currently under
>> > development, if you do not have a specific requirement to use 0.20.x.
>> >
>> > Best regards,
>> >
>> > - Andy
>> >
>> >
>> > --- On Tue, 10/19/10, Jeremy Hinegardner <[email protected]>
>> > wrote:
>> >
>> > > From: Jeremy Hinegardner <[email protected]>
>> > > Subject: concurrency issue with Stargate/HTablePool
>> > > To: [email protected]
>> > > Date: Tuesday, October 19, 2010, 5:03 PM
>> > > Hi all,
>> > >
>> > > I've done a bit of search on this issue, and have yet to
>> > > find anything conclusive. As a test case to demonstrate it,
>> > > I am using HBase 0.20.6 and stargate.
>> > >
>> > > I have a test HBase cluster with 1 table and about 60M rows
>> > > in it, and a Stargate instance that talks to it.
>> > >
>> > > I have clients that queue up a random list of rowid's to
>> > > query stargate via http://stargate.example.com:3002/table/rowid
>> > > like requests.
>> > >
>> > > When I have 3 concurrent clients querying stargate, they
>> > > all do well and get
>> > > a consistent throughput. When I add the 4th client
>> > > querying stargate, stargate
>> > > comes to a screeching halt and everyone has 0 operations
>> > > for a long while, then
>> > > a small burst of requests will go through stargate and it
>> > > will hang for a while,
>> > > and repeat.
>> > >
>> > > If I then just kill one client, the other 3 start having a
>> > > good consistent
>> > > throughput again. Bring back the 4th client and it
>> > > comes to a halt.
>> > >
>> > > If I bypass Stargate completely, and have all clients use
>> > > HTable instances
>> > > directly, then everyone is good. I can go up to as
>> > > many clients as I need.
>> > >
>> > > This seems like quite a problem and I was wondering if
>> > > anyone else is
>> > > seeing something similar.
>> > >
>> > > thank you for your time,
>> > >
>> > > -jeremy
>> > >
>> > > --
>> > > ========================================================================
>> > > Jeremy Hinegardner
>> > >     [email protected]
>> > >
>> >
>
> --
> ========================================================================
> Jeremy Hinegardner [email protected]
>
