Hi Tom,

Aha. Our pauses keep happening. :(

We use SPM - see http://sematext.com/spm/ - it has support for HBase and
Hadoop metrics, among other things. As a matter of fact, for troubleshooting
an issue like this one you may also want to ship your logs into Logsene
<http://sematext.com/logsene/>. Doing that will let you correlate your pause
with messages in the logs, which could help you figure out what's going on
next time something like this happens.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jun 10, 2014 at 7:52 PM, Tom Brown <[email protected]> wrote:

> Otis,
>
> I'm not sure our issue is the same (although they could turn out to be
> related). As far as I have been able to determine, we have only had a
> single long pause.
>
> However, we don't have much experience micromanaging our JVMs. How did you
> generate those graphs?
>
> --Tom
>
>
> On Tue, Jun 10, 2014 at 4:52 PM, Otis Gospodnetic <
> [email protected]> wrote:
>
> > No, I don't think so. We had it until this morning and didn't see this
> > problem. We'll probably switch to it tomorrow morning before we change
> > EC2 instances and see if that removes the problem.
> >
> > Tom - do your pauses look like the ones in our SPM graphs?
> >
> > Otis
> >
> >
> > On Tue, Jun 10, 2014 at 6:38 PM, Vladimir Rodionov <
> > [email protected]> wrote:
> >
> > > Unbelievable. Do you see the same with the latest OpenJDK?
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: [email protected]
> > >
> > > ________________________________________
> > > From: Otis Gospodnetic [[email protected]]
> > > Sent: Tuesday, June 10, 2014 2:43 PM
> > > To: [email protected]
> > > Subject: Re: Is this a long GC pause, or something else?
> > >
> > > Does it repeat?
> > > We are seeing this with u60 oracle JVM too! SPM shows the whole JVM
> > > blocking for about 16 minutes every M minutes.
> > >
> > > Otis
> > >
> > > > On Jun 10, 2014, at 2:05 PM, Tom Brown <[email protected]> wrote:
> > > >
> > > > Last night a regionserver in my cluster stopped responding in a
> > > > timely manner for about 20 minutes. I know that stop-the-world GC
> > > > can cause this type of behavior, but 20 minutes seems excessive.
> > > >
> > > > The server is a 2 core VM with 16GB of RAM, (hbase max heap is 12GB).
> > > > We are using the latest java 7 from oracle. HDFS is provided by an
> > > > Isilon cluster.
> > > >
> > > > The server workload is read/write: the writing process reads all rows
> > > > it is about to write, updates them if they exist, and then writes all
> > > > the rows (replacing ones that were updated).
> > > >
> > > > The last messages before the pause were regarding an HLog roll:
> > > >
> > > > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
> > > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
> > > >
> > > > During the next 20 minutes there were a handful of sporadic
> > > > LruBlockCache stats messages but nothing else. After 20 minutes,
> > > > normal operation resumed.
> > > >
> > > > Is 20 minutes for a GC pause expected given the operational load and
> > > > machine specs? Could a GC pause include periodic log messages? If it
> > > > wasn't a GC pause, what else could it be?
> > > >
> > > > --Tom
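
On Tom's question of whether a stall like this is a GC pause or something else: one way to check without any monitoring product is a sleep-overshoot detector, where a thread asks to sleep for a fixed interval and reports when it wakes up much later than requested. The sketch below is only an illustration and is not something anyone in this thread posted; the class name `PauseDetector`, the 500 ms interval, and the 1000 ms threshold are assumptions, and it is not part of HBase or SPM.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of HBase, Hadoop, or SPM.
final class PauseDetector implements Runnable {
    private final long intervalMillis;   // how long each sleep is supposed to last
    private final long thresholdMillis;  // overshoot beyond which we report a pause

    PauseDetector(long intervalMillis, long thresholdMillis) {
        this.intervalMillis = intervalMillis;
        this.thresholdMillis = thresholdMillis;
    }

    // Returns how many milliseconds longer than the requested interval the
    // thread was away, or 0 if it woke up on time or early.
    static long overshootMillis(long startNanos, long endNanos, long intervalMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMillis - intervalMillis);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
            long overshoot = overshootMillis(start, System.nanoTime(), intervalMillis);
            if (overshoot > thresholdMillis) {
                System.err.printf("%s possible pause: woke up %d ms late%n",
                        new Date(), overshoot);
            }
        }
    }

    // Short demo: run the detector for a few seconds, then stop it.
    public static void main(String[] args) throws InterruptedException {
        Thread detector = new Thread(new PauseDetector(500, 1000), "pause-detector");
        detector.setDaemon(true);
        detector.start();
        Thread.sleep(3000);
        detector.interrupt();
    }
}
```

A thread like this only sees stop-the-world pauses of the JVM it runs in, so run standalone on the same host it can only show stalls that affect the whole VM. If GC logging is also enabled (for example with the HotSpot flags -XX:+PrintGCDetails and -XX:+PrintGCApplicationStoppedTime), a late wake-up with no matching GC log entry suggests the cause is something other than GC.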
