Otis, I'm not sure our issues are the same (although they could turn out to be related). As far as I have been able to determine, we have only had a single long pause.
However, we don't have much experience micromanaging our JVMs. How did you generate those graphs?

--Tom

On Tue, Jun 10, 2014 at 4:52 PM, Otis Gospodnetic <[email protected]> wrote:

> No, I don't think so. We had it until this morning and didn't see this
> problem. We'll probably switch to it tomorrow morning before we change EC2
> instances and see if that removes the problem.
>
> Tom - do your pauses look like the ones in our SPM graphs?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Tue, Jun 10, 2014 at 6:38 PM, Vladimir Rodionov <[email protected]>
> wrote:
>
> > Unbelievable. Do you see the same with the latest OpenJDK?
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> > From: Otis Gospodnetic [[email protected]]
> > Sent: Tuesday, June 10, 2014 2:43 PM
> > To: [email protected]
> > Subject: Re: Is this a long GC pause, or something else?
> >
> > Does it repeat?
> > We are seeing this with u60 oracle JVM too! SPM shows the whole JVM
> > blocking for about 16 minutes every M minutes.
> >
> > Otis
> >
> > > On Jun 10, 2014, at 2:05 PM, Tom Brown <[email protected]> wrote:
> > >
> > > Last night a regionserver in my cluster stopped responding in a timely
> > > manner for about 20 minutes. I know that stop-the-world GC can cause this
> > > type of behavior, but 20 minutes seems excessive.
> > >
> > > The server is a 2 core VM with 16GB of RAM, (hbase max heap is 12GB). We
> > > are using the latest java 7 from oracle. HDFS is provided by an Isilon
> > > cluster.
> > >
> > > The server workload is read/write: the writing process reads all rows it is
> > > about to write, updates them if they exist, and then writes all the rows
> > > (replacing ones that were updated).
> > >
> > > The last messages before the pause were regarding an HLog roll:
> > >
> > > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
> > > getDefaultReplication
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
> > > getDefaultBlockSize
> > >
> > > During the next 20 minutes there were a handful of sporadic LruBlockCache
> > > stats messages but nothing else. After 20 minutes, normal operation resumed.
> > >
> > > Is 20 minutes for a GC pause expected given the operational load and
> > > machine specs? Could a GC pause include periodic log messages? If it wasn't
> > > a GC pause, what else could it be?
> > >
> > > --Tom
> >
> > Confidentiality Notice: The information contained in this message,
> > including any attachments hereto, may be confidential and is intended to be
> > read only by the individual or entity to whom this message is addressed. If
> > the reader of this message is not the intended recipient or an agent or
> > designee of the intended recipient, please note that any review, use,
> > disclosure or distribution of this message or its attachments, in any form,
> > is strictly prohibited. If you have received this message in error, please
> > immediately notify the sender and/or [email protected] and
> > delete or destroy any copy of this message and its attachments.
> >
>
