Re: Is this a long GC pause, or something else?

Tom Brown Tue, 10 Jun 2014 16:53:29 -0700

Otis,

I'm not sure our issue is the same (although they could turn out to be
related). As far as I have been able to determine, we have only had a
single long pause.


However, we don't have much experience micromanaging our JVMs. How did you
generate those graphs?

--Tom
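
The SPM graphs Otis mentions come from Sematext's monitoring product, and this thread does not say how SPM collects its data. If you want to do it yourself, one option is the standard `java.lang.management` API, which exposes cumulative garbage-collection counters for each collector. The sketch below is a minimal example that samples those counters once per second from inside the JVM it runs in. Reading them from a RegionServer would instead need a remote JMX connection, which is not shown. The class name `GcStatsSampler` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical sampler: prints the JVM's cumulative GC counters once per second,
// plus how much GC time accrued since the previous sample.
public class GcStatsSampler {
    // Sum of cumulative collection time in ms; getCollectionTime() returns -1 if unsupported.
    static long totalGcTimeMillis(List<GarbageCollectorMXBean> beans) {
        long total = 0;
        for (GarbageCollectorMXBean gc : beans) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        long previous = totalGcTimeMillis(beans);
        for (int sample = 0; sample < 5; sample++) {
            Thread.sleep(1000);
            // Per-collector cumulative counters (e.g. young and old generation collectors).
            for (GarbageCollectorMXBean gc : beans) {
                System.out.printf("%s: count=%d, time=%d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            long current = totalGcTimeMillis(beans);
            System.out.printf("GC time in last interval: %d ms%n", current - previous);
            previous = current;
        }
    }
}
```

You can plot the per-interval GC time over hours to see whether collections line up with the stalls.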


On Tue, Jun 10, 2014 at 4:52 PM, Otis Gospodnetic <[email protected]> wrote:

> No, I don't think so.  We had it until this morning and didn't see this
> problem.  We'll probably switch to it tomorrow morning before we change EC2
> instances and see if that removes the problem.
>
> Tom - do your pauses look like the ones in our SPM graphs?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tue, Jun 10, 2014 at 6:38 PM, Vladimir Rodionov <[email protected]> wrote:
>
> > Unbelievable. Do you see the same with the latest OpenJDK?
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> > From: Otis Gospodnetic [[email protected]]
> > Sent: Tuesday, June 10, 2014 2:43 PM
> > To: [email protected]
> > Subject: Re: Is this a long GC pause, or something else?
> >
> > Does it repeat?
> > We are seeing this with the u60 Oracle JVM too!  SPM shows the whole JVM
> > blocking for about 16 minutes every M minutes.
> >
> > Otis
> >
> >
> >
> > > On Jun 10, 2014, at 2:05 PM, Tom Brown <[email protected]> wrote:
> > >
> > > Last night a regionserver in my cluster stopped responding in a timely
> > > manner for about 20 minutes. I know that stop-the-world GC can cause
> > > this type of behavior, but 20 minutes seems excessive.
> > >
> > > The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We
> > > are using the latest Java 7 from Oracle. HDFS is provided by an Isilon
> > > cluster.
> > >
> > > The server workload is read/write: the writing process reads all rows it
> > > is about to write, updates them if they exist, and then writes all the
> > > rows (replacing ones that were updated).
> > >
> > > The last messages before the pause were regarding an HLog roll:
> > >
> > > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
> > >
> > > During the next 20 minutes there were a handful of sporadic LruBlockCache
> > > stats messages but nothing else. After 20 minutes, normal operation
> > > resumed.
> > >
> > > Is 20 minutes for a GC pause expected given the operational load and
> > > machine specs? Could a GC pause include periodic log messages? If it
> > > wasn't a GC pause, what else could it be?
> > >
> > > --Tom
> >
>
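
Tom asks whether the stall was GC at all. A common way to check is a watchdog thread that sleeps for a fixed interval and measures how much longer the sleep actually took. It then compares that overshoot with the GC time the JVM reports for the same interval. Hadoop's `JvmPauseMonitor` is built on this idea. The sketch below is an independent, simplified illustration, not that class. The following are all arbitrary assumptions:

- the class name `PauseDetector`
- the 500 ms interval
- the 1 s threshold
- the "half the overshoot" rule in `classify`

For some collectors the reported collection time may include concurrent work, so the GC figure is only approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical watchdog: sleeps in a loop and reports when waking up took far
// longer than requested, comparing the stall with GC time reported by the JVM.
public class PauseDetector {
    static final long SLEEP_MS = 500;            // assumed polling interval
    static final long WARN_THRESHOLD_MS = 1000;  // assumed reporting threshold

    // Cumulative GC time in ms over all collectors; beans return -1 when unsupported.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Arbitrary heuristic: attribute the stall to GC if reported GC time
    // covers at least half of it.
    static String classify(long extraMs, long gcMs) {
        return gcMs >= extraMs / 2
                ? "mostly GC"
                : "mostly NOT GC (e.g. host/VM stall, swapping, or a non-GC safepoint)";
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        for (int i = 0; i < iterations; i++) {
            long gcBefore = totalGcTimeMillis();
            long t0 = System.nanoTime();
            Thread.sleep(SLEEP_MS);
            long elapsedMs = (System.nanoTime() - t0) / 1_000_000L;
            long extraMs = elapsedMs - SLEEP_MS; // how long the wake-up was delayed
            long gcMs = totalGcTimeMillis() - gcBefore;
            if (extraMs > WARN_THRESHOLD_MS) {
                System.err.printf("Pause of ~%d ms detected; GC reported %d ms -> %s%n",
                        extraMs, gcMs, classify(extraMs, gcMs));
            }
        }
    }
}
```

The watchdog is itself a Java thread, so it cannot print anything while a stop-the-world pause is in progress. It reports the pause only after the JVM resumes. A large overshoot with little reported GC time points away from garbage collection.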
