Re: Is this a long GC pause, or something else?

Tom Brown Tue, 10 Jun 2014 14:14:25 -0700

I do not believe GC logging is enabled. I will look into that for the
future.
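
In the meantime, a rough way to see whether the collector was busy is to read
the JVM's cumulative GC counters. The sketch below is only an illustration,
not something taken from HBase: the class name GcTimeProbe is made up, and it
reads the counters of the JVM it runs in (for a regionserver the same beans
would have to be read remotely over JMX). It is no substitute for a real GC
log, which on a HotSpot Java 7 JVM is typically turned on with
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file>.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums the cumulative GC time reported by the local JVM.
class GcTimeProbe {

    // Total milliseconds spent in all collectors since JVM start.
    // Collectors that report -1 (time not available) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector, then the overall total.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMillis=" + gc.getCollectionTime());
        }
        System.out.println("total GC millis: " + totalGcMillis());
    }
}
```

Sampling this total before and after a slow period gives an upper bound on how
much of that period could have been spent in collection.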


The cluster is 6 machines all with the same spec. I have not seen any
evidence that any other server in the cluster had any problems at the same
time. There are/were no dead nodes. The master did not seem to notice
anything during this time.

The issue was detected because requests to one particular RS consistently
timed out during the 20 minutes in question.
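
Since 0.94.10 does not have the JvmPauseMonitor mentioned further down the
thread, the general sleep-and-measure idea behind that kind of monitor can be
sketched as follows. This is not the HBase class: the name PauseProbe, the
500 ms sleep, and the 10 second threshold are assumptions chosen for
illustration. A thread sleeps for a fixed interval; if it wakes up much later
than asked, the whole process (GC, swapping, a stalled VM) was probably paused.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a sleep-and-measure pause detector.
class PauseProbe {

    // Sleeps for sleepMillis and returns how many extra milliseconds elapsed
    // beyond the requested sleep (never negative).
    static long measureExtraMillis(long sleepMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsedMillis - sleepMillis);
    }

    // True when the overshoot is larger than the warning threshold.
    static boolean isSuspicious(long extraMillis, long warnThresholdMillis) {
        return extraMillis > warnThresholdMillis;
    }

    public static void main(String[] args) {
        long sleepMillis = 500;            // assumed sampling interval
        long warnThresholdMillis = 10_000; // assumed warning threshold
        // A real monitor would loop forever on a daemon thread; five rounds here.
        for (int i = 0; i < 5; i++) {
            long extra = measureExtraMillis(sleepMillis);
            if (isSuspicious(extra, warnThresholdMillis)) {
                System.err.println("Detected pause of approximately " + extra + " ms");
            }
        }
    }
}
```

A probe like this cannot say why the process stalled, only that it did, so it
complements GC logging rather than replacing it.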

--Tom


On Tue, Jun 10, 2014 at 12:49 PM, Vladimir Rodionov <[email protected]> wrote:

> 1. Do you have GC logging enabled on your cluster? It does not look like a
> GC pause to me, but for future troubleshooting it is better to enable GC
> logging.
>
> 2. How large is your cluster? Did you check NN and DN logs as well? Are
> all your nodes (RS and DN) up and running? No dead nodes?
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Tom Brown [[email protected]]
> Sent: Tuesday, June 10, 2014 11:13 AM
> To: [email protected]
> Subject: Re: Is this a long GC pause, or something else?
>
> We are still using 0.94.10. We are looking at upgrading soon, but have not
> done so yet.
>
> --Tom
>
>
> On Tue, Jun 10, 2014 at 12:10 PM, Ted Yu <[email protected]> wrote:
>
> > Which release are you using?
> >
> > In 0.98+, there is JvmPauseMonitor.
> >
> > Cheers
> >
> >
> > On Tue, Jun 10, 2014 at 11:05 AM, Tom Brown <[email protected]> wrote:
> >
> > > Last night a regionserver in my cluster stopped responding in a timely
> > > manner for about 20 minutes. I know that stop-the-world GC can cause this
> > > type of behavior, but 20 minutes seems excessive.
> > >
> > > The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We
> > > are using the latest Java 7 from Oracle. HDFS is provided by an Isilon
> > > cluster.
> > >
> > > The server workload is read/write: the writing process reads all rows it is
> > > about to write, updates them if they exist, and then writes all the rows
> > > (replacing ones that were updated).
> > >
> > > The last messages before the pause were regarding an HLog roll:
> > >
> > > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
> > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
> > >
> > > During the next 20 minutes there were a handful of sporadic LruBlockCache
> > > stats messages but nothing else. After 20 minutes, normal operation
> > > resumed.
> > >
> > > Is 20 minutes for a GC pause expected given the operational load and
> > > machine specs? Could a GC pause include periodic log messages? If it wasn't
> > > a GC pause, what else could it be?
> > >
> > > --Tom
> > >
> >
>
