Should steps 1 and 2 below be exchanged?
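Also, to make sure I'm reading the proposal right, here is a rough and completely untested JRuby sketch of the steps J-D lists below. It assumes the client API exposes HBaseAdmin#balanceSwitch(boolean) and HBaseAdmin#move(encodedRegionName, serverName), and that region-to-server assignments can be read from the info:server column of .META.; please correct me if any of those assumptions don't hold for 0.90.

# drain_rs.rb -- rough, untested sketch of the draining steps discussed below.
# Run with:  bin/hbase org.jruby.Main drain_rs.rb <servername>
# where <servername> is the full RS name as the master reports it,
# e.g. host187.example.com,60020,1289493121758
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.util.Writables

target = ARGV[0] or abort "Usage: drain_rs.rb host,port,startcode"
host, port = target.split(",")
target_hostport = "#{host}:#{port}"

conf  = HBaseConfiguration.create
admin = HBaseAdmin.new(conf)

# 1- regions currently served by the target RS, read from the info:server
#    column of .META. (assumption: that column holds "host:port")
regions = []
meta    = HTable.new(conf, HConstants::META_TABLE_NAME)
scan    = Scan.new
scan.addFamily(HConstants::CATALOG_FAMILY)
scanner = meta.getScanner(scan)
while (row = scanner.next)
  server = row.getValue(HConstants::CATALOG_FAMILY, Bytes.toBytes("server"))
  next if server.nil? || Bytes.toString(server) != target_hostport
  hri = Writables.getHRegionInfo(
          row.getValue(HConstants::CATALOG_FAMILY, Bytes.toBytes("regioninfo")))
  regions << hri unless hri.nil? || hri.isOffline
end
scanner.close

# 2- disable master balancing so it doesn't fight the moves
#    (arguably this should happen before step 1 -- hence my question)
admin.balanceSwitch(false)

# 3- move every region off the target, round-robin over the other RSs
others = admin.getClusterStatus.getServerInfo.map { |si| si.getServerName }
others = others.reject { |name| name == target }
abort "No other regionservers to move regions to" if others.empty?
regions.each_with_index do |hri, i|
  admin.move(Bytes.toBytes(hri.getEncodedName), Bytes.toBytes(others[i % others.size]))
end

# 4- the RS should now be empty; stop it from that host, e.g.
#    bin/hbase-daemon.sh stop regionserver
# 5- once the node is back up (or gone for good), re-enable balancing
#    with admin.balanceSwitch(true)
puts "Moved #{regions.size} regions off #{target}; balancing left OFF -- re-enable it after the restart."

Regards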
On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <[email protected]> wrote:

> To mitigate heap fragmentation, you could consider adding more nodes to the cluster :)
>
> Regarding rolling restarts, there is currently one major issue:
> https://issues.apache.org/jira/browse/HBASE-3441
>
> How it currently works is a bit dumb: when you cleanly close a region server, it first closes all incoming connections, then proceeds to close the regions, and it isn't until it is fully done that it reports to the master. What this means for your clients is that a portion of the regions will be unavailable for some time, until the region server has finished shutting down. How long, you ask? Well, it depends on 1) how many regions you have, but mostly on 2) how much data needs to be flushed from the MemStores. On one of our clusters, shutting down HBase takes a few minutes, since our write pattern is almost perfectly distributed, meaning the MemStore space is always full across all the regions (luckily it's a cluster that serves only mapreduce jobs).
>
> Writing this gives me an idea... I think one "easy" way we could solve this region-draining problem is by writing a jruby script that:
>
> 1- Retrieves the list of regions served by an RS
> 2- Disables master balancing
> 3- Moves every region out of the RS one by one, assigning them to the other RSs in round-robin fashion
> 4- Shuts down the RS
> 5- Re-enables master balancing
>
> I wonder if it would work... At least it's a process that you could stop at any time without breaking everything.
>
> J-D
>
> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <[email protected]> wrote:
> > I assumed GC was *trying* to roll. It shows the last 30 minutes of logs with control characters at the end.
> >
> > We are not all writes. In terms of writes we can wait, and the zookeeper timeout can go way up, but we also need to support real-time (end-user) reads, and that is why increasing the zookeeper timeout is not our first choice (we would rather decrease it). The funny part is that .90 seems faster for us and churns through writes at a faster clip, thereby probably becoming less stable sooner due to the JVM not being able to handle it. Should we schedule a rolling restart every 24 hours? How do production systems accept heavy write volumes through the front door without melting the JVM due to fragmentation? We can possibly switch to bulk writes, but performance is not our problem... stability is. We are pushing 40k writes/node/sec sustained with well-balanced regions, hour after hour, day after day (until a zookeeper tear-down).
> >
> > Great to hear it is actively being looked at. I will keep an eye on #3455.
> >
> > Below are our GC options, many of which come from work with the other java database. Should I go back to the default settings? Should I use those referenced in Jira #3455 (-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g)? We are also using Java6u23.
> >
> > export HBASE_HEAPSIZE=8192
> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> > -XX:+CMSIncrementalMode"
> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
> >
> > Thanks for your help!
