Re: SocketTimeoutException caused by GC?

Ted Yu Thu, 27 Jan 2011 16:23:30 -0800

Is there a way to disable splitting (on a particular region server)?

On Thu, Jan 27, 2011 at 4:20 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Mmm, yes, for the sake of not having even a single region move in the
> meantime. But it wouldn't be so bad if one did... it just means that
> those regions will be closed when the RS closes.
>
> Also, it's possible to have splits during that time; again, it's not
> dramatic as long as the script doesn't freak out because a region is
> gone.
>
> J-D
>
> On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <[email protected]> wrote:
> > Should steps 1 and 2 below be exchanged?
> >
> > Regards
> >
> > On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> To mitigate heap fragmentation, you could consider adding more nodes
> >> to the cluster :)
> >>
> >> Regarding rolling restarts, currently there's one major issue:
> >> https://issues.apache.org/jira/browse/HBASE-3441
> >>
> >> How it currently works is a bit dumb: when you cleanly close a region
> >> server, it will first close all incoming connections and then proceed
> >> to close the regions, and it's not until it's fully done that it will
> >> report to the master. What this means for your clients is that a
> >> portion of the regions will be unavailable for some time until the
> >> region server is done shutting down. How long, you ask? Well, it
> >> depends on 1) how many regions you have, but also, mostly, 2) how much
> >> data needs to be flushed from the MemStores. On one of our clusters,
> >> shutting down HBase takes a few minutes since our write pattern is
> >> almost perfectly distributed, meaning that all the MemStore space is
> >> always full from all the regions (luckily it's a cluster that serves
> >> only MapReduce jobs).
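J-D's point that shutdown time is dominated by MemStore flushing can be turned into a rough worst-case estimate. This sketch assumes every MemStore is full, so the data to flush is about heap size times the global MemStore fraction. The fraction (40% in the usage line, which I believe was the 0.90-era default of hbase.regionserver.global.memstore.upperLimit; check your own hbase-site.xml) and the flush throughput (50 MB/s) are hypothetical inputs, not figures from this thread, and estimate_flush_seconds is a made-up name.

```shell
# Hypothetical helper: worst-case seconds spent flushing MemStores on a
# clean region server shutdown, assuming all MemStore space is full.
# $1 = heap size in MB
# $2 = global MemStore fraction in percent (assumed input)
# $3 = flush throughput in MB/s (assumed input)
estimate_flush_seconds() {
  heap_mb=$1 memstore_pct=$2 flush_mb_per_s=$3
  [ "$flush_mb_per_s" -gt 0 ] || { echo "throughput must be > 0" >&2; return 1; }
  # Data that could be sitting in MemStores (integer MB, rounded down).
  to_flush_mb=$(( heap_mb * memstore_pct / 100 ))
  # Round up so a partial second still counts.
  echo $(( (to_flush_mb + flush_mb_per_s - 1) / flush_mb_per_s ))
}
```

With the 8 GB heap from this thread, `estimate_flush_seconds 8192 40 50` prints 66, i.e. roughly a minute during which that server's regions are unavailable. Region count and disk contention also matter, so treat the result as an order of magnitude only.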
> >>
> >> Writing this gives me an idea... I think one "easy" way we could
> >> solve this region-draining problem is by writing a JRuby script
> >> that:
> >>
> >> 1- Retrieves the list of regions served by a RS
> >> 2- Disables master balancing
> >> 3- Moves every region out of the RS one by one, assigning them to the
> >> other RSs in a round-robin fashion
> >> 4- Shuts down the RS
> >> 5- Re-enables master balancing
> >>
> >> I wonder if it would work... At least it's a process that you could
> >> stop at any time without breaking everything.
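Steps 2 and 3 above could be prototyped without a JRuby script by generating commands for the HBase shell. This sketch assumes the 0.90 shell commands balance_switch and move, that the encoded names of the regions on the server being drained arrive on stdin one per line (step 1 is left to you), and that target servers are passed as arguments in the host,port,startcode form that move expects. drain_plan is a hypothetical name; it disables balancing first, which is the ordering Ted asks about.

```shell
# Hypothetical helper: print HBase shell commands for steps 2 and 3 of the
# drain. Regions (encoded names) are read from stdin; each argument is a
# target server such as 'host,60020,startcode'.
drain_plan() {
  [ "$#" -gt 0 ] || { echo "drain_plan: need at least one target server" >&2; return 1; }
  echo "balance_switch false"          # step 2: stop the master balancer
  # step 3: assign regions to the target servers in round-robin order
  awk -v q="'" -v targets="$*" '
    BEGIN { n = split(targets, t, " ") }   # targets are space-separated
    NF == 0 { next }                       # skip blank input lines
    { printf "move %s%s%s, %s%s%s\n", q, $1, q, q, t[(i++ % n) + 1], q }
  '
}
```

Assuming your `hbase shell` accepts commands on stdin, something like `printf 'r1\nr2\nr3\n' | drain_plan 'rs1,60020,1' 'rs2,60020,2' | hbase shell` would send r1 and r3 to rs1 and r2 to rs2 (all names are placeholders). Steps 4 and 5 stay manual: stop the drained server once the moves finish (for example with `hbase-daemon.sh stop regionserver` on that host), then run `balance_switch true`. Printing the plan first keeps the process stoppable at any point, and a move that fails because a region split away mid-drain should be tolerated rather than treated as fatal.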
> >>
> >> J-D
> >>
> >> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <[email protected]> wrote:
> >> > I assumed GC was *trying* to roll. It shows the last 30 min of logs
> >> > with control characters at the end.
> >> >
> >> > Our workload is not all writes. In terms of writes we can wait, and
> >> > the ZooKeeper timeout can go way up, but we also need to support
> >> > real-time reads (end-user based), and that is why the ZooKeeper
> >> > timeout is not our first choice to increase (we would rather decrease
> >> > it). The funny part is that 0.90 seems faster for us and churns
> >> > through writes at a faster clip, thereby probably becoming less stable
> >> > sooner due to the JVM not being able to handle it. Should we schedule
> >> > a rolling restart every 24 hours? How do production systems accept
> >> > high-volume writes through the front door without melting the JVM due
> >> > to fragmentation? We can possibly switch to bulk writes, but
> >> > performance is not our problem... stability is. We are pushing 40k
> >> > writes/node/sec sustained with well-balanced regions hour after hour,
> >> > day after day (until a ZooKeeper tear down).
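Wayne's trade-off is between the ZooKeeper session timeout and GC pause length: a stop-the-world pause longer than the session timeout lets the region server's session expire, which is the "tear down" described here. One quick way to see how close a server gets is to scan the GC log that the options below already produce. This sketch assumes a HotSpot log written with -XX:+PrintGCDetails, whose entries end in `[Times: user=U sys=S, real=R secs]`; it takes the largest real= value as the longest pause and skips CMS-concurrent lines, since those phases run alongside the application. The function names are hypothetical; HBase's zookeeper.session.timeout is expressed in milliseconds.

```shell
# Hypothetical helper: longest stop-the-world GC time in ms from a log
# written with -XX:+PrintGCDetails. CMS concurrent phases are skipped
# because they do not pause application threads.
max_gc_pause_ms() {
  awk '
    /CMS-concurrent/ { next }
    match($0, /real=[0-9.]+/) {
      secs = substr($0, RSTART + 5, RLENGTH - 5) + 0
      if (secs > max) max = secs
    }
    END { printf "%d\n", int(max * 1000 + 0.5) }
  ' "$1"
}

# Hypothetical helper: compare that pause with a session timeout.
# $1 = GC log path, $2 = session timeout in milliseconds
check_pauses() {
  pause_ms=$(max_gc_pause_ms "$1") || return 1
  if [ "$pause_ms" -ge "$2" ]; then
    echo "WARN: longest GC pause ${pause_ms} ms >= session timeout $2 ms"
  else
    echo "OK: longest GC pause ${pause_ms} ms < session timeout $2 ms"
  fi
}
```

For example, `check_pauses "$HBASE_HOME/logs/gc-hbase.log" 60000` reports whether any recorded collection came within a 60-second (placeholder) timeout. A long "promotion failed" or "concurrent mode failure" collection is the typical line that trips the warning.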
> >> >
> >> > Great to hear it is actively being looked at. I will keep an eye on
> >> > #3455.
> >> >
> >> > Below are our GC options, many of which come from work with the other
> >> > Java database. Should I go back to the default settings? Should I use
> >> > the ones referenced in Jira #3455 (-XX:+UseConcMarkSweepGC
> >> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g)? We are also
> >> > using Java 6u23.
> >> >
> >> >
> >> > export HBASE_HEAPSIZE=8192
> >> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> >> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> >> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> >> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> >> > -XX:+CMSIncrementalMode"
> >> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> >> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
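If the Jira #3455 settings are tried, one way to keep the comparison clean is to swap only the collector flags and leave the heap size and GC logging as they are, so the new gc-hbase.log can be read side by side with the current one. This is a sketch of hbase-env.sh under that assumption, not a recommendation from the thread: it drops -XX:+CMSIncrementalMode and the custom ratio and tenuring settings only because the Jira list does not include them. HBASE_HEAPSIZE=8192 already sets an 8 GB maximum heap; the Jira's -Xms8g additionally fixes the initial heap at the same size.

```shell
# hbase-env.sh sketch: collector flags exactly as quoted from Jira #3455,
# everything else left at JVM defaults.
export HBASE_HEAPSIZE=8192
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g"
# Same GC logging as the current configuration, for before/after comparison.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
```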
> >> >
> >> >
> >> > Thanks for your help!
> >> >
> >> >
> >>
> >
>
