Mmm, yes, for the sake of not having a single region move in the meantime, but it wouldn't be so bad... it just means that those regions will be closed when the RS closes.
Also, it's possible to have splits during that time; again, it's not
dramatic as long as the script doesn't freak out because a region is gone.

J-D

On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <[email protected]> wrote:
> Should steps 1 and 2 below be exchanged?
>
> Regards
>
> On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> To mitigate heap fragmentation, you could consider adding more nodes
>> to the cluster :)
>>
>> Regarding rolling restarts, there's currently one major issue:
>> https://issues.apache.org/jira/browse/HBASE-3441
>>
>> How it currently works is a bit dumb: when you cleanly close a region
>> server, it first closes all incoming connections, then proceeds to
>> close the regions, and it isn't until it's fully done that it reports
>> to the master. What this means for your clients is that a portion of
>> the regions will become unavailable for some time, until the region
>> server is done shutting down. How long, you ask? Well, it depends on
>> 1) how many regions you have, but mostly on 2) how much data needs to
>> be flushed from the MemStores. On one of our clusters, shutting down
>> HBase takes a few minutes since our write pattern is almost perfectly
>> distributed, meaning the memstore space is always full across all the
>> regions (luckily it's a cluster that serves only mapreduce jobs).
>>
>> Writing this gives me an idea... I think one "easy" way we could
>> solve this region-draining problem is by writing a jruby script
>> that:
>>
>> 1- Retrieves the list of regions served by a RS
>> 2- Disables master balancing
>> 3- Moves every region out of the RS one by one, assigning them to
>>    the other RSs in a round-robin fashion
>> 4- Shuts down the RS
>> 5- Re-enables master balancing
>>
>> I wonder if it would work... At least it's a process that you could
>> stop at any time without breaking everything.
>>
>> J-D
>>
>> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <[email protected]> wrote:
>> > I assumed GC was *trying* to roll. It shows the last 30 min of logs
>> > with control characters at the end.
>> >
>> > We are not all writes. In terms of writes we can wait and the
>> > zookeeper timeout can go way up, but we also need to support
>> > real-time reads (end-user based), and that is why the zookeeper
>> > timeout is not our first choice to increase (we would rather
>> > decrease it). The funny part is that .90 seems faster for us and
>> > churns through writes at a faster clip, thereby probably becoming
>> > less stable sooner because the JVM can't handle it. Should we
>> > schedule a rolling restart every 24 hours? How do production
>> > systems accept volume writes through the front door without melting
>> > the JVM due to fragmentation? We could possibly switch to bulk
>> > writes, but performance is not our problem... stability is. We are
>> > pushing 40k writes/node/sec sustained with well-balanced regions,
>> > hour after hour, day after day (until a zookeeper tear-down).
>> >
>> > Great to hear it is actively being looked at. I will keep an eye on
>> > #3455.
>> >
>> > Below are our GC options, many of which come from work with the
>> > other java database. Should I go back to the default settings, or
>> > should I use those referenced in Jira #3455 (-XX:+UseConcMarkSweepGC
>> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g)? We are also
>> > using Java6u23.
>> >
>> > export HBASE_HEAPSIZE=8192
>> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
>> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
>> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
>> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> > -XX:+CMSIncrementalMode"
>> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
>> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>> >
>> > Thanks for your help!
>> >
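
For what it's worth, below is a rough JRuby sketch of the five-step draining procedure described in the quoted message above, written against the 0.90 client API. It is untested, and several pieces are assumptions rather than confirmed behaviour: the script name (drain_rs.rb), listing a server's regions by scanning .META. and matching info:server on host:port, and the "host,port,startcode" server-name strings passed to HBaseAdmin#move. Treat it as a starting point, not a working tool.

# drain_rs.rb -- hypothetical sketch, run with something like:
#   hbase org.jruby.Main drain_rs.rb rs1.example.com,60020,1296170000000
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.util.Writables

target = ARGV[0]  # region server to drain, as "host,port,startcode"
conf = HBaseConfiguration.create
admin = HBaseAdmin.new(conf)

# 1- Retrieve the list of regions served by the RS by scanning .META. and
#    keeping the rows whose info:server matches the target's host:port
#    (assumption: that's how the server column is stored).
host_port = target.split(",")[0..1].join(":")
regions = []
meta = HTable.new(conf, HConstants::META_TABLE_NAME)
scanner = meta.getScanner(Scan.new)
while (result = scanner.next) != nil
  server = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("server"))
  next if server.nil? || Bytes.toString(server) != host_port
  regions << Writables.getHRegionInfo(
    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo")))
end
scanner.close

# 2- Disable master balancing so the balancer doesn't undo the moves.
#    (As discussed above, doing this before step 1 would also avoid chasing
#    regions the balancer relocates in the meantime.)
admin.balanceSwitch(false)

# 3- Move every region off the RS, round-robin over the other servers.
others = admin.getClusterStatus.getServerInfo.map { |si| si.getServerName }
others.delete(target)
regions.each_with_index do |hri, i|
  dest = others[i % others.size]
  begin
    admin.move(Bytes.toBytes(hri.getEncodedName), Bytes.toBytes(dest))
  rescue java.lang.Exception => e
    # A region may have split or moved since we listed it; per the thread,
    # don't freak out, just skip it.
    puts "Skipping #{hri.getRegionNameAsString}: #{e.getMessage}"
  end
end

# 4- Shut down the RS out of band (e.g. hbase-daemon.sh stop regionserver),
#    then 5- re-enable master balancing once it's gone.
admin.balanceSwitch(true)

As noted in the thread, the nice property of this approach is that you can stop it at any point without breaking anything; the worst case is having to re-run balanceSwitch(true) by hand.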
