On Thu, Jan 27, 2011 at 6:48 AM, Wayne <[email protected]> wrote:
> We have got .90 up and running well, but again after 24 hours of loading a
> node went down. Under it all I assume it is a GC issue, but the GC logging
> rolls every < 60 minutes so I can never see logs from 5 hours ago (working
> on getting Scribe up to solve that).

Is it possible to ask that the GC logs roll? What's the config, please?
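I'm assuming you have roughly the stock hbase-env.sh GC logging options turned on, something like the below (the path is just an example):

  # GC logging as typically enabled in conf/hbase-env.sh
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"

Plain -Xloggc writes a single file that only starts over on JVM restart, so I am curious what is making yours roll inside an hour.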
> Most of our issues are a node being
> marked as dead after being un-responsive. It often starts with a socket
> timeout. We can turn up the timeout for zookeeper but that is not dealing
> with the issue.
>
> Here is the first sign of trouble. Is the 1 min 34 second gap below most
> likely a stop the world GC?
>
> 2011-01-27 07:00:43,716 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
> entries=242, filesize=69508440. New hlog
> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
> 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
> remote=/x.x.x.6:50010]
>

Possibly. What's the log history like before that? And what was going on in the local DataNode at that time? It seems like we failed talking to it (I'm guessing x.x.x.6 is the local datanode), though that could itself be because of a GC pause.

> It is followed by zookeeper complaining due to lack of a response.
>
> 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 94590ms for sessionid
> 0x2dbdc88700000e, closing socket connection and attempting reconnect
>

This is a strong indicator that there was a long GC pause around this time (the 94590ms it reports lines up with the 1 min 34 second gap above).

> There is also a message about the data node.
>
> 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_4267667433820446273_104837 bad datanode[0]
> x.x.x.6:50010
>

This is probably just the DFSClient up in the RegionServer marking the local datanode bad because of the above timeout talking to it (possibly because of a GC pause). It is marking the first datanode in the write pipeline as bad and will probably try the next node in the pipeline.

> And eventually the node is brought down.
>
> 2011-01-27 07:02:17,783 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> server...
>

Did this happen because the session expired on zk? You should be able to see the cause further up in the log.

> The data node also shows some errors.
>
> 2011-01-27 07:02:17,667 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(x.x.x.6:50010,
> storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
> ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
>

Oh, you did check the DN logs. So, does this look like the DN logging the RS dropping the connection because of the timeout? (Check the IPs -- see if it makes sense reading it that way.)

>
> Any help, advice, ideas, or guesses would be greatly appreciated. Can anyone
> sustain 30-40k writes/node/sec for days/weeks on end without using the bulk
> loader? Am I rolling a rock uphill against the reality of the JVM?
>

You are going to run into heap fragmentation that brings on the stop-the-world GCs. Some good progress has been made recently on this high-priority issue (see https://issues.apache.org/jira/browse/HBASE-3455), but a 'solution' is still to come. Even then, my guess is that we would not be able to guarantee never running into a stop-the-world GC.

Are you all about writes? If so, the bulk loader runs at one to two orders of magnitude faster than going via the API. For this reason alone, it's worth exploring (you can do incremental bulk loads to a running table).
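The rough shape of it, if I have the 0.90 tooling right (the jar name, paths, and table name below are only examples), is to have a MapReduce job write HFiles with HFileOutputFormat (or use the bundled importtsv tool) and then hand them to the live table with the completebulkload tool:

  # Example only: HFiles written by an HFileOutputFormat MR job sit under
  # /user/wayne/bulkout; adopt them into the running table 'mytable':
  hadoop jar hbase-0.90.0.jar completebulkload /user/wayne/bulkout mytable

The load step just moves the files into place under the regions rather than pushing the data through the memstore/WAL write path, which is why it is so much cheaper.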
If reads are rare, you might shrink your heap so the stop-the-worlds run quicker, or you might try giving HBase a really large heap; perhaps the elbow room will make it so we avoid fragmentation (but it might make it worse). Reads, and reading from cache, do not seem to bring on the stop-the-world GC the way writes do.

Are your writes latency-sensitive such that you can't tolerate the occasional long pause? If not, up your zk lease timeouts.
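For reference, that is zookeeper.session.timeout in hbase-site.xml; the value below is only an example, and whatever you ask for has to fit under the ZK quorum's maxSessionTimeout (if I remember right, hbase.zookeeper.property.maxSessionTimeout when HBase is managing your ZK):

  <property>
    <name>zookeeper.session.timeout</name>
    <!-- example only: five minutes -->
    <value>300000</value>
  </property>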
St.Ack