1) I found that one of my regionservers shut down with an exception. Why does this happen?
I see this in your log:

2010-07-15 20:17:31,260 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 295729663 milliseconds - retrying

Congratulations. That's the longest pause seen on these lists. What happened? Did the machine go into a terminal swap? Was there a garbage collection that never recovered? Did the time on your machine jump? Your regionserver lost its session with ZooKeeper, so it shut itself down.

2) I have 3 machines running HBase: one runs the master, the other two run regionservers, and each machine runs ZooKeeper. I set HBASE_HEAPSIZE=4000 and use a 64-bit JDK.

This should be fine. Check your cluster monitoring systems. They might help you figure out what happened around the time of the long pause above (look for a spike in I/O or swapping, etc.).

St.Ack
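To put the reported pause in perspective, here is a minimal sketch that pulls the millisecond count out of the WARN line quoted above and converts it to a readable duration. The helper name `pause_from_log_line` and the 60-second threshold are assumptions made for illustration only; the threshold is not taken from this cluster's configuration, so substitute your actual ZooKeeper session timeout.

```python
import re
from datetime import timedelta

# Matches the pause length in the regionserver WARN message quoted in this thread.
PAUSE_RE = re.compile(r"unable to report to master for (\d+) milliseconds")

# Hypothetical threshold for illustration; use your real session timeout instead.
ASSUMED_SESSION_TIMEOUT = timedelta(seconds=60)


def pause_from_log_line(line):
    """Return the reported pause as a timedelta, or None if the line does not match."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    return timedelta(milliseconds=int(match.group(1)))


line = (
    "2010-07-15 20:17:31,260 WARN "
    "org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "unable to report to master for 295729663 milliseconds - retrying"
)

pause = pause_from_log_line(line)
print(pause)                            # 3 days, 10:08:49.663000
print(pause > ASSUMED_SESSION_TIMEOUT)  # True
```

A pause of more than three days is far longer than any plausible session timeout, which fits the session loss and shutdown described above. It also suggests checking for a clock jump or a machine that was effectively frozen, alongside swapping and garbage collection.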
