When you shut down the region server, check the master logs to see whether the master has detected this condition. I've seen weird things happen if DNS is not set up correctly, so check that the master (logs and UI) correctly detects that the region server is down after step 2.
--Suraj

2011/7/27 吴限 <[email protected]>:
> Just by checking http://master:60010 repeatedly.
> Before Step 2:
>
> Address                Start Code     Load
> server4.yun.com:60030  1311785159202  requests=0, regions=10, usedHeap=32, maxHeap=995
> server5.yun.com:60030  1311768553647  requests=18, regions=7, usedHeap=117, maxHeap=995
> Total: servers: 2                     requests=18, regions=17
>
> Then at Step 2, I shut down server4 and waited until the page showed this:
>
> Address                Start Code     Load
> server5.yun.com:60030  1311768553647  requests=18, regions=17, usedHeap=117, maxHeap=995
> Total: servers: 2                     requests=18, regions=17
>
> Then I continued with the following steps.
>
> On July 28, 2011 at 12:40 AM, Chris Tarnas <[email protected]> wrote:
>
>> That is strange behavior. How long did you wait between Step 2 and 3, and
>> what is the result of running
>>
>> hbase hbck
>>
>> at step 3?
>>
>> -chris
>>
>> On Jul 27, 2011, at 9:23 AM, 吴限 wrote:
>>
>> > Thx for your reply. But actually, later I did another experiment similar to
>> > the one I explained earlier.
>> > Step 1: I inserted some data into HBase.
>> > Step 2: I shut down one of the region servers.
>> > Step 3: I checked the table and found some data had been lost.
>> > Step 4: I disabled the table and then enabled it again.
>> > Step 5: I checked again and found nothing lost.
>> >
>> > If some of the data didn't exist on the other region server, then how can you
>> > explain this?
>> >
>> > Hope to get your reply. Thx~
>> >
>> > 2011/7/28 Chris Tarnas <[email protected]>
>> >
>> >> Replication of 1x means no replication. 2x would mean the data exists in
>> >> two locations (what it looks like you want). Running with a replication of
>> >> 1x is a very bad idea and is pretty much a guaranteed way to get data loss.
>> >>
>> >> -chris
>> >>
>> >> On Jul 27, 2011, at 8:58 AM, 吴限 wrote:
>> >>
>> >>> Hi everyone. I'd like to run the following *data* *loss* scenario by you to
>> >>> see if we are doing something obviously wrong with our setup here.
>> >>>
>> >>> Setup:
>> >>> - cdh3u0
>> >>> - Hadoop 0.20.2
>> >>> - HBase 0.90.1
>> >>> - 1 Master Node running as NameNode & JobTracker
>> >>> - zookeeper quorum
>> >>> - 2 child nodes running as Datanode, TaskTracker and RegionServer each
>> >>> - dfs.replication is set to 1
>> >>>
>> >>> First, I inserted some data into HBase a few hours ago.
>> >>> Then, after a while, I rebooted one of the region servers and waited until
>> >>> the master responded to that. However, when I checked the table using the
>> >>> hbase shell (I used the "count" command), I noticed that a huge amount of
>> >>> data was missing.
>> >>> After I restarted the regionserver I had rebooted and checked again, I
>> >>> found that some of the missing data had come back, but some data was still
>> >>> missing.
>> >>> At last, after I disabled the table and then enabled it, I found that all
>> >>> data was stored in the cluster and nothing had been lost.
>> >>>
>> >>> This is problematic since we are supposed to
>> >>> replicate at x1, so at least one other node should be able to
>> >>> theoretically serve the *data* that the downed regionserver can't.
>> >>>
>> >>> Questions:
>> >>>
>> >>> - How can you guys explain this weird situation?
>> >>> - Is there a way to recover such lost *data*?
>> >>>
>> >>> Any tips here are definitely appreciated. I'll be happy to provide more
>> >>> information as well.
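
On the dfs.replication point in the original message: a rough way to see why a factor of 1 leaves no second copy is the toy model below. It is only a sketch, not HDFS's real block placement policy. The function names, the round-robin placement, and the block count of 10 are all made up for illustration; only the two hostnames come from the thread. It also does not try to explain why the rows reappeared after the disable/enable, which points at region assignment rather than at lost blocks.

```python
# Toy model of block replication. This is NOT how HDFS actually places blocks;
# round-robin placement and all names here are assumptions for illustration.

def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + i) % len(nodes)] for i in range(replication)
        }
    return placement


def unavailable_blocks(placement, down_node):
    """Return the blocks whose only replica lived on the node that went down."""
    return [b for b, holders in placement.items() if holders <= {down_node}]


nodes = ["server4", "server5"]  # hostnames from the thread; block count is made up

for replication in (1, 2):
    placement = place_blocks(10, nodes, replication)
    lost = unavailable_blocks(placement, "server4")
    print(f"dfs.replication={replication}: {len(lost)} of 10 blocks unreachable")
```

With a factor of 1, every block that landed on server4 has no other holder once server4 is down. With a factor of 2 on a two-node cluster, server5 still holds a copy of everything.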
