Just by checking http://master:60010 repeatedly.

Before Step 2:

Address                 Start Code      Load
server4.yun.com:60030   1311785159202   requests=0, regions=10, usedHeap=32, maxHeap=995
server5.yun.com:60030   1311768553647   requests=18, regions=7, usedHeap=117, maxHeap=995
Total: servers: 2                       requests=18, regions=17

Then at Step 2, I shut down server4 and waited until the page looked like this:

Address                 Start Code      Load
server5.yun.com:60030   1311768553647   requests=18, regions=17, usedHeap=117, maxHeap=995
Total: servers: 2                       requests=18, regions=17

Then I continued with the following steps.

On July 28, 2011 at 12:40 AM, Chris Tarnas <[email protected]> wrote:

> That is strange behavior. How long did you wait between Step 2 and 3, and
> what are the results of running
>
> hbase hbck
>
> at step 3?
>
> -chris
>
> On Jul 27, 2011, at 9:23 AM, 吴限 wrote:
>
> > Thx for your reply. But actually, later I did another experiment similar to
> > the one I explained earlier.
> > Step 1: I inserted some data into HBase.
> > Step 2: I shut down one of the region servers.
> > Step 3: I checked the table and found some data had been lost.
> > Step 4: I disabled the table and then enabled it again.
> > Step 5: I checked again and found nothing lost.
> >
> > If some data didn't exist on the other region server, then how can you explain
> > this?
> >
> > Hope to get your reply. Thx~
> >
> > 2011/7/28 Chris Tarnas <[email protected]>
> >
> >> Replication of 1x means no replication. 2x would mean the data exists in
> >> two locations (what it looks like you want). Running with a replication of
> >> 1x is a very bad idea and is pretty much a guaranteed way to get data loss.
> >>
> >> -chris
> >>
> >> On Jul 27, 2011, at 8:58 AM, 吴限 wrote:
> >>
> >>> Hi everyone. I'd like to run the following *data* *loss* scenario by you to
> >>> see if we are doing something obviously wrong with our setup here.
> >>>
> >>> Setup:
> >>> - cdh3u0
> >>> - Hadoop 0.20.2
> >>> - HBase 0.90.1
> >>> - 1 Master Node running as NameNode & JobTracker
> >>> - zookeeper quorum
> >>> - 2 child nodes running as Datanode, TaskTracker and RegionServer each
> >>> - dfs.replication is set to 1
> >>>
> >>> First, I inserted some data into HBase a few hours ago.
> >>> Then, after a while, I rebooted one of the region servers and waited until
> >>> the master responded to that.
> >>> However, after I checked the table using the hbase
> >>> shell (I used the "count" command), I noticed that a huge amount
> >>> of data had been lost.
> >>> After I restarted the regionserver which I had rebooted and checked again,
> >>> I found that some of the missing data had come back, but there was still
> >>> some data that had not been found.
> >>> Finally, after I disabled the table and then enabled it again, I found that
> >>> all data was stored in the cluster and no data had been lost.
> >>>
> >>> This is problematic since we are supposed to
> >>> replicate at x1, so at least one other node should theoretically be able to
> >>> serve the *data* that the downed regionserver can't.
> >>>
> >>> Questions:
> >>>
> >>> - How can you guys explain this weird situation?
> >>> - Is there a way to recover such lost *data*?
> >>>
> >>> Any tips here are definitely appreciated. I'll be happy to provide more
> >>> information as well.
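
For reference, the region-count check done by eye on the master status page at the top of this thread can be sketched in a few lines of Python. This is only a sketch: it assumes the Load column keeps the key=value format shown above, and the helper names (parse_load, total_regions, regions_reassigned) are made up for illustration and are not part of HBase. The before/after values are copied from the status page output quoted above.

```python
import re


def parse_load(load: str) -> dict:
    """Parse a Load string such as 'requests=18, regions=7, usedHeap=117, maxHeap=995'."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", load)}


def total_regions(loads: dict) -> int:
    """Sum the 'regions' value over all listed region servers."""
    return sum(parse_load(load)["regions"] for load in loads.values())


def regions_reassigned(before: dict, after: dict) -> bool:
    """True if the surviving servers carry as many regions as the cluster did before."""
    return total_regions(before) == total_regions(after)


# Values copied from the master status page quoted in this thread.
before = {
    "server4.yun.com:60030": "requests=0, regions=10, usedHeap=32, maxHeap=995",
    "server5.yun.com:60030": "requests=18, regions=7, usedHeap=117, maxHeap=995",
}
after = {
    "server5.yun.com:60030": "requests=18, regions=17, usedHeap=117, maxHeap=995",
}

print(total_regions(before), total_regions(after))
print(regions_reassigned(before, after))
```

Note that a matching region count only shows that the master reassigned every region to a live region server. It says nothing about whether the HDFS blocks behind those regions are still readable, which is the concern Chris raises about running with dfs.replication set to 1.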
