Since you did the region mover thing then all the region should be off
from that region server, so data loss is highly unlikely.

My theory would be that a compaction was underway and got cancelled
when the region server was told to close the region, but the
DFSOutputStream thread hasn't been cleanup yet. Probably the file was
even already deleted and the data is still safe in the un-compacted
files.

Ways to confirm this:

 - Look in the region server log in the preceding lines what happens
to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
closing. Do you see a compaction being cancelled?
 - Look in the Namenode log for that file, is it being asked to delete
it or something before it sends a NotReplicatedYetException?

Hope this helps,

J-D

On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <[email protected]> wrote:
> When attempting to gracefully shutdown a regionserver, I saw a couple
> of NotReplicatedYet exceptions in the logs (below). Can't find the
> file that is causing this exception in on the HDFS filesystem. Have we
> potentially lost the data, or is this exception benign?
>
>
> Alok
>
> hbase: 0.90.3
> hadoop: 0.20.2-cdh3u0
>
> Gracefull shutdown process:
> hbase(main):001:0> balance_switch false
>
> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>
> After the region count for the regionserver is 0, we kill the
> regionserver process.
>
> ------------------Log-------------------------
> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:03,787 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,803 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,911 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
> 7.1k, 5225/5276 (99%)
> 2011-12-29 17:11:03,944 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
> to 
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
> 2011-12-29 17:11:03,964 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
> Closed 
> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:11:05,716 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,735 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,808 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
> 6.5k, 4727/4737 (100%)
> 2011-12-29 17:11:05,834 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
> to 
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
> 2011-12-29 17:11:05,856 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
> Closed 
> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
> Shutdown hook starting; hbase.shutdown.hook=true;
> fsShutdownHook=Thread[Thread-15,5,main]
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
> hook
> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
> Stopping server on 7040
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
> 2011-12-29 17:14:59,480 - INFO  [IPC Server
> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
> Stopping infoServer
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
> Stopped [email protected]:7041
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
> regionserver7040.cacheFlusher exiting
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
> regionserver7040.logSyncer exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.majorCompactionChecker:Chore@79] -
> regionserver7040.majorCompactionChecker exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.compactor:CompactSplitThread@113] -
> regionserver7040.compactor exiting
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
> stopping server at: 10.129.1.230,7040,1323990853621
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
> regionserver7040 closing leases
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
> regionserver7040 closed leases
> 2011-12-29 17:14:59,891 - INFO
> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
> Closed zookeeper sessionid=0x4343f266cbc0012
> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x4343f266cbc0012 closed
> 2011-12-29 17:14:59,894 - INFO
> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x1343f2669140013 closed
> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
> regionserver7040 exiting
> 2011-12-29 17:14:59,899 - INFO
> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
> EventThread shut down
> 2011-12-29 17:14:59,900 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
> Starting fs shutdown hook thread.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated 
> yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated 
> yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> 2011-12-29 17:14:59,913 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
> Shutdown hook finished.
> ------------------------

Reply via email to