[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413820#comment-16413820 ]

Saad Mufti commented on HBASE-19681:

Restarting the region server worked for us also.

> Online snapshot creation failing with missing store file
>
> Key: HBASE-19681
> URL: https://issues.apache.org/jira/browse/HBASE-19681
> Project: HBase
> Issue Type: Bug
> Components: backuprestore, Performance, scaling, snapshots
> Affects Versions: 1.3.0
> Environment: Hadoop - 2.7.3
>              HBase 1.3.0
>              OS - GNU/Linux x86_64
>              Cluster - Amazon Elastic Mapreduce
> Reporter: Anirban Roy
> Priority: Major
> Attachments: region-server-missing file-log.doc, region-server-snapshot-exception-log.doc
>
> We are facing a problem creating an online snapshot of our HBase table. The
> table contains 20TB of data and is receiving ~1 writes per second. Snapshot
> creation fails intermittently with an error that an hfile is missing; see the
> detailed output below. Once we locate the region server hosting the region
> and restart it, snapshot creation succeeds. It seems the missing hfile was
> removed by a minor compaction, but the region server still holds a pointer
> to the file.
>
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=x_snapshot table=x_table type=FLUSH } had an error. Procedure x_snapshot { waiting=[] done=[
>     ip-10-0-9-31.ec2.internal,16020,1508372578254,
>     ip-10-0-0-32.ec2.internal,16020,1508372591059,
>     ip-10-0-14-221.ec2.internal,16020,1508372580873,
>     ip-10-0-15-185.ec2.internal,16020,1508372588507,
>     ip-10-0-9-43.ec2.internal,16020,1508372569107,
>     ip-10-0-10-62.ec2.internal,16020,1512885921693,
>     ip-10-0-8-216.ec2.internal,16020,1508372584133,
>     ip-10-0-1-207.ec2.internal,16020,1508372580144,
>     ip-10-0-0-173.ec2.internal,16020,1508372584969,
>     ip-10-0-4-79.ec2.internal,16020,1508372587161,
>     ip-10-0-3-165.ec2.internal,16020,1508372593566,
>     ip-10-0-14-137.ec2.internal,16020,1508372583225,
>     ip-10-0-6-33.ec2.internal,16020,1508372581587,
>     ip-10-0-15-199.ec2.internal,16020,1508372587478,
>     ip-10-0-5-253.ec2.internal,16020,1508372581243,
>     ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>     at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>     ... 6 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>     at
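The workaround in the report starts with locating the region server that hosts the affected region. As a rough aid, the sketch below pulls the relevant identifiers out of the snapshot error text: the server named in the "via" clause and the namespace, table, encoded region name, column family and hfile name in the missing path. It assumes the usual `<rootdir>/data/<namespace>/<table>/<region>/<family>/<hfile>` layout seen in the trace. The names `MissingStoreFile` and `parse_missing_store_file` are made up for illustration and are not part of HBase.

```python
import re
from typing import NamedTuple, Optional


class MissingStoreFile(NamedTuple):
    """Identifiers recovered from a 'File does not exist' snapshot error."""
    namespace: str
    table: str
    region: str                      # encoded region name (32 hex characters)
    family: str
    hfile: str
    reporting_server: Optional[str]  # host,port,startcode from the "via" clause


# Assumption: store files live under <rootdir>/data/<ns>/<table>/<region>/<cf>/<hfile>.
# The optional ">" tolerates mail-style quoting between the message and the path.
_PATH_RE = re.compile(
    r"File does not exist:\s*(?:>\s*)?\S*?/data/"
    r"(?P<namespace>[^/\s]+)/(?P<table>[^/\s]+)/(?P<region>[0-9a-f]{32})/"
    r"(?P<family>[^/\s]+)/(?P<hfile>[^/\s]+)"
)

# The server name appears as host,port,startcode right after "ProxyThrowable via".
_VIA_RE = re.compile(
    r"ProxyThrowable via\s+(?:>\s*)?(?P<server>[^\s:,]+,\d+,\d+):"
)


def parse_missing_store_file(error_text: str) -> Optional[MissingStoreFile]:
    """Return the first missing store file mentioned in error_text, or None."""
    path_match = _PATH_RE.search(error_text)
    if path_match is None:
        return None
    via_match = _VIA_RE.search(error_text)
    return MissingStoreFile(
        namespace=path_match.group("namespace"),
        table=path_match.group("table"),
        region=path_match.group("region"),
        family=path_match.group("family"),
        hfile=path_match.group("hfile"),
        reporting_server=via_match.group("server") if via_match else None,
    )
```

Feeding it the "Caused by" line from the trace above should yield the region `ecbb3aeaf7c5b1f65742deab5812362c` and the server `ip-10-0-3-13.ec2.internal,16020,1508372563772`. Whether the server that reported the error is always the one that needs restarting is not confirmed anywhere in this thread.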
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1641#comment-1641 ]

Saad Mufti commented on HBASE-19681:

We are facing the exact same situation in HBase 1.4.0 on AWS EMR-based HBase. Does anyone have a potential recovery process? We haven't tried a restart, but we migrated the region using the "assign" command in the shell; the region moved, but the problem persists. We have also seen the exception in both the snapshot thread and the compaction thread.
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314095#comment-16314095 ]

Anirban Roy commented on HBASE-19681:

Could it be related to [HBASE-16754|https://issues.apache.org/jira/browse/HBASE-16754]? The stack traces look identical.
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313972#comment-16313972 ]

Anirban Roy commented on HBASE-19681:

We also see the following exception in the region server during compaction:

2018-01-05 13:31:55,910 ERROR [regionserver/ip-10-0-1-237.ec2.internal/10.0.1.237:16020-longCompactions-1508372592608] regionserver.CompactSplitThread: Compaction selection failed Store = d, pri = 5
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/396a31774fbb8b8ed1020850e6035973/d/4a46f33587ae43d2986cbf0e45379c83
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:431)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
    at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
    at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
    at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
    at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309160#comment-16309160 ]

Anirban Roy commented on HBASE-19681:

I attached the region server log snippet where I found the reference to the missing HFile. Note the timestamps in the three log statements that deal with the file. Apart from the last ERROR-level log, I did not find any other ERROR/WARN-level statements in the log for the region. Do you have any clue what might have gone wrong? If the missing file was subsumed by a subsequent file due to minor compaction, wouldn't there be a mention in the log? I can't move to 1.4.0 now (considerable effort), but I may consider it once I know what the real cause is.
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307526#comment-16307526 ]

Ted Yu commented on HBASE-19681:

Can you upload the region server log so that we know more about f76d8827c29244b99bf9344982956523?

If possible, please upgrade to 1.4.0, which has related fixes such as:

HBASE-19468 FNFE during scans and flushes
[jira] [Commented] (HBASE-19681) Online snapshot creation failing with missing store file
[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307522#comment-16307522 ]

Anirban Roy commented on HBASE-19681:

We also want to know if there is any potential data loss due to this error. Looking at the region server log, we see a reference to the hfile and a few other hfiles being compacted into this file, but no record of this particular hfile being compacted into a newer hfile. But when we check HDFS, the file is really missing. Once the region server is restarted, it no longer complains about the missing hfile. Hence, it is very important to understand this behavior and its impact before we get a fix.
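On the HDFS check mentioned in the comment above: store files removed by compaction are normally moved to the archive directory under the HBase root rather than deleted outright, so it can be worth checking there as well as in the live location. The sketch below derives the archive path from a store file path and builds `hdfs dfs -test -e` commands for both locations. It assumes the standard `<rootdir>/archive/data/...` layout. The function names are hypothetical, and nothing is executed against a cluster.

```python
import re
from typing import List

# Assumption: <rootdir>/data/<ns>/<table>/<region>/<cf>/<hfile>, where the
# encoded region name is 32 hex characters (as in the paths in this thread).
_STORE_FILE_RE = re.compile(
    r"(?P<root>.*?)/data/"
    r"(?P<rest>[^/]+/[^/]+/[0-9a-f]{32}/[^/]+/[^/]+)"
)


def archive_path_for(store_file_path: str) -> str:
    """Map a live store file path to its expected location in the archive."""
    match = _STORE_FILE_RE.fullmatch(store_file_path)
    if match is None:
        raise ValueError(f"not a recognizable store file path: {store_file_path}")
    return f"{match.group('root')}/archive/data/{match.group('rest')}"


def existence_check_commands(store_file_path: str) -> List[List[str]]:
    """Build argv lists for checking the live and archived locations.

    `hdfs dfs -test -e <path>` exits with status 0 when the path exists.
    """
    return [
        ["hdfs", "dfs", "-test", "-e", store_file_path],
        ["hdfs", "dfs", "-test", "-e", archive_path_for(store_file_path)],
    ]
```

The returned argument lists can be passed to `subprocess.run` on a node with the HDFS client configured. A file missing from the archive does not by itself prove data loss, because the master's cleaner eventually deletes archived files; the check is only a starting point for the question raised above.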