Hi Esteban,

There are no region splits in this cluster, since we set the region size upper bound very high to prevent splitting.
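(For reference, the upper bound in question is normally hbase.hregion.max.filesize in hbase-site.xml; a minimal sketch of that kind of setting, with an illustrative value:

    <property>
      <name>hbase.hregion.max.filesize</name>
      <!-- illustrative: ~100 GB, far larger than any region, so the split threshold is never hit -->
      <value>107374182400</value>
    </property>
)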
I think it happens for all the regions of this table. I repeatedly ran
"hdfs dfs -lsr /hbase/.hbase-snapshot/ss_rich_pin_data_v1" while taking the
snapshot, and no region was ever written into that directory. I also turned
on DEBUG logging on the region servers; they all just report a failure with
a timeout, with no specific reason.

Thanks
Tian-Ying

On Tue, May 19, 2015 at 11:06 AM, Esteban Gutierrez <[email protected]> wrote:

> Hi Tianying,
>
> Is this happening consistently in this region or is it happening randomly
> across other regions too? One possibility is that there was a split going
> on at the time you started to take the snapshot and it failed. If you look
> into /hbase/rich_pin_data_v1 can you find a directory named
> dff681880bb2b23d0351d6656a1dbbb9 in there?
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Mon, May 18, 2015 at 11:12 PM, Tianying Chang <[email protected]> wrote:
>
> > Hi,
> >
> > We have a cluster that used to be able to take snapshots, but recently
> > one table started failing with the error below. Other tables on the same
> > cluster are fine.
> >
> > Any idea what could be wrong? Is the table unhealthy? I ran hbase hbck,
> > and it reports the cluster as healthy.
> >
> > BTW, we are running 0.94.7, so we need to take a snapshot of the data to
> > export to a new 0.94.26 cluster as part of an upgrade (and eventually
> > upgrade to 1.x).
> >
> > Thanks
> > Tian-Ying
> >
> > 2015-05-19 06:00:45,505 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } due to exception:No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegion(MasterSnapshotVerifier.java:167)
> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:152)
> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:115)
> >         at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:156)
> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> > 2015-05-19 06:00:45,505 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Stop taking snapshot={ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } because: Failed to take snapshot '{ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH }' due to exception
> > 2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
> >         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
> >         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
> >         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
> >         at com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
> >         at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
> >         at com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
> >         at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
> >         at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
> >         at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
> >         at com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
> >         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
> >         at com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
> >         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
> >         at org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
> >         at java.lang.String.valueOf(String.java:2826)
> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
> >         at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
> >         at java.lang.String.valueOf(String.java:2826)
> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
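(For anyone retracing the checks discussed above, both come down to plain HDFS listings; a minimal sketch using the paths from this thread, with the encoded region name taken from the error message. Note that -lsr is the 0.94-era recursive listing; newer Hadoop spells it "hdfs dfs -ls -R".

    # Does the region directory the verifier complains about still exist
    # under the table's directory on HDFS?
    hdfs dfs -ls /hbase/rich_pin_data_v1/dff681880bb2b23d0351d6656a1dbbb9

    # Watch the snapshot directory while the snapshot is in flight; per-region
    # content should appear here as each region server finishes its part.
    hdfs dfs -lsr /hbase/.hbase-snapshot/ss_rich_pin_data_v1

If the first listing comes back empty, the table's region layout on HDFS no longer matches what the master expects, which is exactly the condition MasterSnapshotVerifier.verifyRegion reports.)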
