Can you debug the protobuf problem? I think we abort because we are not
able to write:

2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
        at java.util.Arrays$ArrayList.get(Arrays.java:3381)
        at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
        at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
...
        at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
        at org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
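For what it's worth, an "ArrayIndexOutOfBoundsException: 2" out of a generated enum's getValueDescriptor() usually means the enum ordinal that arrived on the wire is not declared in the local generated code — e.g. a SnapshotDescription.Type value that one side's .proto doesn't have. A minimal sketch of that failure mode (the value names and the lookup helper here are illustrative, not the actual generated code):

```java
import java.util.Arrays;
import java.util.List;

public class EnumLookupSketch {
    // Protobuf-generated enum code resolves a value descriptor roughly by
    // indexing into the list of values declared in the local .proto. If the
    // peer sends an ordinal this build does not declare, the list lookup
    // throws ArrayIndexOutOfBoundsException carrying that ordinal.
    static String lookup(List<String> localValues, int wireOrdinal) {
        return localValues.get(wireOrdinal); // throws AIOOBE when out of range
    }

    public static void main(String[] args) {
        // Hypothetical: this build only declares two Type values.
        List<String> localValues = Arrays.asList("DISABLED", "FLUSH");
        try {
            lookup(localValues, 2); // peer sent ordinal 2, unknown here
            System.out.println("resolved");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught AIOOBE for ordinal 2");
        }
    }
}
```

If that is what is happening, a version mismatch between the client and the master's generated HBaseProtos would explain it.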

Matteo


On Tue, May 19, 2015 at 11:35 AM, Tianying Chang <[email protected]> wrote:

> Actually, I find it does not even print the debug line below for this
> table; other tables do print this logging. So it seems it did not
> invoke the FlushSnapshotSubprocedure at all.
>
>
>  @Override
>     public Void call() throws Exception {
>       // Taking the region read lock prevents the individual region from
>       // being closed while a snapshot is in progress. This is helpful but
>       // not sufficient for preventing races with snapshots that involve
>       // multiple regions and regionservers. It is still possible to have
>       // an interleaving such that globally regions are missing, so we
>       // still need the verification step.
>       LOG.debug("Starting region operation on " + region);
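The locking pattern that comment describes can be sketched generically like this — the snapshot work holds the region's read lock so a close, which needs the write lock, cannot proceed mid-snapshot (class and method names here are hypothetical, not the HRegion API):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Hypothetical sketch: the snapshot subprocedure holds the region's read
    // lock for the duration of the operation, so a region close (which would
    // need the write lock) must wait until the snapshot work finishes.
    public String snapshotRegion(String region) {
        lock.readLock().lock();
        try {
            // ... reference/flush region files into the snapshot working dir ...
            return "snapshotted " + region;
        } finally {
            lock.readLock().unlock();
        }
    }

    // A close takes the write lock, excluding concurrent snapshot work on
    // this one region; it does nothing for regions on other servers, hence
    // the global verification step the comment mentions.
    public boolean tryClose() {
        if (lock.writeLock().tryLock()) {
            lock.writeLock().unlock();
            return true;
        }
        return false;
    }
}
```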
>
> On Tue, May 19, 2015 at 11:26 AM, Tianying Chang <[email protected]>
> wrote:
>
> > Hi, Esteban,
> >
> > There is no region splitting in this cluster, since we set the region
> > size upper bound very high to prevent splits.
> >
> > I think it happens for all the regions of this table.
> >
> > I repeatedly ran "hdfs dfs -lsr
> > /hbase/.hbase-snapshot/ss_rich_pin_data_v1" while taking the snapshot;
> > no region was able to write into this directory. I also turned on DEBUG
> > logging on the RSs; all RSs just report failure with a timeout, with no
> > specific reason.
> >
> > Thanks
> > Tian-Ying
> >
> > On Tue, May 19, 2015 at 11:06 AM, Esteban Gutierrez <[email protected]>
> > wrote:
> >
> >> Hi Tianying,
> >>
> >> Is this happening consistently in this region or is it happening
> >> randomly across other regions too? One possibility is that there was a
> >> split going on at the time you started to take the snapshot and it
> >> failed. If you look into /hbase/rich_pin_data_v1 can you find a
> >> directory named dff681880bb2b23d0351d6656a1dbbb9 in there?
> >>
> >> cheers,
> >> esteban.
> >>
> >>
> >> --
> >> Cloudera, Inc.
> >>
> >>
> >> On Mon, May 18, 2015 at 11:12 PM, Tianying Chang <[email protected]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a cluster that used to be able to take snapshots. But
> >> > recently, one table failed due to the error below. Other tables on
> >> > the same cluster are fine.
> >> >
> >> > Any idea what could go wrong? Is the table unhealthy? When I run
> >> > hbase hbck, it reports the cluster as healthy.
> >> >
> >> > BTW, we are running 94.7, so we need to take a snapshot of the data
> >> > to export to a new cluster running 94.26 as part of the upgrade (and
> >> > eventually upgrade to 1.x).
> >> >
> >> > Thanks
> >> > Tian-Ying
> >> >
> >> >
> >> > 2015-05-19 06:00:45,505 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } due to exception:No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> >> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: No region directory found for region:{NAME => 'rich_pin_data_v1,,1389319134976.dff681880bb2b23d0351d6656a1dbbb9.', STARTKEY => '', ENDKEY => '001ff3a165ff571471603035ca7b4be9', ENCODED => dff681880bb2b23d0351d6656a1dbbb9,}
> >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegion(MasterSnapshotVerifier.java:167)
> >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:152)
> >> >         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:115)
> >> >         at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:156)
> >> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >         at java.lang.Thread.run(Thread.java:662)
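The verification step that stack trace comes from is essentially a set check: every region of the table must have a directory, named by its encoded name, under the snapshot working directory. A hedged sketch of the idea (helper and names are hypothetical, not the MasterSnapshotVerifier code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VerifyRegionsSketch {
    // Hedged sketch of the verification step: regions whose regionserver
    // never ran the snapshot subprocedure have no directory under the
    // snapshot working dir and show up here as missing, which is reported
    // as "No region directory found for region".
    static List<String> findMissingRegions(Collection<String> tableRegionEncodedNames,
                                           Set<String> snapshotDirNames) {
        List<String> missing = new ArrayList<>();
        for (String encoded : tableRegionEncodedNames) {
            if (!snapshotDirNames.contains(encoded)) {
                missing.add(encoded);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical encoded names; only the first appears in the error above.
        List<String> regions = Arrays.asList(
                "dff681880bb2b23d0351d6656a1dbbb9",
                "0123456789abcdef0123456789abcdef");
        Set<String> snapshotDirs = new HashSet<>(
                Collections.singletonList("0123456789abcdef0123456789abcdef"));
        System.out.println(findMissingRegions(regions, snapshotDirs));
        // prints [dff681880bb2b23d0351d6656a1dbbb9]
    }
}
```

Which fits the symptom: if the subprocedure never ran for this table, no region directories were ever written, and verification fails on the first region.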
> >> > 2015-05-19 06:00:45,505 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Stop taking snapshot={ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } because: Failed to take snapshot '{ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH }' due to exception
> >> > 2015-05-19 06:00:49,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000 caught: java.lang.ArrayIndexOutOfBoundsException: 2
> >> >         at java.util.Arrays$ArrayList.get(Arrays.java:3381)
> >> >         at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
> >> >         at org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$SnapshotDescription$Type.getValueDescriptor(HBaseProtos.java:99)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >> >         at com.google.protobuf.GeneratedMessage.invokeOrDie(GeneratedMessage.java:1369)
> >> >         at com.google.protobuf.GeneratedMessage.access$1400(GeneratedMessage.java:57)
> >> >         at com.google.protobuf.GeneratedMessage$FieldAccessorTable$SingularEnumFieldAccessor.get(GeneratedMessage.java:1670)
> >> >         at com.google.protobuf.GeneratedMessage.getField(GeneratedMessage.java:162)
> >> >         at com.google.protobuf.GeneratedMessage.getAllFieldsMutable(GeneratedMessage.java:113)
> >> >         at com.google.protobuf.GeneratedMessage.getAllFields(GeneratedMessage.java:152)
> >> >         at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:228)
> >> >         at com.google.protobuf.TextFormat$Printer.access$200(TextFormat.java:217)
> >> >         at com.google.protobuf.TextFormat.print(TextFormat.java:68)
> >> >         at com.google.protobuf.TextFormat.printToString(TextFormat.java:115)
> >> >         at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:86)
> >> >         at org.apache.hadoop.hbase.snapshot.HSnapshotDescription.toString(HSnapshotDescription.java:72)
> >> >         at java.lang.String.valueOf(String.java:2826)
> >> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
> >> >         at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152)
> >> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304)
> >> >         at java.lang.String.valueOf(String.java:2826)
> >> >         at java.lang.StringBuilder.append(StringBuilder.java:115)
> >> >
> >>
> >
> >
>