Filed HBASE-18381 <https://issues.apache.org/jira/browse/HBASE-18381> for this.

2017-07-14 14:34 GMT+02:00 Daniel Jeliński <[email protected]>:
> Hi Ted,
> Thanks for looking into this. I'm not an admin of this cluster, so I
> probably won't be able to help with testing.
>
> Just to clarify, the client code I sent succeeds here. The HBase region
> server crashes later, when flushing the WAL. The region is then failed
> over to a new server, which also crashes; every crash leaves a 64 MB
> temp file behind, and those add up quickly, since the region servers
> are restarted automatically.
> I'll put that in a JIRA.
> Regards,
> Daniel
>
> 2017-07-14 14:26 GMT+02:00 Ted Yu <[email protected]>:
>> I put up a quick test (need to find a better place for it) exercising
>> the snippet you posted:
>>
>> https://pastebin.com/FNh245LD
>>
>> I got past the point where "written large put" is logged.
>>
>> Can you log an hbase JIRA?
>>
>> On Fri, Jul 14, 2017 at 4:47 AM, Ted Yu <[email protected]> wrote:
>>> If possible, can you try the following fix?
>>>
>>> Thanks
>>>
>>> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
>>> index feddc2c..ea01f76 100644
>>> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
>>> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
>>> @@ -834,7 +834,9 @@ public class HFile {
>>>        int read = in.read(pbuf);
>>>        if (read != pblen) throw new IOException("read=" + read + ", wanted=" + pblen);
>>>        if (ProtobufUtil.isPBMagicPrefix(pbuf)) {
>>> -        parsePB(HFileProtos.FileInfoProto.parseDelimitedFrom(in));
>>> +        HFileProtos.FileInfoProto.Builder builder = HFileProtos.FileInfoProto.newBuilder();
>>> +        ProtobufUtil.mergeDelimitedFrom(builder, in);
>>> +        parsePB(builder.build());
>>>        } else {
>>>          if (in.markSupported()) {
>>>            in.reset();
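
For context on why that patch helps: the exception quoted further down
bottoms out in protobuf's CodedInputStream, whose default message size
limit is 64 MB, and the exception text itself points at
CodedInputStream.setSizeLimit() as the knob. A mergeDelimitedFrom-style
helper like the one the patch calls therefore has to read the
length-delimited message through a CodedInputStream with that limit
raised. A minimal sketch of the technique, assuming protobuf-java on the
classpath; the DelimitedPb class and mergeDelimitedNoLimit method names
are made up here for illustration:

    import com.google.protobuf.CodedInputStream;
    import com.google.protobuf.Message;
    import java.io.IOException;
    import java.io.InputStream;

    final class DelimitedPb {
      // Parse one length-delimited protobuf message from `in` without
      // tripping the default 64 MB size cap.
      static void mergeDelimitedNoLimit(Message.Builder builder, InputStream in)
          throws IOException {
        CodedInputStream cis = CodedInputStream.newInstance(in);
        cis.setSizeLimit(Integer.MAX_VALUE); // lift the 64 MB default
        int size = cis.readRawVarint32();    // delimited format: varint length prefix,
        int limit = cis.pushLimit(size);     // then exactly `size` message bytes
        builder.mergeFrom(cis);
        cis.popLimit(limit);
      }
    }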
>>>> Hello,
>>>> While playing with the MOB feature (on HBase 1.2.0-cdh5.10.0), I
>>>> accidentally created a table that killed every region server it was
>>>> assigned to. I can't test it with other revisions, and I couldn't
>>>> find it in JIRA.
>>>>
>>>> I'm reporting it here; let me know if there's a better place.
>>>>
>>>> Gist of the code used to create the table:
>>>>
>>>> private String table = "poisonPill";
>>>> private byte[] familyBytes = Bytes.toBytes("cf");
>>>>
>>>> private void createTable(Connection conn) throws IOException {
>>>>   Admin hbase_admin = conn.getAdmin();
>>>>   HTableDescriptor htable = new HTableDescriptor(TableName.valueOf(table));
>>>>   HColumnDescriptor hfamily = new HColumnDescriptor(familyBytes);
>>>>   hfamily.setMobEnabled(true);
>>>>   htable.setConfiguration("hfile.format.version", "3");
>>>>   htable.addFamily(hfamily);
>>>>   hbase_admin.createTable(htable);
>>>> }
>>>>
>>>> private void killTable(Connection conn) throws IOException {
>>>>   Table tbl = conn.getTable(TableName.valueOf(table));
>>>>   byte[] data = new byte[1 << 26];
>>>>   byte[] smalldata = new byte[0];
>>>>   Put put = new Put(Bytes.toBytes("1"));
>>>>   put.addColumn(familyBytes, data, smalldata);
>>>>   tbl.put(put);
>>>> }
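
Wiring that snippet up end to end would look roughly like this (a sketch
against the HBase 1.2 client API; the PoisonPill class name and the
main() plumbing are mine, not from the report, and the quoted methods
need the usual org.apache.hadoop.hbase.* imports; obviously don't point
this at a cluster you care about):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class PoisonPill {
      // createTable(Connection) and killTable(Connection) exactly as quoted above

      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          PoisonPill pp = new PoisonPill();
          pp.createTable(conn);
          pp.killTable(conn); // the put itself succeeds; the server dies at flush
        }
      }
    }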
>>>> Resulting exception on region server:
>>>>
>>>> 2017-07-11 09:34:54,704 WARN org.apache.hadoop.hbase.regionserver.HStore: Failed validating store file hdfs://sandbox/hbase/data/default/poisonPill/f82e20f32302dfdd95c89ecc3be5a211/.tmp/7858d223eddd4199ad220fc77bb612eb, retrying num=0
>>>> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://sandbox/hbase/data/default/poisonPill/f82e20f32302dfdd95c89ecc3be5a211/.tmp/7858d223eddd4199ad220fc77bb612eb
>>>>     at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:497)
>>>>     at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:525)
>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1105)
>>>>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:265)
>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:404)
>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:509)
>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:499)
>>>>     at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:675)
>>>>     at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:667)
>>>>     at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1746)
>>>>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:942)
>>>>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2299)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2372)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2102)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4139)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3934)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:828)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:799)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6480)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6441)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6412)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6368)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6319)
>>>>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
>>>>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
>>>>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
>>>>     at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>>>>     at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>>>>     at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>>>>     at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.<init>(HFileProtos.java:82)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.<init>(HFileProtos.java:46)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto$1.parsePartialFrom(HFileProtos.java:135)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto$1.parsePartialFrom(HFileProtos.java:130)
>>>>     at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>>>>     at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
>>>>     at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
>>>>     at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
>>>>     at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.parseDelimitedFrom(HFileProtos.java:297)
>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$FileInfo.read(HFile.java:752)
>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:161)
>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV3.<init>(HFileReaderV3.java:77)
>>>>     at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:487)
>>>>     ... 28 more
>>>>
>>>> After a number of tries, RegionServer service is aborted.
>>>>
>>>> I wasn't able to reproduce this issue with MOB disabled.
>>>>
>>>> Regards,
>>>> Daniel
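
The "Protocol message was too large" cause lines up with the repro
exactly: new byte[1 << 26] is a 64 MiB qualifier, and 64 MB is also
protobuf's default message size limit. The HFile file-info block records,
among other things, the last key of the file, and an HBase key embeds the
full column qualifier, so the serialized FileInfoProto presumably ends up
just past the cap and can never be read back. Back-of-the-envelope (a
sketch; 64 << 20 is protobuf-java's documented default limit):

    // Why 1 << 26 is precisely poisonous:
    int qualifierLen = 1 << 26;   // 67,108,864 bytes = 64 MiB qualifier
    int pbSizeLimit  = 64 << 20;  // protobuf default message size limit
    // The file info's last key embeds the qualifier plus row/family/
    // timestamp overhead, so its serialized size exceeds pbSizeLimit.
    assert qualifierLen == pbSizeLimit; // the qualifier alone already fills the cap

That would also explain why every retry and failover fails the same way:
the oversized file info is baked into the flushed file, and replaying the
recovered edits on the next server just produces another one.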
