Hey again,
I'm running a bulk load on S3 and I'm seeing region servers being
instructed to load the same hfile multiple times. I've seen this
behavior manifest in two different ways. The first time was after I
decided to brute-force my way around the problem in HBASE-20774 [1] by
just letting the cluster copy the data down from S3 and back up. After
letting this run overnight, the 15 TB of input data had grown to over
60 TB in the HBase catalog. The logs showed files being copied multiple
times (around 10 times for some files on a single region server).
After writing a patch for HBASE-20774 and deploying it to the cluster,
the next attempt showed the same duplicate-load behavior, but with a
different failure mode: when you bulk load from the same FS as the
HBase root, HBase renames files rather than copying them, so the next
time the region server is instructed to load the file it fails with a
FileNotFoundException. I've cherry-picked logs from one occurrence of
this on a region server:
2018-06-27 19:34:18,170 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9 \x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
2018-06-27 19:34:18,236 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: Opening 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe' for reading
2018-06-27 19:34:45,915 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: rename s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_
2018-06-27 19:34:59,005 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Loaded HFile s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store 'd' as s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_ - updating store file list.
2018-06-27 19:34:59,101 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Successfully loaded store file s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store d (new location: s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_)
2018-06-27 19:40:03,463 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9 \x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
org.apache.hadoop.io.MultipleIOException: 18 exceptions [java.io.FileNotFoundException: No such file or directory 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe', ....]
        at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:53)
        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5782)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2170)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36615)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
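In case it helps frame the question, here's my rough mental model of
the copy-vs-rename decision, based on the behavior above. This is just
a simplified sketch, not the actual HBase source; the class and method
names are made up:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    // Hypothetical sketch of how a region server commits a bulk-loaded
    // hfile into a store directory (names are mine, not HBase's).
    public final class BulkLoadCommitSketch {
      static void commitHFile(Configuration conf, Path src, Path dst)
          throws IOException {
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);
        if (srcFs.getUri().equals(dstFs.getUri())) {
          // Source is on the same filesystem as the HBase root: move it.
          // The source path is gone afterwards, so a *second* load
          // request for the same hfile fails with FileNotFoundException.
          if (!srcFs.rename(src, dst)) {
            throw new IOException("rename failed: " + src + " -> " + dst);
          }
        } else {
          // Different filesystem: copy, leaving the source in place. A
          // repeated load request then silently duplicates data rather
          // than failing, which matches the 15 TB -> 60 TB blowup.
          FileUtil.copy(srcFs, src, dstFs, dst, false /* deleteSource */, conf);
        }
      }
    }

Either way, the root issue looks the same: the region server receives
the same load request more than once, and the rename path just turns
that duplication into a hard failure.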
I'm not sure where the problem is coming from here: whether the master
is adding files to the queue multiple times or just not removing them
once they've been loaded. If anyone has knowledge of this area or an
idea of where things are going wrong, it would be very helpful, since
I'm not super familiar with the code base.
Thanks,
Austin
[1] https://issues.apache.org/jira/browse/HBASE-20774