You are right. Previously I bulkloaded a single folder as an experiment, which was really fast; the next bulkload triggered splits and took much longer.
I know why this happens: we have many txt files, and I launch a separate importtsv MR job for every txt file.
Each MR job produces HFiles whose keys are ordered within that job's output, but the HFiles are not globally ordered across jobs!
Our row key is an md5 hash; there are 100 billion records in total, about 6TB, and each original file ranges from 100MB to 100GB.
That is why I launch many MR jobs in parallel, and that is where the problem occurs!
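The overlap is easy to see: because md5 keys scatter uniformly, each job's HFiles cover a [first, last] range spanning almost the whole keyspace, so HFiles from different jobs always overlap. A minimal sketch with synthetic keys (the `a`/`b` prefixes just stand in for two different input files; not my real data):

```shell
# Simulate two input files, each handled by its own importtsv job.
# md5 row keys from either file scatter over the full 00..ff keyspace.
for i in $(seq 1 200); do printf 'a%s' "$i" | md5sum; done | cut -c1-32 | sort > /tmp/file_a.keys
for i in $(seq 1 200); do printf 'b%s' "$i" | md5sum; done | cut -c1-32 | sort > /tmp/file_b.keys

# Each sorted list plays the role of one job's HFile output; its
# [first, last] range nearly covers the whole keyspace, so the two
# "HFiles" overlap and LoadIncrementalHFiles has to split them.
echo "file_a range: $(head -1 /tmp/file_a.keys) .. $(tail -1 /tmp/file_a.keys)"
echo "file_b range: $(head -1 /tmp/file_b.keys) .. $(tail -1 /tmp/file_b.keys)"
```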
This happens even though I created a pre-split table with `{NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}`.
The only fix I can see right now is to use a single importtsv MR job, so that its reduce phase produces globally ordered HFiles matching HBase's region key ranges.
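For reference, the single-job approach would look roughly like this: one ImportTsv run over the whole input directory with `-Dimporttsv.bulk.output`, so the reducers partition the HFiles by the table's current region boundaries, followed by one bulkload that should only mv/rename. The paths and the `id:value` column mapping below are placeholders, not my real ones:

```shell
# One ImportTsv job over ALL input files (example paths/columns):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,id:value \
  -Dimporttsv.bulk.output=hdfs://tdhdfs/user/tongdun/id_hbase/bulk_out \
  data.md5_id2 \
  hdfs://tdhdfs/user/tongdun/id_hbase/input/

# Then a single bulkload; with region-aligned HFiles, no splitting:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs://tdhdfs/user/tongdun/id_hbase/bulk_out data.md5_id2
```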
I have also changed the pre-split boundaries to 000-fff (16*16*16 = 4096 regions in total).
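Note that 4096 regions need 4095 split points (rows below the first split key land in the first region). A quick sketch of generating the 3-hex-digit boundaries, for illustration:

```shell
# 4096 regions over the md5 keyspace need 4095 split points:
# "001", "002", ..., "fff"; keys below "001" go to the first region.
printf '%03x\n' $(seq 1 4095) > /tmp/splits.txt
head -1 /tmp/splits.txt   # 001
tail -1 /tmp/splits.txt   # fff
```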
But as you know, the original txt files are very large, so both the map task count and the reduce task count are huge, and a single job may still take a long time to finish.
Is there any way to load such a huge dataset into HBase quickly?
I have also looked at Cassandra and other KV stores, but the unavoidable first step of reading the original large txt files is just as slow there.
Thanks, qihuang.zheng
Original message
From: [email protected]
To: [email protected]
Sent: Wednesday, December 23, 2015, 16:52
Subject: Re: completebulkload not mv or rename but copy and split many attempt times
This is because the table regions changed and no longer match the regions that existed when you generated the HFiles. Once the bulkload process is over, the files should have been moved into HBase; I think it is better to delete all the HFiles and directories after the bulkload finishes.

At 2015-12-23 16:35:10, "qihuang.zheng" [email protected] wrote:

I have HFiles generated by importtsv, and the files are really large, from 100MB to 10GB. I have changed hbase.hregion.max.filesize to 50GB (53687091200), and also made sure the source CanonicalServiceName is the same as HBase's.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/1 data.md5_id2

HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

But neither completebulkload nor LoadIncrementalHFiles simply mv/renamed the HFiles as expected; instead they copy and split the HFiles, which takes a long time. The log shows "Split occured while grouping HFiles, retry attempt XXX", and each retry creates one more level of child _tmp directory:
2015-12-23 15:52:04,909 INFO [LoadIncrementalHFiles-0] hfile.CacheConfig: CacheConfig:disabled
2015-12-23 15:52:05,006 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:52:05,007 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae no longer fits inside a single region. Splitting...
2015-12-23 15:53:38,639 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top
2015-12-23 15:53:39,173 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
2015-12-23 15:53:39,186 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f733d2c504f22f71b191014d72e4d124
2015-12-23 15:53:39,188 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top first=f733d2c6407f5758e860195b6d2c10c1 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:53:39,189 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top no longer fits inside a single region. Splitting...
2015-12-23 15:54:27,722 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top
2015-12-23 15:54:28,557 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 2 with 2 files remaining to group or split
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-4] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom first=f733d2c6407f5758e860195b6d2c10c1 last=f77c7d357a76ff92bb16ec1ef79f31fb
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top first=f77c7d3915c9a8b71c83c414aabd587d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:08,992 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top
2015-12-23 15:55:09,424 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 3 with 2 files remaining to group or split
2015-12-23 15:55:09,431 INFO [LoadIncrementalHFiles-7] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom first=f77c7d3915c9a8b71c83c414aabd587d last=f7c525a83ee19ea166414e972c5d5541
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:42,165 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top
2015-12-23 15:55:42,490 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 4 with 2 files remaining to group or split
2015-12-23 15:55:42,498 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f80dcce8a4a14be406ddd1bdebc2eda2
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top first=f80dccecf159d4999cb8e17446103d72 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:09,560 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top
2015-12-23 15:56:09,933 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 5 with 2 files remaining to group or split
2015-12-23 15:56:09,942 INFO [LoadIncrementalHFiles-13] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom first=f80dccecf159d4999cb8e17446103d72 last=f85673f473ead63c89e96c83b2058ca7
2015-12-23 15:56:09,943 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top first=f85673fde3138dac07ce08881c9d0ccc last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:09,944 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:30,890 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top
2015-12-23 15:56:31,145 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 6 with 2 files remaining to group or split
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-16] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom first=f85673fde3138dac07ce08881c9d0ccc last=f89f12a56b5af206188639f736877563
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top first=f89f12a59e4a9c9bcbb42d0504318e25 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:44,959 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top
2015-12-23 15:56:46,826 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 7 with 2 files remaining to group or split
2015-12-23 15:56:46,832 INFO [LoadIncrementalHFiles-19] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom first=f89f12a59e4a9c9bcbb42d0504318e25 last=f8e7bc423ca4799459898439bf0f68b2
2015-12-23 15:56:46,833 INFO [LoadIncrementalHFiles-20] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top first=f8e7bc4bc8c2e7eac7f7e31bc116f8e0 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:46,930 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-12-23 15:56:46,931 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3515d529acedbaa
2015-12-23 15:56:46,960 INFO [main] zookeeper.ZooKeeper: Session: 0x3515d529acedbaa closed
2015-12-23 15:56:46,960 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Even though the process finished, the original HFile was not deleted. I was wondering why the mv/rename did not happen.

[qihuang.zheng@spark047213 ~]$ hadoop fs -du -h /user/tongdun/id_hbase/1/id/
3.3 G  /user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae
6.0 G  /user/tongdun/id_hbase/1/id/_tmp

Thanks, qihuang.zheng