You are right. Previously I bulk-loaded a single folder as an experiment, and
it was really fast. The next bulk load triggered splits and took much longer.
I think I know why this happens: we have many txt files, and I launched a
separate importtsv MR job for each txt file.
Each MR job generates HFiles whose key ranges are ordered within that job, but
the HFiles from all the jobs together are not globally ordered!


Our row key is an MD5 hash, and there are 100 billion records in total, about
6TB. The original files range in size from 100MB to 100GB.
That is why I launched many MR jobs in parallel, and that is where the problem
occurred, even though I created the table pre-split with
`{NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}`.
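To make the mismatch concrete, here is a minimal sketch of the check that decides whether an HFile can be moved as-is or has to be split: its whole key range must fall inside one region. The first and last keys are the ones reported in the log of the quoted message; both region layouts are hypothetical, and this is an illustration rather than the actual LoadIncrementalHFiles code.

```python
from bisect import bisect_right

def region_index(row_key, split_points):
    """Index of the region holding row_key; split_points are sorted region start keys."""
    return bisect_right(split_points, row_key)

def boundaries_crossed(first_key, last_key, split_points):
    """Region start keys that fall inside (first_key, last_key]."""
    return [p for p in split_points if first_key < p <= last_key]

# First and last row keys of the HFile reported in the quoted log.
first = "f6eb30074a52ebb8c5f52ed1c85c2f0d"
last = "f93061a29e9458fada2521ffe45ca385"

# Hypothetical layout 1: 16 regions, one per leading hex digit.
coarse = [format(i, "x") for i in range(1, 16)]

# Hypothetical layout 2: the last region has since been divided further.
finer = sorted(coarse + ["f4", "f7", "f8", "f9", "fc"])

# With the coarse layout the file fits in one region and could simply be moved.
print(boundaries_crossed(first, last, coarse))

# With the finer layout every crossed boundary forces one more split of the file.
print(boundaries_crossed(first, last, finer))
```

Each crossed boundary corresponds to one "no longer fits inside a single region. Splitting..." round, which is why HFiles generated against an older region layout load so slowly.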


The only workaround I have found so far is to use a single importtsv MR job,
so that its reduce phase produces globally ordered HFiles that satisfy HBase's
region key ranges.
I also changed the pre-split key ranges to 000-fff (16*16*16 = 4096 regions in total).
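As a rough sketch of what the 000-fff pre-split means in numbers, the snippet below enumerates three-character hex prefixes as split keys and divides the totals mentioned above by the region count. The helper name is hypothetical, "6TB" is assumed to mean 6 TiB, and the enumeration is only an illustration of evenly spaced hex prefixes, not HBase's own split code.

```python
def hex_prefix_split_points(prefix_len):
    """Every lowercase hex prefix of the given length except the all-zero one.

    The all-zero prefix is skipped because the first region starts at the
    empty key, so N split points describe N + 1 regions.
    """
    return [format(i, "0{}x".format(prefix_len)) for i in range(1, 16 ** prefix_len)]

split_points = hex_prefix_split_points(3)
regions = len(split_points) + 1

total_records = 100 * 10**9   # 100 billion rows, from the message above
total_bytes = 6 * 1024**4     # assumption: "6TB" read as 6 TiB

print(split_points[0], split_points[-1], regions)
print("rows per region:", total_records // regions)
print("GiB per region:", total_bytes / regions / 1024**3)
```

Because MD5 row keys are close to uniformly distributed, an even split like this should give regions of roughly equal size.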


But as you know, the original txt files are very large, so not only is the
number of map tasks too large, the number of reduce tasks is large too.
This may also take a long time to finish.


Is there any way to load such a huge amount of data into HBase quickly?
I have also looked at Cassandra and other KV stores, but the unavoidable first
step of reading the original large txt files is also too slow.






tks, qihuang.zheng


Original Message
From: [email protected]
To: [email protected]
Sent: Wednesday, December 23, 2015, 16:52
Subject: Re: completebulkload not mv or rename but copy and split many attempt times


This is because the table's regions have changed and no longer match the
regions that existed when you generated the HFiles. Once the bulk load process
is over, the files should have been moved into HBase. I think it is better to
delete all the HFiles and dirs when the bulk load is over.

At 2015-12-23 16:35:10, "qihuang.zheng" [email protected] wrote:

I have HFiles generated by importtsv. The files are really large, from 100MB
to 10GB. I have changed hbase.hregion.max.filesize to 50GB (53687091200), and
I also made the source CanonicalServiceName the same as HBase's.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/1 data.md5_id2

HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

But neither completebulkload nor LoadIncrementalHFiles simply mv/renamed the
HFile as expected. Instead the HFile was copied and split, which takes a long
time. The log shows "Split occured while grouping HFiles, retry attempt XXX",
and each retry creates a child _tmp dir one level deeper.

2015-12-23 15:52:04,909 INFO [LoadIncrementalHFiles-0] hfile.CacheConfig: CacheConfig:disabled
2015-12-23 15:52:05,006 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:52:05,007 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae no longer fits inside a single region. Splitting...
2015-12-23 15:53:38,639 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top
2015-12-23 15:53:39,173 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
2015-12-23 15:53:39,186 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f733d2c504f22f71b191014d72e4d124
2015-12-23 15:53:39,188 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top first=f733d2c6407f5758e860195b6d2c10c1 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:53:39,189 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top no longer fits inside a single region. Splitting...
2015-12-23 15:54:27,722 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top
2015-12-23 15:54:28,557 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 2 with 2 files remaining to group or split
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-4] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom first=f733d2c6407f5758e860195b6d2c10c1 last=f77c7d357a76ff92bb16ec1ef79f31fb
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top first=f77c7d3915c9a8b71c83c414aabd587d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:08,992 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top
2015-12-23 15:55:09,424 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 3 with 2 files remaining to group or split
2015-12-23 15:55:09,431 INFO [LoadIncrementalHFiles-7] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom first=f77c7d3915c9a8b71c83c414aabd587d last=f7c525a83ee19ea166414e972c5d5541
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:42,165 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top
2015-12-23 15:55:42,490 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 4 with 2 files remaining to group or split
2015-12-23 15:55:42,498 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f80dcce8a4a14be406ddd1bdebc2eda2
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top first=f80dccecf159d4999cb8e17446103d72 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:09,560 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top
2015-12-23 15:56:09,933 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 5 with 2 files remaining to group or split
2015-12-23 15:56:09,942 INFO [LoadIncrementalHFiles-13] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom first=f80dccecf159d4999cb8e17446103d72 last=f85673f473ead63c89e96c83b2058ca7
2015-12-23 15:56:09,943 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top first=f85673fde3138dac07ce08881c9d0ccc last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:09,944 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:30,890 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top
2015-12-23 15:56:31,145 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 6 with 2 files remaining to group or split
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-16] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom first=f85673fde3138dac07ce08881c9d0ccc last=f89f12a56b5af206188639f736877563
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top first=f89f12a59e4a9c9bcbb42d0504318e25 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:44,959 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top
2015-12-23 15:56:46,826 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 7 with 2 files remaining to group or split
2015-12-23 15:56:46,832 INFO [LoadIncrementalHFiles-19] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom first=f89f12a59e4a9c9bcbb42d0504318e25 last=f8e7bc423ca4799459898439bf0f68b2
2015-12-23 15:56:46,833 INFO [LoadIncrementalHFiles-20] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top first=f8e7bc4bc8c2e7eac7f7e31bc116f8e0 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:46,930 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-12-23 15:56:46,931 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3515d529acedbaa
2015-12-23 15:56:46,960 INFO [main] zookeeper.ZooKeeper: Session: 0x3515d529acedbaa closed
2015-12-23 15:56:46,960 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Even though
the process finished, the original HFile was not deleted. I was wondering why
the mv/rename did not happen.

[qihuang.zheng@spark047213 ~]$ hadoop fs -du -h /user/tongdun/id_hbase/1/id/
3.3 G /user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae
6.0 G /user/tongdun/id_hbase/1/id/_tmp

tks, qihuang.zheng
