I have a MapReduce job whose output I am importing into HBase with completebulkload. With a small input it works fine: the load finishes within seconds and the command line returns. When I add more input to the MapReduce job, completebulkload hangs on the command line and never returns.
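For context, I kick off the load like this (the output path and table name below are placeholders, not my real ones):

```shell
# Standard completebulkload invocation for this HBase version;
# /bulkoutput and Repo are placeholder path/table names.
hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar completebulkload /bulkoutput Repo
```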
When I run completebulkload on the large output, it keeps copying the same input files over and over.
Cut from the regionserver log:
2012-05-02 18:56:11,629 INFO org.apache.hadoop.hbase.regionserver.Store:
File hdfs://node1/bulkoutput/idjuice/436678602442940864 on different
filesystem than destination store - moving to this filesystem.
2012-05-02 18:56:12,287 INFO org.apache.hadoop.hbase.regionserver.Store:
Copied to temporary path on dst filesystem:
hdfs://node1.hadoop.compspy.com/hbase/Repo/751e4b3c4a27e680e8d481be3e11507e/.tmp/6957718427566497105
2012-05-02 18:56:12,288 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming bulk load file
hdfs://node1.hadoop.compspy.com/hbase/Repo/751e4b3c4a27e680e8d481be3e11507e/.tmp/6957718427566497105
to
hdfs://node1.hadoop.compspy.com/hbase/Repo/751e4b3c4a27e680e8d481be3e11507e/idjuice/499207355457162362
2012-05-02 18:56:12,310 INFO org.apache.hadoop.hbase.regionserver.Store:
Moved hfile
hdfs://node1.hadoop.compspy.com/hbase/Repo/751e4b3c4a27e680e8d481be3e11507e/.tmp/6957718427566497105
into store directory
hdfs://node1.hadoop.compspy.com/hbase/Repo/751e4b3c4a27e680e8d481be3e11507e/idjuice
- updating store file list.
I can see the same input file being loaded over and over in the logs.
With TableMapReduceUtil.initTableReducerJob, the final size of the table was around 680 MB across all stores, as reported by the regionserver.
I then ran the import on a larger dataset: same data format, same MR job, just more input. After the table reached 5 GB, I killed the regionserver, truncated the table, and restarted the whole cluster.
The MapReduce job's map output is configured by:
HFileOutputFormat.configureIncrementalLoad(job, table);
The only difference I can see between the two sets of files is that the small one has 1 file to import per column family, while the larger one has 8 files per column family.
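In case it helps, the driver setup looks roughly like this. This is a sketch of the standard incremental-load flow for HBase 0.90, not my exact job: MyMapper, the table name "Repo", and the paths are placeholders.

```java
// Sketch of the bulk-load driver setup (HBase 0.90-era API).
// MyMapper, "Repo", and the paths are placeholders for illustration.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "bulk-import");
job.setJarByClass(MyMapper.class);
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
FileInputFormat.addInputPath(job, new Path("/bulkinput"));
FileOutputFormat.setOutputPath(job, new Path("/bulkoutput"));
HTable table = new HTable(conf, "Repo");
// Configures HFileOutputFormat as the output format and sets up
// total-order partitioning against the table's region boundaries.
HFileOutputFormat.configureIncrementalLoad(job, table);
job.waitForCompletion(true);
```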
Any suggestions on where the problem could be?
It seems odd that the small input would work so easily while the large one just runs out of control.
Running hbase-0.90.4-cdh3u3.jar, with:
export HBASE_CLASSPATH=/etc/hbase/conf:/etc/hadoop/conf:/etc/zookeeper