I'm using 0.94.6-cdh4.4.0, and I use the bulk load:

    FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
    FileOutputFormat.setOutputPath(job, hbasePath);
    HTable table = new HTable(jConf, HBASE_TABLE);
    HFileOutputFormat.configureIncrementalLoad(job, table);
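Fleshed out, the whole job setup looks roughly like this (a sketch, not my exact code; the paths, the table name, and the DatasetMapper class are placeholders):

    // Sketch of a bulk-load job for HBase 0.94. configureIncrementalLoad()
    // wires in PutSortReducer, HFileOutputFormat, and a total-order
    // partitioner matching the table's region boundaries, so the job
    // writes HFiles instead of executing Puts against the cluster.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadJob {
        public static void main(String[] args) throws Exception {
            Configuration jConf = HBaseConfiguration.create();
            Job job = new Job(jConf, "hbase-bulk-load");
            job.setJarByClass(BulkLoadJob.class);

            // Placeholder mapper emitting ImmutableBytesWritable/Put pairs.
            job.setMapperClass(DatasetMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);

            FileInputFormat.addInputPath(job, new Path("/user/me/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/me/hfiles"));

            HTable table = new HTable(jConf, "my_table");
            HFileOutputFormat.configureIncrementalLoad(job, table);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

After the job finishes, the HFiles still have to be handed to the region servers with LoadIncrementalHFiles (the completebulkload tool); until then nothing goes through the normal Put path.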
It seems that it takes a really long time when it starts to execute the Puts to HBase in the reduce phase. (A sketch of a generator mapper along these lines follows the quoted thread below.)

2014-04-14 14:35 GMT+02:00 Ted Yu <[email protected]>:

> Which HBase release did you run the MapReduce job on?
>
> Cheers
>
> On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz <[email protected]> wrote:
>
> > I want to create a large dataset for HBase with different numbers of
> > rows and versions. It's about 10M rows with 100 versions each, to do
> > some benchmarks.
> >
> > What's the fastest way to create it? I'm generating the dataset with a
> > MapReduce job of 100,000 rows and 10 versions. It takes 17 minutes and
> > is around 7 GB. I don't know if I could do it more quickly. The
> > bottleneck is when the mappers write their output and when that output
> > is transferred to the reducers.
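For completeness, a minimal sketch of a generator mapper like the DatasetMapper placeholder above, assuming one input line per row and a fixed number of versions per cell (the column family, qualifier, and value payload are placeholders, not from the original job):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DatasetMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        private static final byte[] FAMILY = Bytes.toBytes("f");    // placeholder
        private static final byte[] QUALIFIER = Bytes.toBytes("q"); // placeholder
        private static final int VERSIONS = 10;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // One input line becomes one row key.
            byte[] row = Bytes.toBytes(value.toString());
            Put put = new Put(row);
            long baseTs = System.currentTimeMillis();
            for (int v = 0; v < VERSIONS; v++) {
                // Distinct timestamps so each cell survives as its own version.
                put.add(FAMILY, QUALIFIER, baseTs - v,
                        Bytes.toBytes("value-" + v));
            }
            context.write(new ImmutableBytesWritable(row), put);
        }
    }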
