I'm using 0.94.6-cdh4.4.0, and I use the bulk load:

    FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
    FileOutputFormat.setOutputPath(job, hbasePath);
    HTable table = new HTable(jConf, HBASE_TABLE);
    HFileOutputFormat.configureIncrementalLoad(job, table);
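Fleshed out, the whole job setup looks roughly like this (a sketch, not my exact code; the paths, the table name, and the DatasetMapper class are placeholders):

    // Sketch of a bulk-load job for HBase 0.94. configureIncrementalLoad()
    // wires in PutSortReducer, HFileOutputFormat, and a total-order
    // partitioner matching the table's region boundaries, so the job
    // writes HFiles instead of executing Puts against the cluster.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadJob {
        public static void main(String[] args) throws Exception {
            Configuration jConf = HBaseConfiguration.create();
            Job job = new Job(jConf, "hbase-bulk-load");
            job.setJarByClass(BulkLoadJob.class);

            // Placeholder mapper emitting ImmutableBytesWritable/Put pairs.
            job.setMapperClass(DatasetMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);

            FileInputFormat.addInputPath(job, new Path("/user/me/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/me/hfiles"));

            HTable table = new HTable(jConf, "my_table");
            HFileOutputFormat.configureIncrementalLoad(job, table);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

After the job finishes, the HFiles still have to be handed to the region servers with LoadIncrementalHFiles (the completebulkload tool); until then nothing goes through the normal Put path.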
It seems that it takes a really long time when it starts to execute the Puts to HBase in the reduce phase. (A sketch of a generator mapper along these lines follows the quoted thread below.)

2014-04-14 14:35 GMT+02:00 Ted Yu <[email protected]>:

> Which HBase release did you run the MapReduce job on?
>
> Cheers
>
> On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz <[email protected]> wrote:
>
> > I want to create a large dataset for HBase with different numbers of
> > rows and versions. It's about 10M rows with 100 versions each, to do
> > some benchmarks.
> >
> > What's the fastest way to create it? I'm generating the dataset with a
> > MapReduce job of 100,000 rows and 10 versions. It takes 17 minutes and
> > is around 7 GB. I don't know if I could do it more quickly. The
> > bottleneck is when the mappers write their output and when that output
> > is transferred to the reducers.
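For completeness, a minimal sketch of a generator mapper like the DatasetMapper placeholder above, assuming one input line per row and a fixed number of versions per cell (the column family, qualifier, and value payload are placeholders, not from the original job):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DatasetMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        private static final byte[] FAMILY = Bytes.toBytes("f");    // placeholder
        private static final byte[] QUALIFIER = Bytes.toBytes("q"); // placeholder
        private static final int VERSIONS = 10;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // One input line becomes one row key.
            byte[] row = Bytes.toBytes(value.toString());
            Put put = new Put(row);
            long baseTs = System.currentTimeMillis();
            for (int v = 0; v < VERSIONS; v++) {
                // Distinct timestamps so each cell survives as its own version.
                put.add(FAMILY, QUALIFIER, baseTs - v,
                        Bytes.toBytes("value-" + v));
            }
            context.write(new ImmutableBytesWritable(row), put);
        }
    }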
