I looked at the revision history for HFileOutputFormat.java.
There was one patch, HBASE-8949, which went into 0.94.11, but it
shouldn't affect throughput much.

If you can use Ganglia (or some similar tool) to pinpoint what caused
the low ingest rate, that would give us more clues.

BTW, is upgrading to a newer release, such as 0.98.1 (which contains
HBASE-8755), an option for you?

Cheers


On Mon, Apr 14, 2014 at 5:41 AM, Guillermo Ortiz <[email protected]> wrote:

> I'm using 0.94.6-cdh4.4.0.
>
> I use bulk load:
> FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
> FileOutputFormat.setOutputPath(job, hbasePath);
> HTable table = new HTable(jConf, HBASE_TABLE);
> HFileOutputFormat.configureIncrementalLoad(job, table);
>
> It seems to take a really long time when it starts to execute the
> Puts to HBase in the reduce phase.
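>
> Roughly, the whole driver looks like this (a minimal sketch rather
> than my exact code; the final LoadIncrementalHFiles step assumes the
> 0.94-era API):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = HBaseConfiguration.create();
> Job job = new Job(conf, "hbase-bulkload");
> FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
> FileOutputFormat.setOutputPath(job, hbasePath);
> HTable table = new HTable(conf, HBASE_TABLE);
> // Sets the partitioner, sort reducer and output format so the reduce
> // phase writes sorted HFiles (one set per region) instead of sending
> // Puts through the normal write path.
> HFileOutputFormat.configureIncrementalLoad(job, table);
> if (job.waitForCompletion(true)) {
>     // Moves the generated HFiles into the regions directly.
>     new LoadIncrementalHFiles(conf).doBulkLoad(hbasePath, table);
> }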
>
>
>
> 2014-04-14 14:35 GMT+02:00 Ted Yu <[email protected]>:
>
> > Which HBase release did you run the MapReduce job on?
> >
> > Cheers
> >
> > On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz <[email protected]> wrote:
> >
> > > I want to create a large dataset for HBase with different versions
> > > and numbers of rows. It's about 10M rows and 100 versions, to do
> > > some benchmarks.
> > >
> > > What's the fastest way to create it? I'm generating the dataset
> > > with a MapReduce job of 100,000 rows and 10 versions. It takes 17
> > > minutes and the size is around 7 GB. I don't know if I could do it
> > > more quickly. The bottleneck is when the mappers write their output
> > > and when the output is transferred to the reducers.
> >
>
