Are there any benchmarks for how long it takes to insert data into HBase, just to have a reference? The output of my Mapper is 3.2 million records, so I execute 3.2 million Puts in HBase.
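For reference, this is roughly what those 3.2M direct Puts look like with client-side write buffering enabled (a minimal sketch against the 0.94-era client API; the table, family and qualifier names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: 3.2M direct Puts with client-side write buffering (0.94 API).
    // "benchmark_table", "cf" and "q" are placeholder names.
    public class DirectPutBenchmark {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "benchmark_table");
        table.setAutoFlush(false);                  // buffer Puts instead of one RPC each
        table.setWriteBufferSize(8 * 1024 * 1024L); // flush roughly every 8 MB
        byte[] cf = Bytes.toBytes("cf");
        byte[] q = Bytes.toBytes("q");
        for (long i = 0; i < 3200000L; i++) {
          Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
          put.add(cf, q, Bytes.toBytes("value-" + i)); // Put.add(family, qualifier, value) in 0.94
          table.put(put);                            // buffered until the write buffer fills
        }
        table.flushCommits();                        // push any remaining buffered Puts
        table.close();
      }
    }

With autoflush on (the default), each Put is a separate RPC, which is usually the first thing to rule out when single-client insert throughput looks low.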
Well, data has to be copied and sent to the reducers, but with a 1 Gb network it shouldn't take too much time. I'll check Ganglia.

2014-04-14 18:16 GMT+02:00 Ted Yu <[email protected]>:

> I looked at the revision history for HFileOutputFormat.java.
> There was one patch, HBASE-8949, which went into 0.94.11, but it shouldn't
> affect throughput much.
>
> If you can use Ganglia (or some similar tool) to pinpoint what caused the
> low ingest rate, that would give us more clues.
>
> BTW, is upgrading to a newer release, such as 0.98.1 (which contains
> HBASE-8755), an option for you?
>
> Cheers
>
> On Mon, Apr 14, 2014 at 5:41 AM, Guillermo Ortiz <[email protected]> wrote:
>
> > I'm using 0.94.6-cdh4.4.0.
> >
> > I use the bulk load:
> >
> >     FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
> >     FileOutputFormat.setOutputPath(job, hbasePath);
> >     HTable table = new HTable(jConf, HBASE_TABLE);
> >     HFileOutputFormat.configureIncrementalLoad(job, table);
> >
> > It seems to take a really long time once it starts to execute the Puts
> > to HBase in the reduce phase.
> >
> > 2014-04-14 14:35 GMT+02:00 Ted Yu <[email protected]>:
> >
> > > Which HBase release did you run the MapReduce job on?
> > >
> > > Cheers
> > >
> > > On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz <[email protected]> wrote:
> > >
> > > > I want to create a large dataset for HBase with different versions and
> > > > numbers of rows. It's about 10M rows and 100 versions, to do some
> > > > benchmarks.
> > > >
> > > > What's the fastest way to create it? I'm generating the dataset with a
> > > > MapReduce job of 100,000 rows and 10 versions. It takes 17 minutes and
> > > > the size is around 7 GB. I don't know if I could do it more quickly.
> > > > The bottleneck is when the MapReduce job writes its output and when it
> > > > transfers that output to the Reducers.
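For completeness, a full driver along the lines of the snippet quoted above might look like the sketch below (paths, table and column names are my placeholders). The point is that with HFileOutputFormat the reduce phase only sorts and writes HFiles; no Puts go through the normal write path until LoadIncrementalHFiles moves the files into the regions at the end:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Sketch of a bulk-load driver (0.94 API). configureIncrementalLoad wires
    // in the reducer, partitioner and HFileOutputFormat based on the table's
    // region boundaries; LoadIncrementalHFiles then moves the generated HFiles
    // into the regions instead of issuing Puts over RPC.
    public class BulkLoadDriver {

      static class PutMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
            throws java.io.IOException, InterruptedException {
          // Assumes "rowkey<TAB>value" input lines; placeholder parsing.
          String[] parts = line.toString().split("\t", 2);
          byte[] row = Bytes.toBytes(parts[0]);
          Put put = new Put(row);
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
          ctx.write(new ImmutableBytesWritable(row), put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(PutMapper.class);
        // configureIncrementalLoad picks its reducer from the map output value class.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        FileInputFormat.addInputPath(job, new Path("/user/benchmark/input"));
        Path hfileDir = new Path("/user/benchmark/hfiles");
        FileOutputFormat.setOutputPath(job, hfileDir);

        HTable table = new HTable(conf, "benchmark_table");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        if (job.waitForCompletion(true)) {
          // Move the HFiles into the table; cheap compared to 3.2M Puts.
          new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
        }
        table.close();
      }
    }

So if the reduce phase itself is slow, it is worth checking whether time is going into the shuffle/sort of the Put objects rather than into HBase, which is where Ganglia should help.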
