re: "So, I execute 3.2Mill of Put¹s in HBase." There will be 3.2 million Puts, but they won¹t be sent over 1 at a time if autoFlush on Htable is false. By default, htable should be using a 2mb write buffer, and then it groups the Puts by RegionServer.
On 4/14/14, 2:21 PM, "Guillermo Ortiz" <[email protected]> wrote:

>Are there any benchmarks about how long it could take to insert data into
>HBase, to have a reference?
>The output of my Mapper has 3.2 million records. So, I execute 3.2 million
>Puts in HBase.
>
>Well, data has to be copied and sent to the reducers, but with a 1Gb
>network it shouldn't take too much time. I'll check Ganglia.
>
>
>2014-04-14 18:16 GMT+02:00 Ted Yu <[email protected]>:
>
>> I looked at the revision history for HFileOutputFormat.java.
>> There was one patch, HBASE-8949, which went into 0.94.11, but it
>> shouldn't affect throughput much.
>>
>> If you can use ganglia (or some similar tool) to pinpoint what caused
>> the low ingest rate, that would give us more clues.
>>
>> BTW Is upgrading to a newer release, such as 0.98.1 (which contains
>> HBASE-8755), an option for you?
>>
>> Cheers
>>
>>
>> On Mon, Apr 14, 2014 at 5:41 AM, Guillermo Ortiz <[email protected]>
>> wrote:
>>
>> > I'm using 0.94.6-cdh4.4.0.
>> >
>> > I use the bulk load:
>> > FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
>> > FileOutputFormat.setOutputPath(job, hbasePath);
>> > HTable table = new HTable(jConf, HBASE_TABLE);
>> > HFileOutputFormat.configureIncrementalLoad(job, table);
>> >
>> > It seems that it takes a really long time when it starts to execute
>> > the Puts to HBase in the reduce phase.
>> >
>> >
>> >
>> > 2014-04-14 14:35 GMT+02:00 Ted Yu <[email protected]>:
>> >
>> > > Which hbase release did you run the mapreduce job on?
>> > >
>> > > Cheers
>> > >
>> > > On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz <[email protected]>
>> > > wrote:
>> > >
>> > > > I want to create a large dataset for HBase with different versions
>> > > > and numbers of rows. It's about 10M rows and 100 versions, to do
>> > > > some benchmarks.
>> > > >
>> > > > What's the fastest way to create it? I'm generating the dataset
>> > > > with a MapReduce job of 100,000 rows and 10 versions. It takes 17
>> > > > minutes and the size is around 7GB. I don't know if I could do it
>> > > > more quickly. The bottleneck is when the MapReduce job writes the
>> > > > output and when it transfers the output to the Reducers.
>> >
>>
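One note on the quoted setup: with HFileOutputFormat.configureIncrementalLoad, the reducers write HFiles rather than sending Puts to the cluster, and there is a second step to move those files into the table. A minimal sketch of that completion step against the 0.94 API (the table name and output path are placeholders, assumed to match the hbasePath the quoted job wrote to):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table"); // placeholder table name

        // After the MR job finishes, move the generated HFiles into the
        // table's regions; no Puts are sent through the write path.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/tmp/hbase-output"), table); // placeholder path

        table.close();
    }
}

The same step can also be run from the command line via the completebulkload tool.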
