Sure. Both input and output are HBase tables.

Input (map phase): scanning an HBase table for all records within a time range (using HBase timestamps).
Output (reduce phase): doing a Put to 3 different HBase tables.
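The driver is along these lines (a minimal sketch using the standard HBase MapReduce helpers; the table names, time range, and mapper/reducer bodies are placeholders, not the actual job):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class TimeRangeJob {

  // Emits each matching row keyed by its row key.
  static class ScanMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // ... copy the needed cells from `value` into `put` ...
      ctx.write(row, put);
    }
  }

  // With MultiTableOutputFormat the output key names the destination table,
  // so one reducer can fan out to all 3 output tables.
  static class FanOutReducer
      extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, Put> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> puts, Context ctx)
        throws IOException, InterruptedException {
      for (Put put : puts) {
        ctx.write(new ImmutableBytesWritable(Bytes.toBytes("out_table_1")), put);
        // ... likewise for out_table_2 / out_table_3 ...
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "time-range-scan");
    job.setJarByClass(TimeRangeJob.class);

    // Restrict the scan to the desired HBase timestamp window.
    Scan scan = new Scan();
    scan.setTimeRange(1286236800000L, 1286323200000L); // placeholder bounds
    scan.setCaching(500);       // fewer scanner RPCs per mapper
    scan.setCacheBlocks(false); // don't churn the region server block cache

    TableMapReduceUtil.initTableMapperJob("input_table", scan,
        ScanMapper.class, ImmutableBytesWritable.class, Put.class, job);

    job.setReducerClass(FanOutReducer.class);
    job.setNumReduceTasks(20);  // current job runs 20 reduces
    job.setOutputFormatClass(MultiTableOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note TableInputFormat creates one map task per region of the input table, which is why a narrow time range can leave most mappers with zero records.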
-----Original Message-----
From: Jean-Daniel Cryans <jdcry...@apache.org>
To: user@hbase.apache.org
Sent: Tue, Oct 5, 2010 11:14 pm
Subject: Re: HBase map reduce job timing

It'd be more useful if we knew where that data is coming from, and
where it's going. Are you scanning HBase and/or writing to it?

J-D

On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh <vramanatha...@aol.com> wrote:
> Sorry..yeah, I have to do some digging to provide some data.
> What sort of data would be helpful? Would the stats reported by
> jobtracker.jsp suffice? I've pasted them in this email.
> I can gather more JVM stats. Thanks.
>
> Status: Succeeded
> Started at: Tue Oct 05 21:39:58 EDT 2010
> Finished at: Tue Oct 05 22:36:43 EDT 2010
> Finished in: 56mins, 45sec
> Job Cleanup: Successful
>
> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
> map     100.00%     565        0        0        565       0       0 / 11
> reduce  100.00%     20         0        0        20        0       0 / 2
>
> Counter                           Map            Reduce         Total
> Job Counters
>   Launched reduce tasks           0              0              22
>   Rack-local map tasks            0              0              66
>   Launched map tasks              0              0              576
>   Data-local map tasks            0              0              510
> com.JobRecords
>   REDUCE_PHASE_RECORDS            0              597,712        597,712
>   MAP_PHASE_RECORDS               2,534,807      0              2,534,807
> FileSystemCounters
>   FILE_BYTES_READ                 335,845,726    861,146,518    1,196,992,244
>   FILE_BYTES_WRITTEN              1,197,031,156  861,146,518    2,058,177,674
> Map-Reduce Framework
>   Reduce input groups             0              597,712        597,712
>   Combine output records          0              0              0
>   Map input records               2,534,807      0              2,534,807
>   Reduce shuffle bytes            0              789,145,342    789,145,342
>   Reduce output records           0              0              0
>   Spilled Records                 3,522,428      2,534,807      6,057,235
>   Map output bytes                851,007,170    0              851,007,170
>   Map output records              2,534,807      0              2,534,807
>   Combine input records           0              0              0
>   Reduce input records            0              2,534,807      2,534,807
>
> -----Original Message-----
> From: Jean-Daniel Cryans <jdcry...@apache.org>
> To: user@hbase.apache.org
> Sent: Tue, Oct 5, 2010 10:53 pm
> Subject: Re: HBase map reduce job timing
>
> I'd love to give you tips, but you didn't provide any data about the
> input and output of your job, the kind of hardware you're using, etc.
> At this point any suggestion would be a stab in the dark; the best I
> can do is point to the existing documentation:
> http://wiki.apache.org/hadoop/PerformanceTuning
>
> J-D
>
> On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh <vramanatha...@aol.com> wrote:
>> I have a MapReduce job that is taking too long (over an hour). Trying to
>> see what I can tune to bring it down. One thing I noticed: the job is
>> kicking off 500+ map tasks; 490 of them do not process any records,
>> whereas 10 of them process all the records (200K each). Any idea why
>> that would be?
>>
>> The map phase takes about a couple of minutes; the reduce phase takes
>> the rest.
>>
>> I'll try increasing the # of reduce tasks. Open to other suggestions
>> for tunables.
>>
>> Thanks for your input,
>> Venkatesh