Sorry, yes, I need to do some digging to provide some data.
What sort of data would be helpful? Would the stats reported by jobtracker.jsp
suffice? I've pasted them in this email.
I can gather more JVM stats if needed. Thanks.
Status: Succeeded
Started at: Tue Oct 05 21:39:58 EDT 2010
Finished at: Tue Oct 05 22:36:43 EDT 2010
Finished in: 56mins, 45sec
Job Cleanup: Successful
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      565         0         0         565        0        0 / 11
reduce   100.00%      20          0         0         20         0        0 / 2
Counter                          Map             Reduce          Total
Job Counters
  Launched reduce tasks          0               0               22
  Rack-local map tasks           0               0               66
  Launched map tasks             0               0               576
  Data-local map tasks           0               0               510
com.JobRecords
  REDUCE_PHASE_RECORDS           0               597,712         597,712
  MAP_PHASE_RECORDS              2,534,807       0               2,534,807
FileSystemCounters
  FILE_BYTES_READ                335,845,726     861,146,518     1,196,992,244
  FILE_BYTES_WRITTEN             1,197,031,156   861,146,518     2,058,177,674
Map-Reduce Framework
  Reduce input groups            0               597,712         597,712
  Combine output records         0               0               0
  Map input records              2,534,807       0               2,534,807
  Reduce shuffle bytes           0               789,145,342     789,145,342
  Reduce output records          0               0               0
  Spilled Records                3,522,428       2,534,807       6,057,235
  Map output bytes               851,007,170     0               851,007,170
  Map output records             2,534,807       0               2,534,807
  Combine input records          0               0               0
  Reduce input records           0               2,534,807       2,534,807
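As a rough sanity check on the counters above, a few derived ratios can be worked out from them. The following is only a minimal sketch in Python: the dictionary keys and the `derived_stats` helper are names made up for illustration, and the numbers are copied by hand from the jobtracker.jsp output, so they are not read from any Hadoop API.

```python
# Counter values copied by hand from the jobtracker.jsp output above.
# The key names are illustrative, not official Hadoop counter identifiers.
COUNTERS = {
    "map_output_bytes": 851_007_170,
    "map_output_records": 2_534_807,
    "map_spilled_records": 3_522_428,
    "reduce_input_records": 2_534_807,
    "reduce_input_groups": 597_712,
    "reduce_shuffle_bytes": 789_145_342,
}
NUM_REDUCES = 20              # "Num Tasks" for the reduce row
JOB_SECONDS = 56 * 60 + 45    # "Finished in: 56mins, 45sec"


def derived_stats(c, num_reduces, job_seconds):
    """Compute a few simple ratios from the raw counters (hypothetical helper)."""
    return {
        # Average size of one map output record, in bytes.
        "avg_map_output_record_bytes": c["map_output_bytes"] / c["map_output_records"],
        # Map-side spilled records relative to map output records.
        "map_spill_ratio": c["map_spilled_records"] / c["map_output_records"],
        # Average number of values per distinct reduce key.
        "records_per_reduce_group": c["reduce_input_records"] / c["reduce_input_groups"],
        # Even-split averages per reduce task (actual skew is not visible here).
        "records_per_reduce_task": c["reduce_input_records"] / num_reduces,
        "shuffle_bytes_per_reduce_task": c["reduce_shuffle_bytes"] / num_reduces,
        # Overall throughput across the whole job, map and reduce combined.
        "records_per_second_overall": c["map_output_records"] / job_seconds,
    }


if __name__ == "__main__":
    for name, value in derived_stats(COUNTERS, NUM_REDUCES, JOB_SECONDS).items():
        print(f"{name}: {value:,.2f}")
```

These are averages only. A map-side spill ratio above 1 may indicate that some map output was spilled to disk more than once, and the per-reduce-task figures assume the keys are spread evenly across the 20 reducers, which the counters alone do not confirm.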
-----Original Message-----
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Tue, Oct 5, 2010 10:53 pm
Subject: Re: HBase map reduce job timing
I'd love to give you tips, but you didn't provide any data about the
input and output of your job, the kind of hardware you're using, etc.
At this point any suggestion would be a stab in the dark; the best I
can do is point you to the existing documentation:
http://wiki.apache.org/hadoop/PerformanceTuning
J-D
On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh <[email protected]> wrote:
>
>
>
> I have a MapReduce job that is taking too long (over an hour). I'm trying to
> see what I can tune to bring it down. One thing I noticed: the job is
> kicking off 500+ map tasks. 490 of them do not process any records, whereas
> 10 of them process all the records (200K each). Any idea why that would be?
>
> The map phase takes about a couple of minutes;
> the reduce phase takes the rest.
>
> I'll try increasing the number of reduce tasks. Open to other suggestions
> for tunables.
>
> thanks for your input
> venkatesh
>
>
>