Thanks J-D.
I already have 1 & 2 (write buffer & autoflush); I'll try 3 & 4 as well.

Regarding the 500+ map tasks with 490 of them processing nothing, do you have an
explanation for that? I'm wondering if it's kicking off too many JVMs, most of
them doing nothing.

'top' reports only a couple of GB of free memory even though the box has 36 GB
total. I don't quite trust top, since cached blocks don't show up under the free
column even when no process is running.

venkatesh

 


 

 

-----Original Message-----
From: Jean-Daniel Cryans <jdcry...@apache.org>
To: user@hbase.apache.org
Sent: Tue, Oct 5, 2010 11:30 pm
Subject: Re: HBase map reduce job timing


Ah ok, then using the write buffer should get you the speed you need
(provided that you have the hardware capacity and that you use HTable
in an efficient way).

In setup() set this to false on all 3 htables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

In cleanup() call this on all htables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#flushCommits()
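
Put together, it could look roughly like the sketch below (untested, written
against the 0.20 client API; the table names, column family, and key/value
types are placeholders to adapt to your own reducer):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ThreeTablePutReducer
    extends Reducer<Text, Text, NullWritable, NullWritable> {

  private HTable tableA, tableB, tableC;

  @Override
  protected void setup(Context context) throws IOException {
    HBaseConfiguration conf =
        new HBaseConfiguration(context.getConfiguration());
    // One HTable per output table; the names are placeholders.
    tableA = new HTable(conf, "table_a");
    tableB = new HTable(conf, "table_b");
    tableC = new HTable(conf, "table_c");
    // Disable autoflush so Puts pile up in the client-side write
    // buffer instead of going out as one RPC per Put.
    tableA.setAutoFlush(false);
    tableB.setAutoFlush(false);
    tableC.setAutoFlush(false);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
              Bytes.toBytes(value.toString()));
      tableA.put(put);  // buffered, not sent right away
      // ...same idea for tableB and tableC with their own Puts
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // Push out whatever is still sitting in the write buffers.
    tableA.flushCommits();
    tableB.flushCommits();
    tableC.flushCommits();
  }
}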

Also to make your maps faster you could set this to 10 or more when
you create your input format:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)
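
A rough sketch of the driver side, building the Scan with a time range and
caching (also untested; the table name is a placeholder and IdentityTableMapper
is only a stand-in for your own mapper class):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class TimeRangeScanJob {

  public static Job create(Configuration conf, long startMillis, long endMillis)
      throws IOException {
    Job job = new Job(conf, "time-range scan");
    Scan scan = new Scan();
    // Only return cells whose timestamps fall in [startMillis, endMillis).
    scan.setTimeRange(startMillis, endMillis);
    // Fetch 10 rows per scanner RPC instead of the default of 1.
    scan.setCaching(10);
    TableMapReduceUtil.initTableMapperJob(
        "source_table",                // input table name (placeholder)
        scan,
        IdentityTableMapper.class,     // stand-in for your own mapper class
        ImmutableBytesWritable.class,  // mapper output key class
        Result.class,                  // mapper output value class
        job);
    return job;
  }
}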

J-D

On Tue, Oct 5, 2010 at 8:23 PM, Venkatesh <vramanatha...@aol.com> wrote:
>
> Sure. Both input & output are HBase tables.
> Input (mapper phase) - scanning an HBase table for all records within a time range (using HBase timestamps)
> Output (reduce phase) - doing a Put to 3 different HBase tables
>
>
>
> -----Original Message-----
> From: Jean-Daniel Cryans <jdcry...@apache.org>
> To: user@hbase.apache.org
> Sent: Tue, Oct 5, 2010 11:14 pm
> Subject: Re: HBase map reduce job timing
>
>
> It'd be more useful if we knew where that data is coming from, and
> where it's going. Are you scanning HBase and/or writing to it?
>
> J-D
>
> On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh <vramanatha...@aol.com> wrote:
>>
>>
>>
>> Sorry, yeah, I have to do some digging to provide some data.
>> What sort of data would be helpful? Would the stats reported by jobtracker.jsp suffice? I've pasted them in this email.
>> I can gather more JVM stats. Thanks.
>>
>> Status: Succeeded
>> Started at: Tue Oct 05 21:39:58 EDT 2010
>> Finished at: Tue Oct 05 22:36:43 EDT 2010
>> Finished in: 56mins, 45sec
>> Job Cleanup: Successful
>>
>>
>>
>> Kind      % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>> map       100.00%      565         0         0         565        0        0 / 11
>> reduce    100.00%      20          0         0         20         0        0 / 2
>>
>> Counter                          Map             Reduce          Total
>>
>> Job Counters
>>   Launched reduce tasks          0               0               22
>>   Rack-local map tasks           0               0               66
>>   Launched map tasks             0               0               576
>>   Data-local map tasks           0               0               510
>>
>> com.JobRecords
>>   REDUCE_PHASE_RECORDS           0               597,712         597,712
>>   MAP_PHASE_RECORDS              2,534,807       0               2,534,807
>>
>> FileSystemCounters
>>   FILE_BYTES_READ                335,845,726     861,146,518     1,196,992,244
>>   FILE_BYTES_WRITTEN             1,197,031,156   861,146,518     2,058,177,674
>>
>> Map-Reduce Framework
>>   Reduce input groups            0               597,712         597,712
>>   Combine output records         0               0               0
>>   Map input records              2,534,807       0               2,534,807
>>   Reduce shuffle bytes           0               789,145,342     789,145,342
>>   Reduce output records          0               0               0
>>   Spilled Records                3,522,428       2,534,807       6,057,235
>>   Map output bytes               851,007,170     0               851,007,170
>>   Map output records             2,534,807       0               2,534,807
>>   Combine input records          0               0               0
>>   Reduce input records           0               2,534,807       2,534,807
>>
>> -----Original Message-----
>> From: Jean-Daniel Cryans <jdcry...@apache.org>
>> To: user@hbase.apache.org
>> Sent: Tue, Oct 5, 2010 10:53 pm
>> Subject: Re: HBase map reduce job timing
>>
>>
>> I'd love to give you tips, but you didn't provide any data about the
>> input and output of your job, the kind of hardware you're using, etc.
>> At this point any suggestion would be a stab in the dark; the best I
>> can do is point you to the existing documentation:
>> http://wiki.apache.org/hadoop/PerformanceTuning
>>
>> J-D
>>
>> On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh <vramanatha...@aol.com> wrote:
>>>
>>>
>>>
>>> I have a MapReduce job that is taking too long (over an hour). Trying to see what I can tune to bring it down. One thing I noticed: the job is kicking off
>>> - 500+ map tasks: 490 of them do not process any records, whereas 10 of them process all the records (200K each). Any idea why that would be?
>>>
>>> The map phase takes about a couple of minutes; the reduce phase takes the rest.
>>>
>>> I'll try increasing the # of reduce tasks. Open to other suggestions for tunables.
>>>
>>> thanks for your input
>>> venkatesh
>>>
>>>
>>>
>>
>>
>>
>
>
>


 
