Bejoy,

I've read somewhere about keeping the number of mapred.reduce.tasks below the 
reduce task capacity. Here is what I just tested:

Output: 25 GB. 8-DataNode cluster with a Map and Reduce Task Capacity of 16:

1 Reducer   - 22mins
4 Reducers - 11.5mins
8 Reducers - 5mins
10 Reducers - 7mins
12 Reducers - 6:5mins
16 Reducers - 5.5mins

8 reducers won the race, but running reducers at the max capacity was very close. :)

AK47


From: Bejoy KS [mailto:[email protected]]
Sent: Wednesday, November 21, 2012 11:51 AM
To: [email protected]
Subject: Re: guessing number of reducers.

Hi Sasha

In general, the number of reduce tasks is chosen mainly based on the volume of 
data going into the reduce phase. In tools like Hive and Pig, by default there 
is one reducer for every 1 GB of map output. So if you have 100 GB of map 
output, you get 100 reducers.
If your tasks are more CPU-intensive, then you need a smaller volume of data 
per reducer for better performance.
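
The data-volume rule of thumb above can be sketched as a small calculation. 
This is only a rough illustration: the function name reducers_by_volume and 
the gb_per_reducer parameter are hypothetical, and the 1 GB default simply 
mirrors the figure quoted above for Hive and Pig.

```python
import math

def reducers_by_volume(map_output_gb, gb_per_reducer=1.0):
    """Estimate a reducer count from the map output size.

    Hypothetical helper: gb_per_reducer=1.0 mirrors the "one reducer
    per 1 GB of map output" default mentioned above. Lower it for
    CPU-intensive reduce logic so each reducer handles less data.
    """
    if map_output_gb <= 0:
        return 1  # always run at least one reducer
    return max(1, math.ceil(map_output_gb / gb_per_reducer))

print(reducers_by_volume(100))                      # 1 GB per reducer
print(reducers_by_volume(100, gb_per_reducer=0.5))  # CPU-heavy job, half the data each
```

Halving gb_per_reducer doubles the estimate, which matches the advice to give 
each reducer less data when the reduce work is CPU-bound.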

In general it is better to have the number of reduce tasks slightly less than 
the number of available reduce slots in the cluster.
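
As a rough sketch of that last point, a volume-based estimate can be capped 
just below the cluster's reduce slot count. The helper name cap_to_slots and 
the spare_slots parameter are assumptions made for illustration; how much 
headroom counts as "slightly less" is a judgment call, and 1 spare slot is 
only a placeholder.

```python
def cap_to_slots(wanted_reducers, reduce_slots, spare_slots=1):
    """Cap a desired reducer count slightly below the available reduce slots.

    Hypothetical helper: spare_slots is an assumed amount of headroom,
    not a Hadoop setting. Keeping the count under the slot capacity
    lets all reducers run in a single wave.
    """
    limit = max(1, reduce_slots - spare_slots)
    return max(1, min(wanted_reducers, limit))

# A cluster with 16 reduce slots and an estimate of 25 reducers:
print(cap_to_slots(25, 16))
# A small job that only needs 8 reducers is left unchanged:
print(cap_to_slots(8, 16))
```

The resulting number is what you would then pass as mapred.reduce.tasks for 
the job.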
Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: jamal sasha <[email protected]>
Date: Wed, 21 Nov 2012 11:38:38 -0500
To: [email protected]<[email protected]>
ReplyTo: [email protected]
Subject: guessing number of reducers.

By default the number of reducers is set to 1.
Is there a good way to guess the optimal number of reducers?
Or let's say I have TBs worth of data, and the mappers are on the order of 5000 
or so.
But ultimately I am calculating, let's say, some average over the whole data, 
say the average transaction occurring.
Now the output will be just one line in one "part"; the rest of them will be 
empty. So I am guessing I need loads of reducers, but then most of them will 
be empty, while at the same time one reducer won't suffice.
What's the best way to solve this?
How do I guess the optimal number of reducers?
Thanks