Just for the reference of others who might see this thread: the Jira
corresponding to the parameter on reduce input limit is MAPREDUCE-2324.
On 7/14/12, Harsh J wrote:
> Subir,
>
> On Sat, Jul 14, 2012 at 5:30 PM, Subir S wrote:
>> Harsh, Thanks I think this is what I was looking for. I have 3 related
>> que
Subir,
On Sat, Jul 14, 2012 at 5:30 PM, Subir S wrote:
> Harsh, Thanks I think this is what I was looking for. I have 3 related
> questions.
>
> 1.) Will this work in 0.20.2-cdh3u3
Yes, will work. (Btw, best to ask CDH-specific questions on the
cdh-u...@cloudera.org lists)
> 2.) What is the har
>>> 2. I still don't understand very well in which part of the
>>>> code (MapTask.java) the intermediate data is written to which partition.
>>>> So
>>>> MapOutputBuffer is the one who actually writes the data to buffer and
>>>> spill
>>>>
uffer and
>>> spill
>>> after buffer is full. Could you please elaborate a bit on how the data is
>>> written to which partition ?
>>>
>>>
>>> Essentially you can think of the partition-id as the 'primary key' and
>>> the
>>
e
>>> computed to find the corresponding partition ?
>>>
>>> Robert
>>>
>>> --
>>> *From:* Arun C Murthy
>>> *To:* mapreduce-user@hadoop.apache.org
>>> *Sent:* Monday, July 9, 2012 4:33 PM
>>
alue> is added into a partition a hash on the partition ID will be
>> computed to find the corresponding partition ?
>>
>> Robert
>>
>> ----------
>> *From:* Arun C Murthy
>> *To:* mapreduce-user@hadoop.apache.org
>> *Sent:* Mo
computed to find the corresponding partition ?
>
> Robert
>
> --
> *From:* Arun C Murthy
> *To:* mapreduce-user@hadoop.apache.org
> *Sent:* Monday, July 9, 2012 4:33 PM
>
> *Subject:* Re: Basic question on how reducer works
>
>
> On
' and the
actual 'key' in the map-output of as the 'secondary key'.
hth,
Arun
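
To illustrate the 'primary key' / 'secondary key' idea above, here is a minimal
sketch in plain Java. It does not use the Hadoop classes; Rec, SPILL_ORDER and
sortForSpill are hypothetical names made up for this example, and the real
MapOutputBuffer works on serialized bytes rather than objects. It only shows
records being ordered by partition ID first and by the map-output key second.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpillSortSketch {
    /** Hypothetical stand-in for one buffered map-output record. */
    public static final class Rec {
        public final int partition;   // partition ID, i.e. the target reducer
        public final String key;      // the actual map-output key
        public final String value;

        public Rec(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return partition + ":" + key + "=" + value;
        }
    }

    /** Partition ID is the primary sort key, the record key the secondary one. */
    public static final Comparator<Rec> SPILL_ORDER =
        Comparator.comparingInt((Rec r) -> r.partition)
                  .thenComparing(r -> r.key);

    /** Returns a sorted copy, leaving the input list untouched. */
    public static List<Rec> sortForSpill(List<Rec> buffered) {
        List<Rec> sorted = new ArrayList<>(buffered);
        sorted.sort(SPILL_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Rec> buffered = new ArrayList<>();
        buffered.add(new Rec(1, "b", "v1"));
        buffered.add(new Rec(0, "z", "v2"));
        buffered.add(new Rec(1, "a", "v3"));
        buffered.add(new Rec(0, "c", "v4"));
        // After sorting, all records of partition 0 come before partition 1,
        // and keys are in order inside each partition.
        for (Rec r : sortForSpill(buffered)) {
            System.out.println(r);
        }
    }
}
```

With this ordering, each partition ends up as one contiguous, key-sorted run,
which is what lets a reducer later fetch just its own slice.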
Thanks,
>Robert
>
>
>
>
> From: Arun C Murthy
>To: mapreduce-user@hadoop.apache.org
>Sent: Monday, July 9, 2012 9:24 AM
>Sub
> Thanks,
> Robert
>
> From: Arun C Murthy
> To: mapreduce-user@hadoop.apache.org
> Sent: Monday, July 9, 2012 9:24 AM
> Subject: Re: Basic question on how reducer works
e a bit on how the data is written to which partition ?
Thanks,
Robert
From: Arun C Murthy
To: mapreduce-user@hadoop.apache.org
Sent: Monday, July 9, 2012 9:24 AM
Subject: Re: Basic question on how reducer works
Robert,
On Jul 7, 2012, at 6:37 PM, Grandl Ro
Hi Manoj,
As Harsh said, we would almost always need multiple reducers. As each
reduce is potentially executed on a different core (same machine or a
different one), in most cases, we would want at least as many reduces as
the number of cores for maximum parallelism/performance.
Karthik
On Mon,
Hi Harsh,
Thanks for clarifying. I had thought earlier that the Partitioner picks
the reducer.
My cluster setup provides options for multiple reducers, so I want to know
when and in which scenarios we have to go for multiple reducers.
Cheers!
Manoj.
On Mon, Jul 9, 2012 at 11:27 PM, Harsh J wr
Manoj,
Think of it this way, and you shouldn't be confused: A reducer == a partition.
For (1) - Partitioners do not 'call' a reducer; they just write the data
with the proper partition ID. The reducer whose number matches that partition
ID picks the data up for itself later. We have already explained this
earlier.
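
As a rough illustration of "a reducer == a partition", the sketch below shows
how a partition ID can be derived from a key in plain Java. It mirrors the
hash-modulo idea used by Hadoop's default HashPartitioner, but the class and
method names here (HashPartitionSketch, partitionFor) are hypothetical and the
code does not depend on Hadoop.

```java
public class HashPartitionSketch {
    /**
     * Maps a key to a partition ID in the range [0, numReduceTasks).
     * Masking with Integer.MAX_VALUE clears the sign bit so that a
     * negative hashCode() cannot produce a negative partition ID.
     */
    public static int partitionFor(Object key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed number of reduce tasks for this example
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            // The map side only tags the record with this ID; reducer N
            // later fetches whatever was written for partition N.
            System.out.println(key + " -> partition " + partitionFor(key, reducers));
        }
    }
}
```

The same key always yields the same partition ID, so all values for one key
reach the same reducer without the partitioner ever 'calling' it.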
F
Hi,
It would be helpful if you could give more details on the doubts below.
1. How does the partitioner know which reducer needs to be called?
2. When we use more than one reducer, the output gets separated.
In what scenario do we actually need to go for multiple reducers?
Cheers!
Manoj.
O
Robert,
On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper processes the intermediate output data, how does it know how many
> partitions to make (how many reducers there will be) and how much data goes in each
>
> >>
> >> ________________
> >> From: Harsh J
> >> To: Grandl Robert ; mapreduce-user
> >>
> >> Sent: Sunday, July 8, 2012 9:16 PM
> >>
> >> Subject: Re: Basic question on how reducer works
> >>
> >> The chan
>>
>> I see. I was looking into tasktracker log :).
>>
>> Thanks a lot,
>> Robert
>>
>>
>> From: Harsh J
>> To: Grandl Robert ; mapreduce-user
>>
>> Sent: Sunday, July 8, 2012 9:16 PM
>>
>> Subject: R
e.org>
> *Sent:* Sunday, July 8, 2012 9:16 PM
>
> *Subject:* Re: Basic question on how reducer works
>
> The changes should appear in your Task's userlogs (not the TaskTracker
> logs). Have you deployed your changed code properly (i.e. do you
> generate a new tarball, or per
I see. I was looking into tasktracker log :).
Thanks a lot,
Robert
From: Harsh J
To: Grandl Robert ; mapreduce-user
Sent: Sunday, July 8, 2012 9:16 PM
Subject: Re: Basic question on how reducer works
The changes should appear in your Task's userlogs
called and which not. Even more in ReduceTask.java.
>
> Do you have any ideas ?
>
> Thanks a lot for your answer,
> Robert
>
>
> From: Harsh J
> To: mapreduce-user@hadoop.apache.org; Grandl Robert
> Sent: Sunday, July 8, 2012 1:34 AM
>
Hi Robert,
Inline. (Answer is specific to Hadoop 1.x since you asked for that
alone, but certain things may vary for Hadoop 2.x).
On Sun, Jul 8, 2012 at 7:07 AM, Grandl Robert wrote:
> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper process the interm