Re: Prune out data to a specific reduce task

Azuryy Yu Mon, 16 Mar 2015 01:09:08 -0700

Hi,
Can you set only one reduce task? Why do you want to set up two reduce tasks
if only one of them does any work?



On Mon, Mar 16, 2015 at 9:04 AM, Drake민영근 <drake....@nexr.com> wrote:

> Hi,
>
> If you write a custom partitioner, just call it from the mapper to confirm
> which partition a key maps to.
>
> You can get the number of reducers from mapcontext.getNumReduceTasks().
> Then get the reducer number from Partitioner.getPartition(key, value,
> numReduceTasks). Finally, write only the wanted records, so they go only to
> the reducer you want.
>
> Caution: this largely breaks the parallelism of the MapReduce programming
> model. If you cut out the records for reducer 2, that task still starts up
> but has nothing to do.
>
> Thanks.
>
> Drake 민영근 Ph.D
> kt NexR
>
> On Fri, Mar 13, 2015 at 11:47 PM, xeonmailinglist-gmail <
> xeonmailingl...@gmail.com> wrote:
>
>>  Hi,
>>
>> The only obstacle is knowing which partition the map output will go to.
>> 1 ~ From the map method, how can I know which partition the output goes to?
>> 2 ~ Can I call getPartition(K key, V value, int numReduceTasks) from the
>> map function?
>>
>> Thanks,
>>
>>
>>
>>
>>
>> On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
>>
>> I think Drake's comment
>> "In the map method, records would be ignored with no output.collect() or
>> context.write()."
>> is the most valid way to do it: it avoids further processing downstream,
>> so fewer resources are consumed, because unwanted records are pruned at
>> the source itself.
>> Is there any obstacle to doing this in your map method?
>>
>>  Regards,
>> Naga
>>  ------------------------------
>> *From:* xeonmailinglist-gmail [xeonmailingl...@gmail.com]
>> *Sent:* Thursday, March 12, 2015 22:17
>> *To:* user@hadoop.apache.org
>> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>>
>>   If I use the partitioner, I must be able to tell MapReduce not to
>> process the values for a certain reduce task.
>>
>> The method public int getPartition(K key, V value, int numReduceTasks)
>> must always return a partition; I can’t return -1. So I don’t know how
>> to tell MapReduce not to process the data of a partition. Any suggestions?
>>
>> ———— Forwarded Message ————
>>
>> Subject: Re: Prune out data to a specific reduce task
>>
>> Date: Thu, 12 Mar 2015 12:40:04 -0400
>>
>> From: Fei Hu <hufe...@gmail.com>
>>
>> Reply-To: user@hadoop.apache.org
>>
>> To: user@hadoop.apache.org
>>
>> Maybe you could use Partitioner.class to solve your problem.
>>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <
>> xeonmailingl...@gmail.com> wrote:
>>
>>  Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
>> the data that would go to reduce task 2. This means that only reducer 1
>> will produce data, and the other one will be empty, or even not execute
>> at all.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>
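A minimal, self-contained sketch of the approach Drake and Naga describe: compute the target partition inside the map method and simply skip context.write() for unwanted records. So that it runs without Hadoop on the classpath, it reimplements the formula used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, instead of calling a real Partitioner; the class and method names (PartitionFilterSketch, hashPartition, keepOnlyPartition) are hypothetical. In a real job the loop body would live in Mapper.map(), with the reducer count taken from context.getNumReduceTasks() and the partition from the job's own Partitioner. Note that partitions are numbered from 0, so "reduce task 2" in a two-reducer job is partition 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionFilterSketch {

    // Mirrors the formula of Hadoop's default HashPartitioner: clear the sign
    // bit so negative hash codes still give a partition in [0, numReduceTasks).
    public static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates the map-side check (hypothetical helper): a record is emitted
    // only when the partitioner would route it to the partition we keep.
    public static List<Map.Entry<String, Integer>> keepOnlyPartition(
            List<Map.Entry<String, Integer>> records, int numReduceTasks, int wantedPartition) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> record : records) {
            int partition = hashPartition(record.getKey(), numReduceTasks);
            if (partition == wantedPartition) {
                // In a real Mapper this would be context.write(key, value).
                emitted.add(record);
            }
            // Otherwise the record is dropped: no context.write(), so it never
            // reaches the shuffle or any reducer.
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("a", 1), Map.entry("b", 2),
                Map.entry("c", 3), Map.entry("d", 4));
        int numReduceTasks = 2; // in Hadoop: context.getNumReduceTasks()
        int keep = 0;           // partitions are 0-based; partition 1 is dropped

        for (Map.Entry<String, Integer> r : keepOnlyPartition(records, numReduceTasks, keep)) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

As Drake cautions, the reducer for the dropped partition is still scheduled and just receives no input. Combining the map-side filter with a single reduce task (job.setNumReduceTasks(1), as Azuryy suggests) avoids that idle task, but the filter would then have to compute partitions against the intended count of 2 rather than context.getNumReduceTasks(), which would return 1.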
