Hi Harshit,
PFA
Thanks
Ravikant
On Mon, Jun 29, 2015 at 11:31 AM, Harshit Mathur <[email protected]>
wrote:
> Can you share PALReducer also?
>
> On Mon, Jun 29, 2015 at 11:21 AM, Ravikant Dindokar <
> [email protected]> wrote:
>
>> Adding source code for more clarity
>>
>> The problem statement is simple:
>>
>> PartitionFileMapper: takes an input file with tab-separated values V, P.
>> It emits (V, -1#P).
>>
>> ALFileMapper: takes an input file with tab-separated values V, EL.
>> It emits (V, E#-1).
>>
>> In the reducer I want to emit
>> (V, E#P)
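>>
>> Roughly, the two mappers look like this (a minimal sketch of the above;
>> everything except the two class names and the emit formats is
>> illustrative, not my exact code):
>>
>> import java.io.IOException;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Mapper;
>>
>> class PartitionFileMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
>>     @Override
>>     protected void map(LongWritable offset, Text line, Context context)
>>             throws IOException, InterruptedException {
>>         // Input line: V <tab> P
>>         String[] fields = line.toString().trim().split("\t");
>>         long v = Long.parseLong(fields[0]);
>>         // Emit (V, "-1#P"); "-1" marks the missing adjacency list
>>         context.write(new LongWritable(v), new Text("-1#" + fields[1]));
>>     }
>> }
>>
>> class ALFileMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
>>     @Override
>>     protected void map(LongWritable offset, Text line, Context context)
>>             throws IOException, InterruptedException {
>>         // Input line: V <tab> EL
>>         String[] fields = line.toString().trim().split("\t");
>>         long v = Long.parseLong(fields[0]);
>>         // Emit (V, "E#-1"); "-1" marks the missing partition id
>>         context.write(new LongWritable(v), new Text(fields[1] + "#-1"));
>>     }
>> }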
>>
>> Thanks
>> Ravikant
>>
>> On Mon, Jun 29, 2015 at 11:04 AM, Ravikant Dindokar <
>> [email protected]> wrote:
>>
>>> By custom key, did you mean some class object? If so, then no.
>>>
>>> I have two map methods, each taking a different file as input, and
>>> both emit a *LongWritable* key. But in the stdout of the container
>>> files I can see the following, with key and value separated by ':':
>>>
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:-1#11
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:3278620528725786624:5352454#-1
>>>
>>> For key 391 the reducer is called twice: once for the value from the
>>> first map and once for the value from the other map.
>>>
>>> In each map method I parse the string from the input file as a long
>>> and then emit it as a LongWritable.
>>>
>>> Is there something I am missing when I use MultipleInputs
>>> (org.apache.hadoop.mapreduce.lib.input.MultipleInputs)?
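>>>
>>> For reference, the driver is wired up roughly like this (a sketch;
>>> the paths and job name are illustrative, not my exact configuration):
>>>
>>> Job job = Job.getInstance(new Configuration(), "VP_AP_to_PAL");
>>> job.setJarByClass(PALReducer.class);
>>> MultipleInputs.addInputPath(job, new Path("vpInput"),
>>>         TextInputFormat.class, PartitionFileMapper.class);
>>> MultipleInputs.addInputPath(job, new Path("alInput"),
>>>         TextInputFormat.class, ALFileMapper.class);
>>> job.setMapOutputKeyClass(LongWritable.class);
>>> job.setMapOutputValueClass(Text.class);
>>> job.setReducerClass(PALReducer.class);
>>> job.setOutputKeyClass(Text.class);
>>> job.setOutputValueClass(Text.class);
>>> FileOutputFormat.setOutputPath(job, new Path("palOutput"));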
>>>
>>> Thanks
>>> Ravikant
>>>
>>> On Mon, Jun 29, 2015 at 9:22 AM, Harshit Mathur <[email protected]>
>>> wrote:
>>>
>>>> As per MapReduce, it is not possible for two different reduce calls
>>>> to get the same key. Have you created some custom key type? If that
>>>> is the case, there may be an issue with the comparator.
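>>>>
>>>> For the built-in key types the grouping is done by the framework's
>>>> registered comparator, so equal longs should always fall into a
>>>> single reduce call. A quick sanity check (a sketch, using
>>>> org.apache.hadoop.io.WritableComparator):
>>>>
>>>> WritableComparator cmp = WritableComparator.get(LongWritable.class);
>>>> // compare() returning 0 means the two keys belong to one reduce group
>>>> System.out.println(cmp.compare(new LongWritable(391),
>>>>         new LongWritable(391)));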
>>>>
>>>> On Mon, Jun 29, 2015 at 12:40 AM, Ravikant Dindokar <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Hadoop user,
>>>>>
>>>>> I have two map classes processing two different input files. Both
>>>>> map functions emit the same key/value format.
>>>>>
>>>>> But the reducer is called twice for the same key: once for the value
>>>>> from the first map and once for the value from the other map.
>>>>>
>>>>> I am printing (key, value) pairs in the reducer:
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:-1#11
>>>>>
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:3278620528725786624:5352454#-1
>>>>>
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:3278620528725852160:4194699#-1
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:-1#13
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:-1#19
>>>>>
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:3278620528725917696:5283986#-1
>>>>>
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:3291:3278620528725983232:4973087#-1
>>>>>
>>>>> Both maps emit a LongWritable key and a Text value.
>>>>>
>>>>>
>>>>> Any idea why this is happening?
>>>>> Is there any way to get the hash values Hadoop generates for the
>>>>> keys emitted by the mappers?
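>>>>>
>>>>> (My understanding is that the default HashPartitioner chooses the
>>>>> reduce task roughly like this, so equal keys should land on the same
>>>>> reducer; a sketch, numReduceTasks is an illustrative value:)
>>>>>
>>>>> LongWritable key = new LongWritable(391);
>>>>> int numReduceTasks = 4;
>>>>> int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
>>>>> System.out.println(partition);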
>>>>>
>>>>> Thanks
>>>>> Ravikant
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harshit Mathur
>>>>
>>>
>>>
>>
>
>
> --
> Harshit Mathur
>
package in.dream_lab.hadoopPipeline.cc;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/*
 * Job ID: 2
 * Job Name: VP_AP_to_PAL
 * Job Description: Concatenate partition id with vertex adjacency list
 * Map Input Files: VP, EL
 * Map Input Format: V_id, [P_id]   V_src, [<E_id,V_sink>+]
 * Map Emit: V_id, [-1, P_id]   V_src, [V_sink, -1]
 * Reducer Emit: V_id, P_id, <E_id,V_sink>+
 * Reducer Output File: PAL
 * Note: Separator between P_id and <E_id,V_sink>+ is ":".
 *       Separator between V_id and P_id is '#'.
 */
public class PALReducer extends Reducer<LongWritable, Text, Text, Text> {

    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String partitionId = "";
        String adjList = "";
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            // Debug: log every (key, value) pair this reduce call sees
            System.out.println("Reduce:" + key + ":" + value);
            String[] strs = value.toString().trim().split("#");
            // Compare string contents with equals(), not != (which compares
            // references and is always true for strings produced by split()).
            if (!"-1".equals(strs[1])) { /* This value carries the partition id */
                partitionId = strs[1];
            } else { /* This value carries the adjacency list */
                adjList = strs[0];
            }
        }
        // Output key: "V#P"; output value: adjacency list
        sb.append(key).append("#").append(partitionId);
        context.write(new Text(sb.toString()), new Text(adjList));
    }
}
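/*
 * Example (a sketch, assuming both values for key 391 from earlier in
 * the thread, "-1#11" and "3278620528725786624:5352454#-1", arrive in
 * a single reduce call): the reducer writes key "391#11" with value
 * "3278620528725786624:5352454", joined by the configured output
 * separator.
 */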