Hello Saurav,

You are correct that it is generally not possible to pass an instance of a 
class directly to a mapper (or reducer).  This is because the map tasks 
execute on arbitrary nodes in the Hadoop cluster, running in JVM processes 
separate from the client JVM that submits the job.

A typical solution is for the client to populate the Configuration object with 
the relevant values as simple types such as String or int.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html
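
For example, the driver might look roughly like the sketch below.  The property 
keys "my.app.label" and "my.app.threshold" and the MyDriver/MyMapper names are 
just placeholders for illustration, not anything special to Hadoop:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Store the fields the mapper will need as simple values, under
    // keys of your choosing, before the job is submitted.
    conf.set("my.app.label", "example");   // String value
    conf.setInt("my.app.threshold", 42);   // int value

    Job job = Job.getInstance(conf, "pass values to mapper");
    job.setJarByClass(MyDriver.class);
    job.setMapperClass(MyMapper.class);
    job.setNumReduceTasks(0);              // map-only job for this sketch
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}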

This configuration propagates to all map and reduce tasks of the job.  The 
Mapper can override the setup method to do one-time initialization at the 
start of each task, before any records are processed.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

As part of this one-time initialization, you can read the values back out of 
the Configuration.  As noted earlier, these are limited to simple types such as 
String or int.  If it's helpful, your setup method can use the values read from 
the configuration to reconstruct an instance of any class that you want.
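
Continuing the placeholder names from the driver sketch above, the mapper side 
could look roughly like this, where MyHelper stands in for whatever class you 
want to rebuild:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

  // Hypothetical class whose instance we want available inside the mapper;
  // it stands in for whatever class you need to reconstruct.
  static class MyHelper {
    private final String label;
    private final int threshold;

    MyHelper(String label, int threshold) {
      this.label = label;
      this.threshold = threshold;
    }

    String getLabel() { return label; }

    boolean accepts(String line) { return line.length() >= threshold; }
  }

  private MyHelper helper;

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    // Read back the simple values that the client stored before submitting.
    String label = conf.get("my.app.label", "default");
    int threshold = conf.getInt("my.app.threshold", 0);
    // Rebuild the object once per task from those values.
    helper = new MyHelper(label, threshold);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Use the reconstructed object in the normal map logic.
    if (helper.accepts(value.toString())) {
      context.write(new Text(helper.getLabel()), value);
    }
  }
}

Because setup runs once per task, the cost of reconstructing the object is paid 
once per task rather than once per input record.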

I hope this helps.

--Chris Nauroth

From: Datta, Saurav <[email protected]>
Reply-To: [email protected]
Date: Monday, October 12, 2015 at 11:14 PM
To: [email protected]
Subject: Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand 
Hadoop does not allow this.
Is there any workaround to make this happen?

Regards,
Saurav Datta

Data Engineer | Desk - (408)967-7360 | Cell - (408)666-1722
