You could, but this is generally discouraged.  Pig does something like this: it 
serializes the object into a byte array, base64-encodes the bytes into a 
string, and puts that string in the config.  The problem with this is that the 
config can grow very large.  In the 1.0 line of Hadoop the maximum size of the 
Job's config is limited to keep the Job Tracker from running out of memory.  In 
V2 this is less of a concern because it is your own application master that has 
to read it all in.
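
To make the serialize-then-base64 trick concrete, here is a minimal sketch in 
plain Java.  It is not Pig's actual code.  The class name ConfigObjectCodec, 
the key "my.job.lookup", and the use of a HashMap as a stand-in for the Hadoop 
Configuration are all made up for illustration, and it assumes Java 8 or later 
for java.util.Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a Serializable object into a string and back.
final class ConfigObjectCodec {

    // Serialize the object to bytes, then base64-encode the bytes.
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The object stream is closed (and flushed) before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: base64-decode, then deserialize.
    static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the job Configuration (assumption, not the Hadoop class).
        Map<String, String> conf = new HashMap<>();

        HashMap<String, Integer> lookup = new HashMap<>();
        lookup.put("alpha", 1);
        lookup.put("beta", 2);

        // Driver side: store the encoded object under a made-up key.
        conf.put("my.job.lookup", encode(lookup));

        // Mapper side: read the string back and rebuild the object.
        Object restored = decode(conf.get("my.job.lookup"));
        System.out.println(restored.equals(lookup));
    }
}
```

In a real job the driver would call conf.set(key, encoded) on the Hadoop 
Configuration, and the Mapper would call context.getConfiguration().get(key) 
in setup() before decoding.  Keep in mind that base64 makes the data about a 
third larger, which adds to the config size problem.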

In general, if it is a very small amount of data you can play games like this.  
If it is a large amount of data, you probably want to use the distributed cache 
instead.

--Bobby

From: Peter Cogan <[email protected]>
Reply-To: [email protected]
Date: Friday, February 8, 2013 9:15 AM
To: [email protected]
Subject: Passing data via Configuration

Hi,

I have data stored in an object that I want to pass into my Mapper.

I see from Configuration that there are setters and getters for primitives, but 
is there a way of doing this with non-primitives, either my own classes or 
built-in classes (such as HashMap, etc.)?

thanks!
Peter
