I'm partial to using Java and JNI, and then using the distributed cache to push 
the native libraries out to each node if they're not already there. 
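A minimal sketch of that approach, using the old (Hadoop 1.x era) DistributedCache API. The HDFS path and the "libjpegwrap.so" wrapper-library name here are hypothetical; the point is the cache-file-plus-symlink pattern and loading the library once per task JVM:

```java
import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class JpegJob {

    public static class JpegMapper
            extends Mapper<Text, Text, NullWritable, NullWritable> {
        static {
            // The cached file is symlinked into the task's working
            // directory, so it can be loaded by its plain name.
            System.load(new File("libjpegwrap.so").getAbsolutePath());
        }
        // map() would call into the JNI wrapper here...
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "jpeg-processing");
        job.setJarByClass(JpegJob.class);
        job.setMapperClass(JpegMapper.class);
        job.setNumReduceTasks(0); // map-only, as in the original question

        // Ship the native wrapper to every node; the "#libjpegwrap.so"
        // fragment names the symlink created in each task's working dir.
        DistributedCache.addCacheFile(
                new URI("/libs/libjpegwrap.so#libjpegwrap.so"),
                job.getConfiguration());
        DistributedCache.createSymlink(job.getConfiguration());

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The library only has to be copied to HDFS once; after that the framework localizes it on each task node as needed.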

But that's just me... ;-) 

HTH

-Mike

On Mar 3, 2013, at 6:02 PM, Julian Bui <[email protected]> wrote:

> Hi hadoop users,
> 
> As a hadoop newbie, I'm trying to figure out which interface would be best 
> and easiest for implementing my application: 1) hadoop pipes, 2) java with 
> jni, or 3) something else that I'm not aware of yet.
> 
> I will use hadoop to take pictures as input and create output jpeg pictures 
> as output.  I do not think I need a reducer.
> 
> Requirements:
> 1. I want to use libjpeg.a (a native static library) in my hadoop 
>    application.  If I use hadoop pipes, I should be able to statically link 
>    libjpeg.a into my application.  If I use the java hadoop interface with 
>    jni, I think I have to ship the libjpeg.a library with my hadoop jobs. 
>    Is that right?  Is that easy?
> 2. I need to be able to write uniquely named files into hdfs (i.e., I need 
>    to name the files so that I know which inputs they were created from). 
>    If I recall, the hadoop streaming interface doesn't let you do this 
>    because it only deals with stdin/stdout.  Does hadoop pipes have a 
>    similar constraint, or will it allow me to write uniquely named files?
> 3. I need to be able to exploit the locality of the data: the application 
>    should be executed on the same machine as the input data (pictures). 
>    Does the hadoop pipes interface allow me to do this?
> 
> Other questions:
> 1. When I tried to learn more about the hadoop pipes API, all I could find 
>    was this one submitter class.  Is this really it, or is there more?
>    http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html
> 2. I'm not really familiar with swig, which is to be used with pipes.  All 
>    I could really find was the same simple word count example on every 
>    site.  Does swig get difficult to use for more complex projects?
> 
> Thanks,
> -Julian
> 
> 
