I'm partial to using Java and JNI, and then using the distributed cache to push the native libraries out to each node if they're not already there.
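Something along these lines (a rough, untested sketch -- the library name, HDFS path, and key/value types below are just placeholders, and the exact calls shift a bit between Hadoop versions):

    import java.io.File;
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class JpegJob {

      public static class JpegMapper
          extends Mapper<Text, Text, Text, NullWritable> {

        @Override
        protected void setup(Context context) {
          // The cached file gets symlinked into the task's working
          // directory, so it can be loaded by the symlink name chosen below.
          System.load(new File("libjpegwrap.so").getAbsolutePath());
        }

        @Override
        protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
          // ... call into the native code through its JNI wrapper here ...
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "jpeg processing");
        job.setJarByClass(JpegJob.class);
        job.setMapperClass(JpegMapper.class);
        job.setNumReduceTasks(0);  // map-only, as in the original question

        // Ship the native library once; each node pulls it into its local
        // cache and symlinks it into the task's working directory.
        DistributedCache.addCacheFile(
            new URI("hdfs:///libs/libjpegwrap.so#libjpegwrap.so"),
            job.getConfiguration());
        DistributedCache.createSymlink(job.getConfiguration());

        // input/output formats and paths omitted for brevity
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The "#libjpegwrap.so" fragment sets the symlink name in the task's working directory, which is why setup() can load the library by that bare name. Note you'd still need a thin JNI wrapper built as a shared object around libjpeg, since System.load() can't pull in a static .a directly.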
But that's just me... ;-)

HTH
-Mike

On Mar 3, 2013, at 6:02 PM, Julian Bui <[email protected]> wrote:

> Hi hadoop users,
> 
> I'm trying to figure out which interface would be best and easiest for
> implementing my application: 1) hadoop pipes, 2) java with jni, or
> 3) something else that I'm not aware of yet, as a hadoop newbie.
> 
> I will use hadoop to take pictures as input and produce jpeg pictures as
> output. I do not think I need a reducer.
> 
> Requirements:
> I want to use libjpeg.a (a native static library) in my hadoop
> application. If I use hadoop pipes, I should be able to statically link
> libjpeg.a into my hadoop application. If I use the java hadoop interface
> with jni, I think I have to ship the libjpeg.a library with my hadoop
> jobs - is that right? Is that easy?
> I need to be able to write uniquely named files into hdfs (i.e. I need to
> name the files so that I know which inputs they were created from). If I
> recall, the hadoop streaming interface doesn't let you do this because it
> only deals with stdin/stdout - does hadoop pipes have a similar
> constraint? Will it allow me to write uniquely named files?
> I need to be able to exploit the locality of the data. The application
> should be executed on the same machine as the input data (pictures). Does
> the hadoop pipes interface allow me to do this?
> 
> Other questions:
> When I tried to learn more about the hadoop pipes API, all I could find
> is this one submitter class. Is this really it, or is there more?
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html
> I'm not really familiar with swig, which is to be used with pipes. All I
> could really find was the same simple word count example on every site.
> Does swig get difficult to use for more complex projects?
> 
> Thanks,
> -Julian
