You're confusing two things here. HDFS is a data storage filesystem. MR does not have anything to do with HDFS (generally speaking).
A reducer runs as a regular JVM on an assigned node, and can execute any program you'd like by downloading it onto its configured local filesystem and running it. If your goal is merely to run a regular program over data sitting in HDFS, that can be achieved. If your library is in C, simply use a streaming program to run it, and use libhdfs' HDFS API (C/C++) to read data from HDFS files into your functions. Would this not suffice?

On Sun, Mar 17, 2013 at 3:09 PM, Julian Bui <[email protected]> wrote:
> Hi hadoop users,
>
> I just want to verify that there is no way to put a binary on HDFS and
> execute it using the hadoop java api. If not, I would appreciate advice on
> creating an implementation that uses native libraries.
>
> "In contrast to the POSIX model, there are no sticky, setuid or setgid bits
> for files as there is no notion of executable files." Is there no
> workaround?
>
> A little bit more about what I'm trying to do. I have a binary that
> converts my image to another image format. I currently want to put it in
> the distributed cache and tell the reducer to execute the binary on the
> data on HDFS. However, since I can't set the execute permission bit on
> that file, it seems that I cannot do that.
>
> Since I cannot use the binary, it seems like I have to use my own
> implementation to do this. The challenge is that the libraries I can use
> to do this are .a and .so files. Would I have to use JNI, package the
> libraries in the distributed cache, and then have the reducer find and use
> those libraries on the task nodes? Actually, I wouldn't want to use JNI;
> I'd probably want to use Java Native Access (JNA) instead. Has anyone
> used JNA with hadoop and been successful? Are there problems I'll
> encounter?
>
> Please let me know.
>
> Thanks,
> -Julian

--
Harsh J
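For context, the streaming approach suggested above can be sketched as a single job submission. This is a minimal, hypothetical example: the binary name (`convert_img`), HDFS paths, and jar location are illustrative, not taken from the thread. Hadoop Streaming's `-files` option localizes the named file into each task's working directory, which is why no execute bit ever needs to be set on the HDFS copy:

```shell
# Hypothetical sketch; names and paths are placeholders.
# "convert_img" is a local native binary; -files ships it to each task node
# and symlinks it into the task's working directory, where it can be run.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files ./convert_img \
    -input  /user/julian/images_in \
    -output /user/julian/images_out \
    -mapper cat \
    -reducer ./convert_img
```

Note the streaming contract: the reducer binary must read records on stdin and write results to stdout, so a binary that only accepts file arguments would need a small wrapper script shipped the same way.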
