Re: Using hadoop streaming with binary data

Jay Hacker Thu, 21 Feb 2013 09:51:18 -0800

I wrote a little code to make this happen and submitted it as a patch
to Hadoop:

https://issues.apache.org/jira/browse/MAPREDUCE-5018

There is a jar file and shell script there for anybody who wants to try
this without recompiling all of Hadoop.  It lets you run something like
"mapstream indir md5sum outdir" and get one map task per file in indir,
with the raw, unmodified binary data passed to your map command and the
output written to a file in outdir.  This makes it easy to run all your
favorite Unix commands as map-only streaming jobs, taking advantage of
reliable distributed execution.
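
To make the per-file behavior concrete, here is a rough local sketch in
Python of what each map task does.  It is not the code in the patch and
it does not use Hadoop at all: the function name map_files is made up
for illustration, and I am assuming the command reads its input on
stdin, writes its result to stdout, and gets an output file named after
the input file.

```python
import subprocess
from pathlib import Path


def map_files(indir, command, outdir):
    """Run `command` once per file in indir, feeding it the raw bytes.

    Local illustration only (hypothetical helper, not the Hadoop patch):
    each input file plays the role of one map task's input.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(indir).iterdir()):
        if not src.is_file():
            continue  # skip subdirectories
        dest = out / src.name  # assumed naming: same name as the input
        # Binary mode on both ends, so nothing decodes the data or
        # rewrites newlines on the way through.
        with src.open("rb") as fin, dest.open("wb") as fout:
            subprocess.run(command, stdin=fin, stdout=fout, check=True)
        written.append(dest)
    return written
```

Something like map_files("indir", ["md5sum"], "outdir") would then leave
one checksum file per input in outdir, provided md5sum is on your PATH
and outdir is not inside indir.  The real thing adds what this sketch
cannot: the work is spread over the cluster and failed tasks are retried.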
