With Hadoop Streaming and no reducer, I would expect the output written to HDFS to be exactly the mapper's STDOUT. Instead, I noticed that a tab character (0x9) is being inserted before every newline character (0xa). This is a problem for me because my mapper's output is binary data, which I would like written to HDFS unaltered.
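The symptom can be checked byte by byte rather than by eye. Below is a minimal Python sketch that counts the newline bytes in a buffer and how many of them are immediately preceded by a tab. The function name `count_tab_newlines` and the two sample byte strings are made up for illustration; they are not real job output.

```python
def count_tab_newlines(data):
    """Return (newline bytes, newline bytes immediately preceded by a tab)."""
    total = data.count(b"\n")       # every 0x0a byte
    preceded = data.count(b"\t\n")  # every 0x09 0x0a pair
    return total, preceded


# Hypothetical sample data: what the mapper writes vs. what the symptom looks like.
sample_in = b"line one\nline two\n"
sample_out = b"line one\t\nline two\t\n"

print(count_tab_newlines(sample_in))   # no newline has a tab in front of it
print(count_tab_newlines(sample_out))  # every newline has a tab in front of it
```

To run it against real output, read a local copy of a part file in binary mode, for example `open("part-00000", "rb").read()`, and pass the bytes to the function. If both counts are equal and non-zero, every newline in the file is preceded by a tab.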
I've narrowed my issue down to a very simple example that anybody can run. Create a simple test.txt file with 4 or more lines of text (it must contain newline characters to show the problem). Copy it to HDFS, and run a simple streaming job with "cat" as the mapper:

    hadoop jar ../contrib/streaming/hadoop-streaming-1.0.3.jar -input /Users/hadoop/test/test.txt -output /Users/hadoop/test/output -mapper "cat" -reducer NONE

Copy the output/part-00000 file to local disk and hexdump it. You'll notice that 0xA bytes have become 0x9 0xA.

There must be a parameter to streaming that can fix this, but I have not been able to find it.

Thanks in advance,
Jason
