Hello,

I am using Hadoop Streaming and found that if I use -inputformat to specify
another InputFormat (e.g.
org.apache.hadoop.mapred.lib.CombineTextInputFormat) instead of
the default org.apache.hadoop.mapred.TextInputFormat, an extra key
is passed to the mapper program along with each input line.


After digging through the Hadoop Streaming source code, I found that there is
an undocumented job property, stream.map.input.ignoreKey. If -inputformat is
unset (or set to org.apache.hadoop.mapred.TextInputFormat), this property
defaults to true; otherwise it defaults to false. So if I want to change
-inputformat, I have to set this property to true manually
(-D stream.map.input.ignoreKey=true) when issuing the hadoop streaming command.
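
For reference, the workaround looks roughly like the sketch below. The jar
name and the HDFS paths are placeholders, not my real ones, and /bin/cat is
used as the mapper only so that the raw mapper input is easy to inspect. The
function prints the command instead of running it; note that -D is a generic
option and has to come before streaming options such as -inputformat.

```shell
#!/usr/bin/env bash
# Sketch only: the jar location and HDFS paths are placeholders.
STREAMING_JAR="${STREAMING_JAR:-hadoop-streaming.jar}"

# Print (not run) a streaming command that uses a non-default InputFormat
# and explicitly tells streaming to drop the input key.
# $1 = input path, $2 = output path
streaming_cmd() {
  echo hadoop jar "$STREAMING_JAR" \
    -D stream.map.input.ignoreKey=true \
    -inputformat org.apache.hadoop.mapred.lib.CombineTextInputFormat \
    -input "$1" \
    -output "$2" \
    -mapper /bin/cat
}

streaming_cmd /user/me/in /user/me/out
```

Without the -D line, the mapper input also contains the extra key.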

This property was actually documented in the past, but it has disappeared from
recent documentation. Is it deprecated, or was it simply left out of the
documentation?
