Hello,

I am using Hadoop Streaming and found that if I use -inputformat to specify an InputFormat other than the default org.apache.hadoop.mapred.TextInputFormat (e.g. org.apache.hadoop.mapred.lib.CombineTextInputFormat), an extra key is passed to the mapper program along with each input line.
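Here is the symptom in concrete terms. With a non-default input format such as CombineTextInputFormat, the record key is written before each line, separated by a tab. As far as I can tell this key is the line's byte offset, printed as a decimal number. So the mapper sees "1024<TAB>some text" instead of "some text". Below is a minimal Python sketch of a streaming mapper that tolerates both shapes. It assumes a decimal key and the default tab separator. The helper name strip_offset_key and the word-count logic are hypothetical and only there for illustration.

```python
import sys


def strip_offset_key(line, sep="\t"):
    """Drop a leading numeric key plus separator, if one is present.

    Assumption (hypothetical helper): the extra key is a decimal integer
    such as a byte offset, and the separator is a tab.
    """
    key, found, rest = line.partition(sep)
    if found and key.isdigit():
        return rest
    return line


def main():
    for raw in sys.stdin:
        line = strip_offset_key(raw.rstrip("\n"))
        # Illustrative mapper logic: emit each word with a count of 1.
        for word in line.split():
            sys.stdout.write(f"{word}\t1\n")


if __name__ == "__main__":
    main()
```

This heuristic misfires on lines whose own text begins with digits followed by a tab. Setting -D stream.map.input.ignoreKey=true is therefore the cleaner fix.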
After digging into the Hadoop Streaming source code, I found an undocumented job property, stream.map.input.ignoreKey. If -inputformat is unset (or set to org.apache.hadoop.mapred.TextInputFormat), this property defaults to true. Otherwise it defaults to false. So whenever I change -inputformat, I have to set the property to true manually (-D stream.map.input.ignoreKey=true) when issuing the Hadoop Streaming command.

This property used to be documented, but it no longer appears in the recent documentation. Is it deprecated, or was it simply left out of the documentation?
