That did not work either.

Thanks & Regards,
Anvesh R
On Tue, Jun 9, 2015 at 11:12 PM, Kiran Dangeti <[email protected]> wrote:

> \bbb
> On Jun 10, 2015 10:58 AM, "anvesh ragi" <[email protected]> wrote:
>
>> Hello all,
>>
>> I know that tab is the default field separator for:
>>
>> stream.map.output.field.separator
>> stream.reduce.input.field.separator
>> stream.reduce.output.field.separator
>> mapreduce.textoutputformat.separator
>>
>> But when I pass the generic parser option:
>>
>> stream.map.output.field.separator=\t (or)
>> stream.map.output.field.separator="\t"
>>
>> to test how Hadoop parses whitespace characters such as "\t" and "\n"
>> when they are used as separators, I observe that Hadoop reads it as the
>> literal characters \t and not as the tab character itself. I checked
>> this by printing each line in the reducer (Python) as it is read, using:
>>
>> sys.stdout.write(str(line))
>>
>> My mapper emits key/value pairs as: key value1 value2
>>
>> using the print (key,value1,value2,sep='\t',end='\n') command.
>>
>> So I expected my reducer to read each line as: key value1 value2 too,
>> but instead sys.stdout.write(str(line)) printed:
>>
>> key value1 value2 \\with trailing space
>>
>> From Hadoop streaming - remove trailing tab from reducer output
>> <http://stackoverflow.com/questions/18133290/hadoop-streaming-remove-trailing-tab-from-reducer-output>,
>> I understood that the trailing space is due to
>> mapreduce.textoutputformat.separator not being set and left as default.
>>
>> So this confirmed my assumption that Hadoop treated my entire map
>> output:
>>
>> key value1 value2
>>
>> as the key, with an empty Text object as the value, because it read the
>> separator from stream.map.output.field.separator=\t as the literal "\t"
>> characters instead of the tab character itself.
>>
>> Please help me understand this behavior. How can I use \t as a
>> separator if I want to?
>>
>> Thanks & Regards,
>> Anvesh R
>>
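The behaviour described in the quoted message can be reproduced outside Hadoop with a small Python sketch. It is only a simulation of the hypothesis above, not Hadoop's actual code: split_key_value and format_output are hypothetical helper names, and the sketch assumes that a line is split at the first occurrence of the separator, that a line without the separator becomes the key with an empty value, and that the output step writes key, separator, value.

```python
def split_key_value(line, separator="\t"):
    """Split a line at the first occurrence of separator.

    If the separator is absent, the whole line becomes the key and the
    value is empty (the assumption made in the quoted message).
    """
    key, _, value = line.partition(separator)
    return key, value


def format_output(key, value, separator="\t"):
    """Write key, separator, value, even when the value is empty."""
    return key + separator + value


# Mapper output as produced by print(key, value1, value2, sep='\t')
line = "key\tvalue1\tvalue2"

# Separator is a real tab character: the split happens after "key".
print(split_key_value(line, "\t"))    # ('key', 'value1\tvalue2')

# Separator is the two literal characters backslash and t: no match,
# so the whole line is the key and the value is empty.
print(split_key_value(line, "\\t"))   # ('key\tvalue1\tvalue2', '')

# Writing that pair back out appends a trailing tab.
key, value = split_key_value(line, "\\t")
print(repr(format_output(key, value)))  # 'key\tvalue1\tvalue2\t'
```

If the literal-string reading is what happens, the trailing whitespace seen in the reducer is the default output separator written after an empty value.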
