On Jun 10, 2015 10:58 AM, "anvesh ragi" <[email protected]> wrote:
> Hello all,
>
> I know that tab is the default separator for the following fields:
>
> stream.map.output.field.separator
> stream.reduce.input.field.separator
> stream.reduce.output.field.separator
> mapreduce.textoutputformat.separator
>
> To test how Hadoop parses whitespace characters such as "\t" and "\n"
> when they are used as separators, I passed the separator explicitly as
> a generic option:
>
> stream.map.output.field.separator=\t (or)
> stream.map.output.field.separator="\t"
>
> I observed that Hadoop reads the value as the literal characters \t
> rather than as an actual tab character. I checked this by printing each
> line in my (Python) reducer as it is read, using:
>
> sys.stdout.write(str(line))
>
> My mapper emits key/value pairs as key value1 value2 (tab-separated),
> using print(key, value1, value2, sep='\t', end='\n').
>
> So I expected my reducer to read each line as key value1 value2 too,
> but instead sys.stdout.write(str(line)) printed:
>
> key value1 value2    (with trailing whitespace)
>
> From "Hadoop streaming - remove trailing tab from reducer output"
> <http://stackoverflow.com/questions/18133290/hadoop-streaming-remove-trailing-tab-from-reducer-output>,
> I understood that the trailing whitespace is due to
> mapreduce.textoutputformat.separator not being set and being left at
> its default.
>
> This confirmed my assumption that Hadoop treated my entire map output
> line:
>
> key value1 value2
>
> as the key, with an empty Text object as the value, because it read the
> separator from stream.map.output.field.separator=\t as the two
> characters "\t" instead of an actual tab character.
>
> Please help me understand this behavior. How can I use \t as a
> separator if I want to?
>
> Thanks & Regards,
> Anvesh R
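
A minimal Python sketch of the behavior described in the question. It assumes the simple rule from the Hadoop streaming documentation: a map output line is split at the first occurrence of the separator, and if the separator never occurs, the whole line becomes the key and the value is empty. The functions split_map_output and rejoin are hypothetical illustrations, not Hadoop APIs. The rejoin step assumes the pair is written back out with a tab between key and value.

```python
# Simplified model (an assumption, not Hadoop's actual source) of how
# Hadoop streaming splits one line of mapper output into key and value.
# split_map_output and rejoin are hypothetical helper names.


def split_map_output(line, separator):
    """Split `line` at the first occurrence of `separator`.

    If the separator never occurs, the whole line becomes the key and the
    value is empty.
    """
    idx = line.find(separator)
    if idx == -1:
        return line, ""
    return line[:idx], line[idx + len(separator):]


def rejoin(key, value, separator="\t"):
    """Write the pair back out as one line, tab between key and value."""
    return key + separator + value


if __name__ == "__main__":
    # What the mapper prints with print(key, value1, value2, sep='\t').
    mapper_line = "key\tvalue1\tvalue2"

    # Separator is a real tab: key = "key", value = "value1\tvalue2".
    key, value = split_map_output(mapper_line, "\t")
    print(repr(rejoin(key, value)))  # → 'key\tvalue1\tvalue2'

    # Separator is the two characters backslash and 't': no match, so the
    # whole line is the key, the value is empty, and a tab dangles at the end.
    key, value = split_map_output(mapper_line, "\\t")
    print(repr(rejoin(key, value)))  # → 'key\tvalue1\tvalue2\t'
```

With a real tab the pair rejoins to the original line. With the two-character string \t nothing matches, so the rejoined line ends in a dangling tab, which would be consistent with the trailing whitespace seen in the reducer.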

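Since tab is already the default for all four properties, simply omitting these -D options should give tab-separated keys and values. If you do want to set them explicitly, the value has to contain an actual tab character (U+0009) by the time Hadoop sees it. Below is a minimal sketch, assuming the job is launched from Python with an argument list, so no shell quoting is involved. The jar path, HDFS paths, script names, and the helper build_streaming_command are hypothetical placeholders.

```python
# Sketch: build a Hadoop streaming command whose -D values contain a real
# tab character. Paths, script names and this helper are hypothetical.

STREAMING_JAR = "/path/to/hadoop-streaming.jar"  # hypothetical location

TAB = "\t"  # one actual tab character (U+0009), not backslash + 't'

SEPARATOR_PROPERTIES = [
    "stream.map.output.field.separator",
    "stream.reduce.input.field.separator",
    "stream.reduce.output.field.separator",
    "mapreduce.textoutputformat.separator",
]


def build_streaming_command(input_path, output_path, mapper, reducer):
    """Return an argv list; generic -D options go before streaming options."""
    cmd = ["hadoop", "jar", STREAMING_JAR]
    for prop in SEPARATOR_PROPERTIES:
        cmd += ["-D", f"{prop}={TAB}"]
    # Shipping the scripts to the cluster (e.g. with -files) is omitted here.
    cmd += [
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/user/example/input",   # hypothetical HDFS input path
        "/user/example/output",  # hypothetical HDFS output path
        "mapper.py",             # hypothetical mapper script
        "reducer.py",            # hypothetical reducer script
    )
    # repr() makes the tab visible as \t; the string itself holds one tab.
    print(repr(cmd[4]))
    # On a machine with Hadoop installed, the job could be launched with:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list to subprocess.run hands each argument to the process unchanged, so the tab survives. From a bash prompt, ANSI-C quoting such as -D stream.map.output.field.separator=$'\t' should likewise produce a real tab. By contrast, "\t" in double quotes reaches Hadoop as a backslash followed by t, and an unquoted \t loses its backslash in bash and arrives as just t.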