Re: hadoop 2.4.0 streaming generic parser options using TAB as separator

anvesh ragi Wed, 10 Jun 2015 13:15:43 -0700

That did not work either.

Thanks & Regards,
Anvesh R


On Tue, Jun 9, 2015 at 11:12 PM, Kiran Dangeti <[email protected]>
wrote:

> \bbb
> On Jun 10, 2015 10:58 AM, "anvesh ragi" <[email protected]> wrote:
>
>> Hello all,
>>
>> I know that tab is the default separator for the following fields:
>>
>> stream.map.output.field.separator
>> stream.reduce.input.field.separator
>> stream.reduce.output.field.separator
>> mapreduce.textoutputformat.separator
>>
>> but if I try to write the generic parser option:
>>
>> stream.map.output.field.separator=\t (or)
>> stream.map.output.field.separator="\t"
>>
>> to test how Hadoop parses whitespace characters such as "\t" and "\n" when
>> they are used as separators, I observe that Hadoop reads the value as the
>> literal characters \t and not as an actual tab character. I checked this
>> by printing each line in the reducer (Python) as it is read, using:
>>
>> sys.stdout.write(str(line))
>>
>> My mapper emits key/value pairs as: key value1 value2
>>
>> using the command print(key, value1, value2, sep='\t', end='\n').
>>
>> So I expected my reducer to read each line as: key value1 value2 as well,
>> but instead sys.stdout.write(str(line)) printed:
>>
>> key value1 value2 (with a trailing space)
>>
>> From the Stack Overflow question "Hadoop streaming - remove trailing tab
>> from reducer output"
>> <http://stackoverflow.com/questions/18133290/hadoop-streaming-remove-trailing-tab-from-reducer-output>,
>> I understood that the trailing space appears because
>> mapreduce.textoutputformat.separator is not set and is left at its default.
>>
>> So this confirmed my assumption that Hadoop treated my entire map
>> output:
>>
>> key value1 value2
>>
>> as the key, with an empty Text object as the value, because it read the
>> separator from stream.map.output.field.separator=\t as the literal
>> characters "\t" instead of an actual tab character.
>>
>> Please help me understand this behavior. How can I use a tab (\t) as a
>> separator if I want to?
>>
>> Thanks & Regards,
>> Anvesh R
>>
>>
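
The behavior described in the quoted message can be reproduced with a small
Python sketch. It is only a simplified model, not Hadoop's actual code: it
assumes the option value reaches the job as the two characters backslash and
"t", and that a record is split into key and value at the first occurrence of
the separator. The helper name split_key_value and the sample record are
hypothetical, chosen for illustration.

```python
def split_key_value(line, separator):
    """Split a record at the first occurrence of separator.

    Simplified model (an assumption, not Hadoop's implementation): if the
    separator does not occur, the whole line becomes the key and the value
    is empty.
    """
    key, _found, value = line.partition(separator)
    return key, value


# What the mapper's print(key, value1, value2, sep='\t') writes, minus the newline.
record = "key\tvalue1\tvalue2"

literal_backslash_t = "\\t"  # two characters: a backslash followed by 't'
real_tab = "\t"              # one character: an actual tab

# Case 1: the separator is the literal two-character string.
# It never occurs in the record, so the whole line is the key.
key, value = split_key_value(record, literal_backslash_t)
# Joining key and empty value with a tab leaves a trailing tab.
print(repr(key + "\t" + value))

# Case 2: the separator is a real tab, so the first field is the key.
key, value = split_key_value(record, real_tab)
print(repr(key), repr(value))
```

In the first case the value is empty, so writing key, output separator, and
value back out leaves a trailing separator on the line, which matches the
trailing whitespace reported above.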
