Re: What is the output format of org.apache.hadoop.examples.Join?

jingguo yao Wed, 27 Mar 2013 23:26:43 -0700

Yanbo:

Sorry for pasting the wrong result.


The output for joining a.txt, b.txt and c.txt is as follows (still not
the same as the one produced by Chris):

AAAAAAAA        a0      [,,]
AAAAAAAA        b0      [,,]
AAAAAAAA        c0      [,,]
BBBBBBBB        a1      [,,]
BBBBBBBB        b1      [,,]
BBBBBBBB        b2      [,,]
BBBBBBBB        b3      [,,]
BBBBBBBB        c1      [,,]
CCCCCCCC        a2      [,,]
CCCCCCCC        a3      [,,]
DDDDDDDD        c2      [,,]
DDDDDDDD        c3      [,,]
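
For comparison, the result Chris described can be reproduced outside
Hadoop with a small Python sketch of an outer join. The input contents
below are an assumption inferred from the outputs quoted in this thread
(the actual a.txt, b.txt and c.txt are not shown), and outer_join is a
hypothetical helper for illustration, not part of Hadoop:

```python
from collections import defaultdict
from itertools import product

# Assumed inputs, inferred from the outputs quoted in this thread.
a = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
b = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
c = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]


def outer_join(*sources):
    """Hypothetical helper: full outer join of (key, value) lists on key."""
    grouped = []
    for src in sources:
        by_key = defaultdict(list)
        for key, value in src:
            by_key[key].append(value)
        grouped.append(by_key)

    rows = []
    for key in sorted(set().union(*grouped)):
        # A source with no value for this key contributes one empty slot.
        columns = [by_key.get(key) or [""] for by_key in grouped]
        # Emit one tuple per combination of values across the sources.
        for combo in product(*columns):
            rows.append((key, "[" + ",".join(combo) + "]"))
    return rows


rows = outer_join(a, b, c)
for key, joined in rows:
    print(key + "\t" + joined)
```

Under these assumed inputs the sketch emits one tuple per key and
combination of values, with an empty slot for each source that lacks
the key, which is the shape Chris described rather than one line per
input record.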


On Thu, Mar 28, 2013 at 11:46 AM, Yanbo Liang <[email protected]> wrote:
> Your output only joins a.txt and b.txt.
> You need to join c.txt as well.
>
> 2013/3/26 jingguo yao <[email protected]>
>
>> I am reading the following mail:
>>
>> http://www.mail-archive.com/[email protected]/msg04066.html
>>
>> After running the following command (I am using Hadoop 1.0.4):
>>
>> bin/hadoop jar hadoop-examples-1.0.4.jar join \
>>    -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
>>    -outKey org.apache.hadoop.io.Text \
>>    -joinOp outer \
>>    join/a.txt join/b.txt join/c.txt joinout
>>
>>
>> Then I ran "bin/hadoop fs -text joinout/part-00000" and saw the following
>> result:
>>
>> AAAAAAAA        a0      [,]
>> AAAAAAAA        b0      [,]
>> BBBBBBBB        a1      [,]
>> BBBBBBBB        b1      [,]
>> BBBBBBBB        b2      [,]
>> BBBBBBBB        b3      [,]
>> CCCCCCCC        a2      [,]
>> CCCCCCCC        a3      [,]
>>
>> But Chris said that the result should be:
>>
>> AAAAAAAA        [a0,b0,c0]
>> BBBBBBBB        [a1,b1,c1]
>> BBBBBBBB        [a1,b2,c1]
>> BBBBBBBB        [a1,b3,c1]
>> CCCCCCCC        [a2,,]
>> CCCCCCCC        [a3,,]
>> DDDDDDDD        [,,c2]
>> DDDDDDDD        [,,c3]
>>
>> Has Join's output format changed in Hadoop 1.0.4?
>>
>>
>> --
>> Jingguo
>>



-- 
Jingguo
