Oh, I see what my confusion is... It's on the nulls that join behaves
differently in Pig than in SQL, right? That's where things are different.
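
To convince myself of Alan's point (quoted below) that flattening an empty bag
yields 0 rows, here is a minimal sketch in Python, not Pig. The helper names
cogroup and flatten_both are made up for illustration, bags are modeled as
plain lists, and null keys are deliberately left out, so this only covers the
non-null case.

```python
from collections import defaultdict
from itertools import product

def cogroup(a, b, key=lambda row: row[0]):
    """Rough model of: COGROUP A BY k, B BY k  ->  key: (bag of A, bag of B)."""
    groups = defaultdict(lambda: ([], []))
    for row in a:
        groups[key(row)][0].append(row)
    for row in b:
        groups[key(row)][1].append(row)
    return dict(groups)

def flatten_both(groups):
    """Rough model of: FOREACH ... GENERATE FLATTEN(A), FLATTEN(B)."""
    out = []
    for bag_a, bag_b in groups.values():
        # Cross product of the two bags; if either bag is empty,
        # product() yields nothing and the whole group disappears.
        for row_a, row_b in product(bag_a, bag_b):
            out.append(row_a + row_b)
    return out

# Hypothetical sample data: (id, value)
A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(2, "b2"), (3, "b3"), (3, "b3x"), (4, "b4")]

C = flatten_both(cogroup(A, B))
for row in C:
    print(row)
```

Ids 1 and 4 appear in only one input, so they produce no output rows, which is
inner-join behavior without any explicit filter. Each output row also carries
the id twice, once from each side, which is the key doubling from the original
question.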


On Thu, Jun 10, 2010 at 12:48 PM, Alan Gates <[email protected]> wrote:

> That's already what happens, because flattening a bag that is empty results
> in 0 rows, regardless of how many rows came out of the other bag.
>
> Alan.
>
>
> On Jun 10, 2010, at 11:09 AM, hc busy wrote:
>
>> Isn't that kind of annoying? Since JOIN in SQL is implicitly an inner
>> join.
>> It would have been great if
>>
>> C = JOIN A by id, B by id;
>>
>> is alias for
>> C1 = COGROUP A by id, B by id;
>> C2 = filter C1 by NOT IsEmpty(A) AND NOT IsEmpty(B);
>> C = foreach C2 generate FLATTEN(A), FLATTEN(B);
>>
>>
>> On Tue, Jun 8, 2010 at 12:03 PM, Alan Gates <[email protected]> wrote:
>>
>>> Historically
>>>
>>> C = JOIN A by a, B by a
>>>
>>> was defined in Pig Latin as shorthand for:
>>>
>>> C1 = COGROUP A by a, B by a;
>>> C = FOREACH C1 GENERATE flatten(A), flatten(B);
>>>
>>> which produces the doubling of keys.
>>>
>>> Also, given that Pig Latin does not require that key names be the same (as
>>> USING or NATURAL do in SQL), there would be issues if it did not have both
>>> keys in the output. (For the same reason, ON in SQL duplicates the keys in
>>> the results.)
>>>
>>> Alan.
>>>
>>>
>>> On Jun 8, 2010, at 4:45 AM, Alexander Schätzle wrote:
>>>
>>>> Hi all,
>>>>
>>>> the JOIN operator of Pig produces duplicate columns in its output.
>>>> Let's say the statement is like this:
>>>>
>>>> C = JOIN A BY (var1, var2), B BY (var1, var2);
>>>>
>>>> Then C contains var1 and var2 two times (one for each input relation), of
>>>> course with the same content.
>>>> This is somehow not what a user "usually" expects from a Join.
>>>> Why does Pig produce such redundant entries?
>>>> If you want to get rid of these entries you have to do a FOREACH for
>>>> projection.
>>>> Otherwise you shuffle unnecessary data through MR-phases.
>>>> In my opinion this is somehow really unnecessary.
>>>> I just wonder why Pig produces the output of a Join the way it does?
>>>>
>>>> Cheers,
>>>> Alex
>>>>
>>>>
>>>>
>>>>
>>>
>
