Thanks Mridul and Daniel. 

For now I am using flatten and then referencing the details. It works, but it
takes a few additional steps. I would like to see a fix for this in trunk.
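
To make the difference between projecting on a bag and flattening it concrete,
here is a rough sketch in plain Python (not Pig). It only models the semantics
described in this thread: a bag is a list, a tuple is a Python tuple, and the
helper names project_bag and flatten_bag, as well as the sample data, are made
up for illustration.

```python
# Plain-Python model of the behaviour discussed in this thread; not Pig code.
# Assumption: a bag is modeled as a list of tuples, a tuple as a Python tuple.

def project_bag(bag, col):
    """Model of bag dereference (e.g. $1.$1): keep one column of every tuple.

    The result is still a bag, so the nesting level does not change.
    """
    return [(row[col],) for row in bag]


def flatten_bag(record, bag_index):
    """Model of FLATTEN: emit one output record per tuple in the bag,
    splicing that tuple's fields into the record in place of the bag."""
    bag = record[bag_index]
    return [record[:bag_index] + row + record[bag_index + 1:] for row in bag]


# One hypothetical record shaped like (id, visits) from the schema in this thread.
record = (
    "id1",
    [
        ("t1", [("k1", "v1"), ("k2", "v2")]),
        ("t2", [("k3", "v3")]),
    ],
)

# Projection drops the timestamp column, but "visits" is still a bag of tuples.
visit_only = project_bag(record[1], 1)

# Two flattens: first un-nest visits, then un-nest visit.
step1 = flatten_bag(record, 1)                             # (id, timestamp, visit)
step2 = [out for r in step1 for out in flatten_bag(r, 2)]  # (id, timestamp, Key, Value)
```

In this model step2 holds one row per Key/Value pair, while visit_only keeps
everything inside a single bag for the record. That is also why the flatten
route is not functionally the same as a working $1.$1.$0: the per-record
grouping is gone.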

Regards,
badri

-----Original Message-----
From: Mridul Muralidharan [mailto:[email protected]] 
Sent: Saturday, April 09, 2011 4:45 PM
To: [email protected]
Cc: Daniel Dai
Subject: Re: Dereferencing columns of nested bags


If you try to project that out, you will end up with exceptions, which 
was the issue being raised. The expected functionality is not in question; 
it is well understood that whether flatten is required or not depends on 
the script/UDFs in question.


To illustrate, please try:

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$1;

This will result in exceptions.


While

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$0;

works.




Regards,
Mridul


On Saturday 09 April 2011 02:39 AM, Daniel Dai wrote:
> Bag dereference results in a bag with fewer columns. It does not reduce the
> nesting level.
>
> $1 refers to visits: {(timestamp: bytearray,visit: {(Key:
> chararray,Value: chararray)})}
> $1.$1 slices out the second column of the bag; all it does is drop the
> timestamp column from the bag "visits". The bag is still there. The schema
> for $1.$1 is {(visit: {(Key: chararray,Value: chararray)})}. Note that the
> mechanism is different from that for a tuple. If $1 is a tuple, $1.$1 does
> remove one level and pulls the nested item out.
>
> The only way to reduce the nesting level of a bag is to flatten it.
>
> Daniel
>
> On 04/08/2011 09:41 AM, Mridul Muralidharan wrote:
>> On Friday 08 April 2011 08:16 PM, Badrinarayanan S wrote:
>>> I am trying it within foreach as part of generate.
>>>
>>> I believe the innermost tuple of Key and Value is considered a single
>>> column, so I am able to refer only to $1.$1.($0), which gives the whole
>>> tuple. However, would it be possible to generate Key and Value as
>>> separate columns as part of foreach?
>>>
>>> The reference to $1.$1.($0, $1) results in an error that looks like an
>>> out-of-bounds access.
>>
>> You are right, it looks pretty broken.
>> You can reference $1.$0 but not $1.$1 !
>>
>> You might want to file a JIRA I guess ...
>>
>>
>>
>> If you split it into multiple foreach/flatten invocations, you can get
>> to the data you want (but it is not the same functionally, since you
>> lose the record-level aggregation that $1.$1.$0 (for example) provides).
>>
>>
>> Regards,
>> Mridul
>>
>>> Thanks,
>>> badri
>>>
>>> -----Original Message-----
>>> From: Mridul Muralidharan [mailto:[email protected]]
>>> Sent: Friday, April 08, 2011 4:32 PM
>>> To: [email protected]
>>> Cc: Badrinarayanan S
>>> Subject: Re: Dereferencing columns of nested bags
>>>
>>>
>>> How are you trying to reference it? Within foreach? Filter? Or
>>> elsewhere?
>>>
>>> Doesn't something like $1.$1.($0, $1) work to reference key and value
>>> as a tuple?
>>>
>>>
>>> - Mridul
>>>
>>>
>>> On Friday 08 April 2011 03:38 PM, Badrinarayanan S wrote:
>>>> Is it possible to dereference a column that is part of a nested bag? In
>>>> the schema given below, I am trying to dereference the columns Key and
>>>> Value, which are part of the visit bag, which in turn is part of the
>>>> visits bag.
>>>>
>>>>
>>>>
>>>> (id, visits:bag{visittuple:tuple(timestamp,
>>>> visit:bag{details:tuple(Key:chararray, Value:chararray)})})
>>>>
>>>>
>>>>
>>>> From the SVN trunk of Pig I could see a fix for this (PIG-1866, the fix
>>>> for "dereferencing a bag within a tuple does not work"). Does it also
>>>> address nested bags? If so, for the above example, can it be dereferenced
>>>> as visits.visit.Key? I tried it against the latest trunk, but it failed.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> badri
>>>>
>>>>
>>>>
>>>> Disclaimer: This message (including any attachments) is being sent from
>>>> Fifth Generation Technologies India (P) Ltd. (5G) and may contain
>>>> information that is proprietary, confidential and privileged. If you are
>>>> not the intended recipient, please inform the sender immediately by reply
>>>> e-mail and delete this message and attachments from your system, without
>>>> retaining a copy. Any unauthorized use or dissemination of this message
>>>> in whole or in part is strictly prohibited. 5G shall not be liable for
>>>> the improper or incomplete transmission of the information contained in
>>>> this communication nor for any delay in its receipt or damage to your
>>>> system. 5G does not guarantee that the integrity of this communication
>>>> has been maintained nor that this communication is free of viruses,
>>>> interceptions or interference.
>>>
>>>
>>>
>>>
>>>
>>>
>





