Are the tabs for these columns still there? In that case, those fields
should contain empty strings, and something like this should work:
Y = foreach X generate
(A == '' ? null : A),
(B == '' ? null : B),
...
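Spelled out for the two optional columns, the idea might look like the sketch below. It is untested and assumes the six-column schema from Arun's script. The explicit chararray types are an addition, not something from the thread, so that comparing against '' is a plain string comparison. X and Y are just the placeholder aliases used above.

```pig
-- Load with an explicit schema; declaring chararray makes E == '' a string comparison.
X = LOAD '$in_dir' USING PigStorage('\t')
    AS (A:chararray, B:chararray, C:chararray, D:chararray, E:chararray, F:chararray);

-- Turn empty strings in the optional columns into real nulls.
Y = FOREACH X GENERATE
    A, B, C, D,
    (E == '' ? null : E) AS E,
    (F == '' ? null : F) AS F;
```

Y can then replace the loaded relation in the GROUP/COUNT steps.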
Otherwise, you could load each full line using TextLoader and then use
STRSPLIT on it to extract your columns. That lets you check whether E
and F are present.
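A rough, untested sketch of the TextLoader/STRSPLIT route follows. The aliases (lines, cols, sized, complete_rows, partial_rows, full_rows, padded, all_rows) are made up for illustration. It relies on STRSPLIT's limit argument. With a limit of 6, the tab pattern is applied at most 5 times, so a full line yields 6 fields (trailing empty ones included), while a line with only A B C D yields 4. SIZE then reports how many fields the resulting tuple has.

```pig
-- Load each line whole, as a single chararray.
lines = LOAD '$in_dir' USING TextLoader() AS (line:chararray);

-- Split on tabs; limit 6 caps the tuple at 6 fields and keeps trailing empty fields.
cols  = FOREACH lines GENERATE STRSPLIT(line, '\\t', 6) AS t;

-- SIZE of a tuple is its number of fields, which tells us whether E and F exist.
sized = FOREACH cols GENERATE t, SIZE(t) AS n;

-- Route six-column lines and four-column lines separately.
SPLIT sized INTO complete_rows IF n == 6, partial_rows IF n == 4;

full_rows = FOREACH complete_rows GENERATE FLATTEN(t) AS (A, B, C, D, E, F);

-- Pad the four-column lines with explicit nulls for E and F.
padded = FOREACH partial_rows GENERATE FLATTEN(t) AS (A, B, C, D), null AS E, null AS F;

all_rows = UNION full_rows, padded;
```

all_rows could then feed the existing GROUP/COUNT steps. Lines with any other number of fields fall through both SPLIT branches and are dropped, which may or may not be what you want.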
Best,
-Sven
On Wed, May 25, 2011 at 3:43 PM, Arun Chandy Thomas wrote:
> Thanks for the quick reply, but my question is a little different.
> I am sorry if I was not clear in my initial post.
>
> I want the Pig script to treat E and F as null if their values are not
> present in the input line.
>
> So basically all the lines should be loaded when running:
>     A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
>
> regardless of whether any of the fields are null.
>
> How can we achieve this?
>
> Thanks & Regards,
> Arun
> On May 25, 2011, at 3:35 PM, Alan Gates wrote:
>
>> No, but you can make it do so by adding:
>>
>> B = filter A by E is not null;
>>
>> Alan.
>>
>> On May 25, 2011, at 3:22 PM, Arun Chandy Thomas wrote:
>>
>>> Hi,
>>>
>>> I am trying to use Pig to aggregate data from an application's log lines.
>>>
>>> Most lines in the input file have the following format:
>>> A B C D E F
>>>
>>> I am aggregating the data as follows:
>>>
>>> A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
>>> D = group A by (A, B, C, D, E, F);
>>> E = FOREACH D GENERATE FLATTEN(group) as (A, B, C, D, E, F), COUNT(A) as hit;
>>> STORE E INTO '$in_dir._1' using PigStorage('\t');
>>>
>>> In some cases, the input lines contain only A B C D
>>> (the E and F columns are missing).
>>> Would the Pig script ignore such lines?
>>>
>>> Thanks & Regards,
>>> Arun
>>
>
>
--
http://sites.google.com/site/krasser/