Hi,

I have a simple file and am trying to remove the lines whose itemid column is
the empty string '', but the filter does not work. I also tried comparing with
== against a valid itemid from the file, to see whether I could select those
lines, and that does not work either. Does anyone know how to use '=='?

Pig script:

A = LOAD '$DATA' AS 
(timestamp:chararray,itemid:chararray,actiontype:chararray,actionid:chararray,anonid:chararray,deviceid:chararray,userid:chararray,mediouserid:chararray);
B = DISTINCT A PARALLEL 2;

-- None of the following filters work.
C = FILTER B BY itemid == '';
D = FILTER B BY itemid == '591837';
E = FILTER B BY actiontype == 'AddToCart';

STORE A INTO 'OUTPUT1/A' USING PigStorage();
STORE B INTO 'OUTPUT1/B' USING PigStorage();
STORE C INTO 'OUTPUT1/C' USING PigStorage();
STORE D INTO 'OUTPUT1/D' USING PigStorage();
STORE E INTO 'OUTPUT1/E' USING PigStorage();
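
One possible cause, sketched here as an assumption rather than a confirmed
diagnosis: PigStorage splits on tabs by default and loads an empty field as
null, and a comparison such as itemid == '' does not evaluate to true when
itemid is null. A null-aware variant of the filter might look like the
following. The explicit '\t' delimiter and the aliases C_empty and C_kept are
illustrative only; if the file uses another delimiter, that delimiter would
need to be passed to PigStorage instead.

```pig
-- Sketch only. Assumes '$DATA' is tab-delimited (PigStorage's default).
A = LOAD '$DATA' USING PigStorage('\t') AS
    (timestamp:chararray, itemid:chararray, actiontype:chararray,
     actionid:chararray, anonid:chararray, deviceid:chararray,
     userid:chararray, mediouserid:chararray);
B = DISTINCT A PARALLEL 2;

-- Rows with a missing itemid: check for null as well as ''.
C_empty = FILTER B BY itemid IS NULL OR itemid == '';

-- Rows to keep: itemid is present and non-empty.
C_kept = FILTER B BY itemid IS NOT NULL AND itemid != '';

-- Hypothetical output paths, following the naming used above.
STORE C_empty INTO 'OUTPUT1/C_empty' USING PigStorage();
STORE C_kept INTO 'OUTPUT1/C_kept' USING PigStorage();
```

If every field except the first comes out empty in OUTPUT1/A, that would
suggest the delimiter assumption is wrong and the whole line is landing in
the first column.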


Data file is attached in the email.

Thanks.
