Hi, I have a simple file and am trying to remove the lines whose itemid column is the empty string '', but it doesn't work. I also tried comparing with == against a valid itemid from the file to see whether I could filter on those lines, and that doesn't work either. Does anyone know how to use '=='?
Pig script:

A = LOAD '$DATA' AS (timestamp:chararray, itemid:chararray, actiontype:chararray, actionid:chararray,
                     anonid:chararray, deviceid:chararray, userid:chararray, mediouserid:chararray);
B = DISTINCT A PARALLEL 2;

-- none of the following filter would work.
C = FILTER B BY itemid == '';
D = FILTER B BY itemid == '591837';
E = FILTER B BY actiontype == 'AddToCart';

STORE A INTO 'OUTPUT1/A' USING PigStorage();
STORE B INTO 'OUTPUT1/B' USING PigStorage();
STORE C INTO 'OUTPUT1/C' USING PigStorage();
STORE D INTO 'OUTPUT1/D' USING PigStorage();
STORE E INTO 'OUTPUT1/E' USING PigStorage();

Data file is attached in the email.

Thanks.
