raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time, 
query:chararray);
queries = FILTER raw BY (INDEXOF(query,'yahoo') >= 0);
dump queries;
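
For reference, the script above differs from the quoted original in two
ways. The comparison is >= 0 rather than > 0: assuming INDEXOF follows the
Java String.indexOf convention (0-based position of the first match, -1 when
there is none), > 0 would drop any query that starts with 'yahoo'. The query
field is also declared as chararray; a field with no declared type defaults
to bytearray, and the assumption here is that INDEXOF cannot treat such a
value as a string, so the filter passes nothing. A minimal, untested sketch
of a variation that keeps the untyped schema and casts inline instead (the
alias hits and the use of LOWER for case-insensitive matching are
illustrative additions, not part of the original script):

```pig
-- Sketch only: same sample file, schema left untyped as in the original script.
raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time, query);
-- Cast the bytearray field to chararray inline and lower-case it so that
-- 'Yahoo' and 'YAHOO' also match. INDEXOF is assumed to return -1 when the
-- substring is absent, so >= 0 also keeps matches at position 0.
hits = FILTER raw BY INDEXOF(LOWER((chararray)query), 'yahoo') >= 0;
DUMP hits;
```

Declaring the type once in the AS clause, as in the script above, is
usually tidier when the field is used in more than one expression.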


On 4/22/11 2:25 PM, "Steve Watt" <[email protected]> wrote:

Hi Folks

I've loaded a dataset and am trying to filter out unwanted records by
checking that one of my tuple fields contains a particular string. I've
distilled the issue down to the sample excite.log that ships with Pig for
easy reproduction. I've read through the INDEXOF code and I think this
should work (lots of queries contain the word yahoo), but dumping queries
always returns zero records. Can anyone tell me what I am doing wrong?

raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time,
query);
queries = FILTER raw BY (INDEXOF(query,'yahoo') > 0);
dump queries;

Regards
Steve Watt
