raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time,
query:chararray);
queries = FILTER raw BY (INDEXOF(query,'yahoo') >= 0);
dump queries;
On 4/22/11 2:25 PM, "Steve Watt" <[email protected]> wrote:
Hi Folks
I've done a load of a dataset and I am attempting to filter out unwanted
records by checking that one of my tuple fields contains a particular
string. I've distilled this issue down to the sample excite.log that ships
with Pig for easy recreation. I've read through the INDEXOF code and I think
this should work (lots of queries that contain the word yahoo) but my
queries dump always contains zero records. Can anyone tell me what I am
doing wrong?
raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time,
query);
queries = FILTER raw BY (INDEXOF(query,'yahoo') > 0);
dump queries;
Regards
Steve Watt