Has anyone reported this bug in Pig 0.8? (Maybe it has already been
fixed?)

data.txt (tab-separated file; the bad site has no canonical_url populated):
badsite.com        127.0.0.1
goodsite.com/1?foo=true    goodsite.com    127.0.0.1

data = LOAD 'data.txt' USING PigStorage() AS (referrer:chararray, canonical_url:chararray, ip:chararray);
best_url = FOREACH data GENERATE ((canonical_url != '' AND canonical_url IS NOT NULL) ? canonical_url : referrer) AS url, ip;
filtered = FILTER best_url BY url == 'badsite.com';
DUMP filtered;

If I run this, it returns nothing; it is as if url is not being
populated with the contents of canonical_url or referrer.
But if I start Pig with -Dpig.usenewlogicalplan=false, it returns just
badsite.com, as expected.
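For reference, the result I expect can be sketched in plain Python. This is only a model of the intended semantics, not Pig code; the names best_url and run are hypothetical, and the missing canonical_url is modelled as None (it might equally load as an empty string, which the check also covers):

```python
from typing import List, Optional, Tuple

# (referrer, canonical_url, ip); canonical_url may be None or '' when absent
Row = Tuple[str, Optional[str], str]


def best_url(referrer: str, canonical_url: Optional[str]) -> str:
    # Mirrors the bincond:
    # (canonical_url != '' AND canonical_url IS NOT NULL) ? canonical_url : referrer
    if canonical_url is not None and canonical_url != '':
        return canonical_url
    return referrer


def run(rows: List[Row]) -> List[Tuple[str, str]]:
    # FOREACH ... GENERATE best_url AS url, ip
    projected = [(best_url(ref, canon), ip) for ref, canon, ip in rows]
    # FILTER best_url BY url == 'badsite.com'
    return [t for t in projected if t[0] == 'badsite.com']


data: List[Row] = [
    ('badsite.com', None, '127.0.0.1'),
    ('goodsite.com/1?foo=true', 'goodsite.com', '127.0.0.1'),
]

print(run(data))  # [('badsite.com', '127.0.0.1')]
```

With the old logical plan, Pig's output matches this single row; with the new plan, the filter returns nothing.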
