Has anyone reported this bug in Pig 0.8, or has it perhaps already been fixed?
data.txt (tab-separated file; the badsite.com row has no canonical_url populated):

    badsite.com		127.0.0.1
    goodsite.com/1?foo=true	goodsite.com	127.0.0.1

Script:

    data = LOAD 'data.txt' using PigStorage() as (referrer:chararray, canonical_url:chararray, ip:chararray);
    best_url = FOREACH data GENERATE ((canonical_url != '' and canonical_url is not null) ? canonical_url : referrer) AS url, ip;
    filtered = FILTER best_url BY url == 'badsite.com';
    dump filtered;

If I run this, it returns nothing; it is as if url isn't being populated with the contents of canonical_url or referrer. But if I start Pig with -Dpig.usenewlogicalplan=false, it returns just badsite.com, as expected.
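To spell out the result I expect, here is a minimal sketch of the same fallback logic in plain Python. It is not Pig and says nothing about how either logical plan works internally; it only models the intent of the script. The helper names (`load`, `best_url`, `filter_bad`) are made up for illustration, and treating an empty field as `None` is my assumption about how the empty column is read.

```python
# Same two rows as data.txt; the first row has an empty canonical_url field.
DATA = (
    "badsite.com\t\t127.0.0.1\n"
    "goodsite.com/1?foo=true\tgoodsite.com\t127.0.0.1\n"
)


def load(text):
    """Split tab-separated rows; an empty field becomes None (assumed null)."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        rows.append(tuple(f if f != "" else None for f in fields))
    return rows


def best_url(rows):
    """Use canonical_url when it is non-empty and not null, else referrer."""
    result = []
    for referrer, canonical_url, ip in rows:
        if canonical_url is not None and canonical_url != "":
            url = canonical_url
        else:
            url = referrer
        result.append((url, ip))
    return result


def filter_bad(rows):
    """Keep only rows whose chosen url is 'badsite.com'."""
    return [(url, ip) for url, ip in rows if url == "badsite.com"]


filtered = filter_bad(best_url(load(DATA)))
print(filtered)  # [('badsite.com', '127.0.0.1')]
```

That single row is what I get from Pig with -Dpig.usenewlogicalplan=false, and what I would expect from the default plan as well.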
