This is weird, because in my case it seems to be nondeterministic. I have a text file, thing.txt, that is simply:

    http://www.guardian.co.uk/
    asjlkdajlkdad
    askjldajlksdjlkasjdlkajslkdjalds
    asdjaskdjlasjdlkad
    http://www.guardian.co.uk/adsasd
    http://www.guardian.co.uk/sadasd
    http://www.guardian.co.uk/asdad

I am running this code:

    A = LOAD 'thing.txt' AS (c7:chararray);
    B = filter A by (c7 matches '.*guardian\\.co\\.uk.*');
    dump B;

For a while, I got no results! Then it started working after I did "dump A", and it KEPT working after that. However, it isn't working with the actual data that I care about, and I can't seem to get it to fail again in local mode.

I am running pig-0.8.0, latest trunk. The big files in question are .bcp.gz; locally I just use a .txt. Any ideas what it could be? I will try to replicate it on a smaller set of data again...
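To pin down what the filter should return for the sample file, here is a minimal Java sketch of the same regular expression. It assumes that Pig's `matches` behaves like Java's `Matcher.matches()`, i.e. the pattern has to cover the whole field, and that thing.txt holds one token per line. The Pig literal `'.*guardian\\.co\\.uk.*'` unescapes to the regex `.*guardian\.co\.uk.*`, which is what the Java literal below produces too. The class name `GuardianFilterCheck` and the `keep` helper are hypothetical. The last check is only a guess worth ruling out: if rows in the real .bcp data ended in `\r` (CRLF), `.` would not consume that character and the whole-string match would fail.

```java
import java.util.List;
import java.util.regex.Pattern;

public class GuardianFilterCheck {
    // Same regex the Pig string '.*guardian\\.co\\.uk.*' turns into after unescaping.
    static final Pattern GUARDIAN = Pattern.compile(".*guardian\\.co\\.uk.*");

    // Hypothetical helper: whole-string match, assumed to mirror Pig's `matches`.
    static boolean keep(String c7) {
        return c7 != null && GUARDIAN.matcher(c7).matches();
    }

    public static void main(String[] args) {
        // Sample lines from thing.txt, assuming one token per line.
        List<String> lines = List.of(
            "http://www.guardian.co.uk/",
            "asjlkdajlkdad",
            "askjldajlksdjlkasjdlkajslkdjalds",
            "asdjaskdjlasjdlkad",
            "http://www.guardian.co.uk/adsasd",
            "http://www.guardian.co.uk/sadasd",
            "http://www.guardian.co.uk/asdad");

        // Prints only the four guardian.co.uk URLs, which is what `dump B` should show.
        for (String line : lines) {
            if (keep(line)) {
                System.out.println(line);
            }
        }

        // Without DOTALL, '.' does not match '\r', so a trailing carriage return
        // makes the whole-string match fail even though the URL is present.
        System.out.println(keep("http://www.guardian.co.uk/\r")); // false
    }
}
```

If the local .txt passes this kind of check but the real data still yields nothing, dumping a few raw `c7` values from the .bcp.gz input and looking for stray control characters would be a cheap next step.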
