I have parsed a JSON file structured as:
{"id":"xyz", "name":"John", "tags":"apples and oranges"}
{"id":"xyz", "name":"John", "tags":"\uac38\uc6b0"}
...etc.

I'd like to filter out the entries that contain Unicode characters, like the second entry. I've tried using:

rawdata = LOAD 'data' using PigJasonLoader() as (json:map[]);
logs = FOREACH rawdata generate json#name as thingtag;
result = FILTER logs by thingtag matches '.*\\\\[a-z].*';
dump result;

This does not filter out the second entry. What's more, when I just look at the tags being loaded, it looks like the Unicode characters have already been converted (i.e. I see weird graphics) when running:

rawdata = LOAD 'data' using PigJasonLoader() as (json:map[]);
logs = FOREACH rawdata generate json#name as thingtag;
dump logs;

Any help would be appreciated.
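For illustration, the symptom described above can be reproduced outside Pig. The sketch below is Python, not Pig Latin, and it assumes the loader decodes `\uXXXX` escapes the way a standard JSON parser does (that is a guess about PigJasonLoader, not something confirmed here). The helper names `has_backslash_escape` and `has_non_ascii` are made up for this example. If the assumption holds, the loaded value no longer contains a backslash, so a pattern that looks for a literal backslash cannot match, whereas a pattern that looks for characters outside the ASCII range can.

```python
import json
import re

# The two sample records from the question, as raw text lines.
lines = [
    '{"id":"xyz", "name":"John", "tags":"apples and oranges"}',
    '{"id":"xyz", "name":"John", "tags":"\\uac38\\uc6b0"}',
]

# A standard JSON parser turns each \uXXXX escape into the character it names.
records = [json.loads(line) for line in lines]

# Same idea as the FILTER pattern: a literal backslash followed by a lowercase letter.
backslash_pattern = re.compile(r"\\[a-z]")

# Alternative idea: any character outside the 7-bit ASCII range.
non_ascii_pattern = re.compile(r"[^\x00-\x7F]")


def has_backslash_escape(text):
    """True if the text still contains a literal backslash escape."""
    return backslash_pattern.search(text) is not None


def has_non_ascii(text):
    """True if the text contains at least one non-ASCII character."""
    return non_ascii_pattern.search(text) is not None


# Keep only the records whose tags are plain ASCII.
ascii_only = [r for r in records if not has_non_ascii(r["tags"])]

for r in records:
    print(len(r["tags"]), has_backslash_escape(r["tags"]), has_non_ascii(r["tags"]))
```

Pig's `matches` operator uses Java regular expressions, so a character class such as `[^\x00-\x7F]` should be expressible there as well, though the exact quoting inside a Pig string literal would need to be checked.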
