I have parsed a JSON file structured as:
{"id":"xyz", "name":"John", "tags":"apples and oranges"}
{"id":"xyz", "name":"John", "tags":"\uac38\uc6b0"}...etc

and I'd like to filter out the entries that contain Unicode characters,
like the second entry.
I've tried using:

rawdata = LOAD 'data' using PigJasonLoader() as (json:map[]);
logs = FOREACH rawdata generate json#'tags' as thingtag;
result = FILTER logs by thingtag matches '.*\\\\[a-z].*';
dump result;

This does not filter the second entry. What's more, when I just look
at the tags being loaded, it looks like the Unicode escapes have already
been converted (i.e., I see strange glyphs instead of the \uXXXX text)
when running:
rawdata = LOAD 'data' using PigJasonLoader() as (json:map[]);
logs = FOREACH rawdata generate json#'tags' as thingtag;
dump logs;
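One plausible explanation, consistent with the strange glyphs: the JSON loader decodes each backslash-u escape into the real character, so by the time FILTER runs the second tag holds two Hangul characters and no backslash at all, and the pattern '.*\\\\[a-z].*' (a literal backslash followed by a lowercase letter) has nothing to match. Pig's matches operator uses Java regular-expression syntax, so this standalone Java sketch compares that pattern with an "all ASCII" test on the decoded tag and on a still-escaped copy of it. The class and method names are hypothetical and not part of Pig or PigJasonLoader; the decoding behaviour of the loader is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical standalone demo (not part of Pig or PigJasonLoader).
class NonAsciiFilterDemo {
    // The original FILTER pattern after Pig unescapes its string literal:
    // a literal backslash followed by a lowercase letter, anywhere in the string.
    static final Pattern BACKSLASH_ESCAPE = Pattern.compile(".*\\\\[a-z].*");

    // Alternative test: the whole string consists of ASCII characters only.
    static final Pattern ALL_ASCII = Pattern.compile("\\p{ASCII}*");

    static boolean looksEscaped(String tag) {
        return BACKSLASH_ESCAPE.matcher(tag).matches();
    }

    static boolean isAllAscii(String tag) {
        return ALL_ASCII.matcher(tag).matches();
    }

    public static void main(String[] args) {
        String[] tags = {
            "apples and oranges",  // first entry: plain ASCII
            "\uac38\uc6b0",        // second entry after decoding: two Hangul characters
            "\\uac38\\uc6b0"       // second entry if the escapes were NOT decoded (12 chars)
        };
        for (String tag : tags) {
            System.out.printf("length=%d escaped=%b allAscii=%b%n",
                    tag.length(), looksEscaped(tag), isAllAscii(tag));
        }
    }
}
```

Running it prints `length=18 escaped=false allAscii=true`, `length=2 escaped=false allAscii=false` and `length=12 escaped=true allAscii=true`: the original pattern only fires on undecoded text, while the ASCII check separates the two real entries. If the loader really does decode the escapes, an untested Pig counterpart that keeps only ASCII-only tags (assuming thingtag holds the tags field) might be `result = FILTER logs BY ((chararray)thingtag) MATCHES '\\p{ASCII}*';`.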

Any help would be appreciated.
