Hi

I would like to load JSON data, remove any duplicates, and write it back as
JSON. I am using the elephant-bird JSON libraries, but I can't figure out how
to project fields out of a map.

DEFINE JsonLoader com.twitter.elephantbird.pig8.load.LzoJsonLoader();
DEFINE JsonStorage com.twitter.elephantbird.pig8.store.LzoJsonStorage();
raw = LOAD '$INPUT' USING JsonLoader AS (json:map[]);
logs = FOREACH raw GENERATE json#'host' as host:chararray, json#'body' as body:chararray;
dedupped = DISTINCT logs;
STORE dedupped INTO '$OUTPUT' USING JsonStorage();
