Hi, I would like to load JSON data, remove any duplicates, and write it back as JSON. I am using the elephant-bird JSON libraries, but I can't figure out how to project a map.
DEFINE JsonLoader com.twitter.elephantbird.pig8.load.LzoJsonLoader();
DEFINE JsonStorage com.twitter.elephantbird.pig8.store.LzoJsonStorage();
raw = LOAD '$INPUT' USING JsonLoader AS (json:map[]);
logs = FOREACH raw GENERATE json#'host' as host:chararray, json#'body' as body:chararray;
dedupped = DISTINCT logs;
STORE dedupped INTO '$OUTPUT' USING JsonStorage();
