Hi,

Honestly, I don't think JsonStorage in Pig will be improved in the future. More companies are adopting the Parquet and ORC file formats, so I expect more activity on that front. You're probably better off switching to one of those newer file formats in the long term. Just a thought.
Thanks,
Cheolsoo

On Tue, Feb 18, 2014 at 9:02 PM, Simon Reavely <[email protected]> wrote:

> All,
>
> I am having trouble serializing to Json from Pig scripts (Storage). Here is
> what I've tried and failed with:
>
> 1. Pig 0.10+ PigStorage. Maps are assumed to be String to String,
>    so heavily nested structures are not handled.
>
> 2. Hortonworks toJson UDF. Maps are not supported.
>
> 3. Twitter ElephantBird LzoJsonStorage. Arrays/Bags are not
>    handled.
>
> I wondered if anyone is using something to store output from pig scripts as
> Json and whether they use maps.
>
> If so, how are you writing out Json and what issues have you seen?
>
> If not, what structured format are you using and why? Avro? Thrift?
>
> Historically, all our pig jobs results in more tabular results and
> therefore it's not been an issue. The input data is in Json and we've used
> ElephantBird (from twitter) to load it as a map.
>
> Given the above experience, our only option is to use Pig's JsonLoader to
> load the Json using a specified schema but this will pin us into a single
> schema and the data is not consistent (schemas evolve). Previously we could
> deal with this inside the script but not if we define a single schema for
> the loaded data. So I'm honestly reconsidering our use of Json (which is a
> historical conversation in itself).
>
> Cheers,
>
> Simon
>
> --
> Simon Reavely
> [email protected]
