I've found Twitter's elephant-bird library very useful here (https://github.com/kevinweil/elephant-bird).

    a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

will parse each JSON record into a map (http://pig.apache.org/docs/r0.11.1/basic.html#map-schema); a JSONArray gets parsed into a DataBag of maps.

cf. https://stackoverflow.com/questions/11035105/processing-json-through-pig-scripts/16501542#16501542

On Fri, Jul 25, 2014 at 4:55 PM, Satish Kolli <feedwo...@gmail.com> wrote:

> Did you try the standard JsonLoader? I haven't used it personally, but it
> looks like you can specify the schema to extract/parse from your JSON.
>
> http://pig.apache.org/docs/r0.13.0/func.html#jsonloadstore
>
> If not, you can also look at the following example I found googling:
>
> https://gist.github.com/kimsterv/601331
>
> Thanks.
>
> On Fri, Jul 25, 2014 at 8:01 AM, praveenesh kumar <praveen...@gmail.com>
> wrote:
>
>> One simple way is to write a UDF that acts as a JSON parser. Load your
>> data and then call your UDF to parse and extract whatever you want from the
>> JSON. You need to build what you want to get. Pig doesn't do that for you;
>> it gives you the capability to do it. How you do it is up to you.
>>
>> On Fri, Jul 25, 2014 at 12:09 PM, unmesha sreeveni <unmeshab...@gmail.com>
>> wrote:
>>
>> > Hi
>> >
>> > This is my code for sampling
>> >
>> > --Sampling.pig
>> > --pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig -param delimiter="," -param fraction='0.05'
>> >
>> > --Load data
>> > inputdata = LOAD '$input' using PigStorage('$delimiter');
>> >
>> > --Group data
>> > groupedByAll = group inputdata all;
>> >
>> > --output into hdfs
>> > sampled = SAMPLE inputdata $fraction;
>> > store sampled into '$output' using PigStorage('$delimiter');
>> >
>> > I am taking input parameters as customized
>> > pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig
>> > -param delimiter="," -param fraction='0.05'
>> >
>> > I would like to do a modification in the same
>> > I am trying to take my input as json
>> >
>> > sample json:
>> >
>> > {"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
>> >
>> > Now I need to parse the above json and take the needful params.
>> > How to do the same
>> > I know we can load json in apache pig but how to extract the needful from
>> > the json
>> >
>> > from here I only need
>> > fraction,destination,source
>> >
>> > Please suggest a way
>> >
>> > --
>> > Thanks & Regards
>> >
>> > Unmesha Sreeveni U.B
>> > Hadoop, Bigdata Developer
>> > Center for Cyber Security | Amrita Vishwa Vidyapeetham
>> > http://www.unmeshasreeveni.blogspot.in/
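
P.S. To tie the elephant-bird suggestion back to the sample JSON in the original question, here is a rough, untested sketch of pulling out source, destination and fraction. The jar names, the file name 'config.json' and the aliases cfg and wanted are placeholders of mine, not anything from the thread, and it assumes one JSON object per line.

```pig
-- Jar names are placeholders (assumed); register whichever elephant-bird
-- and json-simple jars match your installation.
REGISTER 'elephant-bird-core.jar';
REGISTER 'elephant-bird-pig.jar';
REGISTER 'elephant-bird-hadoop-compat.jar';
REGISTER 'json-simple.jar';

-- 'config.json' is a hypothetical file holding one JSON object per line.
cfg = LOAD 'config.json'
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

-- $0 is the map for the whole record; '#' looks up a key in a map.
-- Map values are untyped, so cast them explicitly.
wanted = FOREACH cfg GENERATE
    (chararray) $0#'source'                 AS source,
    (chararray) $0#'destination'            AS destination,
    (chararray) $0#'elementInfo'#'fraction' AS fraction;

DUMP wanted;
```

Note that this only extracts the values as data. As far as I know, -param values are substituted into the script text before it runs, so to feed these into $input, $output and $fraction you would probably still need a small wrapper that reads the JSON and then invokes pig with the right -param arguments.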