I've found Twitter's Elephant Bird library very useful here
(https://github.com/kevinweil/elephant-bird).

a = LOAD 'file3.json' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

This parses each JSON record into a map
(http://pig.apache.org/docs/r0.11.1/basic.html#map-schema), and any
JSONArray gets parsed into a DataBag of maps.

cf. 
https://stackoverflow.com/questions/11035105/processing-json-through-pig-scripts/16501542#16501542
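Since the JSON in the original question is really a job description rather than data, another option is to parse it outside Pig and pass the values in as -param arguments. Below is a minimal sketch in Python, not something from this thread: the function name build_pig_command is made up, and I am assuming that "source" maps to the input parameter, "destination" to output, and that the delimiter (which is not in the JSON) defaults to a comma.

```python
import json
import shlex


def build_pig_command(config_text, script="Sampling.pig", delimiter=","):
    """Turn a JSON job description into a pig command line (as a list).

    Hypothetical helper: the key-to-parameter mapping below is an
    assumption based on the sample JSON and Sampling.pig in this thread.
    """
    config = json.loads(config_text)
    params = [
        ("input", config["source"]),
        ("output", config["destination"]),
        # The delimiter is not part of the JSON, so it is supplied here.
        ("delimiter", delimiter),
        # Passed through unchanged; SAMPLE expects a value between 0 and 1.
        ("fraction", config["elementInfo"]["fraction"]),
    ]
    argv = ["pig", "-x", "mapreduce", "-f", script]
    for name, value in params:
        argv += ["-param", "{}={}".format(name, value)]
    return argv


if __name__ == "__main__":
    sample = (
        '{"Name":"sampling","elementInfo":{"fraction":"3"},'
        '"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}'
    )
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in build_pig_command(sample)))
```

The returned list can be handed to subprocess.call as-is if you want the script to launch Pig itself. Note that the sample JSON carries a fraction of "3", which SAMPLE will not accept, so the value probably needs validating or rescaling first.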

On Fri, Jul 25, 2014 at 4:55 PM, Satish Kolli <feedwo...@gmail.com> wrote:
> Did you try the standard JsonLoader? I haven't used it personally, but it
> looks like you can specify the schema to extract/parse from your JSON.
>
> http://pig.apache.org/docs/r0.13.0/func.html#jsonloadstore
>
> If not, you can also look at the following example I found googling:
>
> https://gist.github.com/kimsterv/601331
>
>
> Thanks.
>
>
>
>
> On Fri, Jul 25, 2014 at 8:01 AM, praveenesh kumar <praveen...@gmail.com>
> wrote:
>
>> One simple way is to write a UDF that acts as a JSON parser. Load your
>> data and then call your UDF to parse and extract whatever you want from the
>> JSON. You need to build what you want to get. Pig doesn't do that for you;
>> it gives you the capability to do it. How you do it is up to you.
>>
>>
>> On Fri, Jul 25, 2014 at 12:09 PM, unmesha sreeveni <unmeshab...@gmail.com>
>> wrote:
>>
>> > Hi
>> >
>> > This is my code for sampling
>> >
>> > --Sampling.pig
>> > --pig -x mapreduce -f Sampling.pig -param input=foo.csv -param
>> > --output=OUT/pig -param delimiter="," -param fraction='0.05'
>> >
>> > --Load data
>> > inputdata = LOAD '$input' using PigStorage('$delimiter');
>> >
>> > --Group data
>> > groupedByAll = group inputdata all;
>> >
>> > --output into hdfs
>> > sampled = SAMPLE inputdata $fraction;
>> > store sampled into '$output' using PigStorage('$delimiter');
>> >
>> > I am passing the input parameters on the command line:
>> > pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig
>> > -param delimiter="," -param fraction='0.05'
>> >
>> > I would like to modify this script to take its input parameters from
>> > JSON instead.
>> >
>> > Sample JSON:
>> >
>> > {"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
>> >
>> > Now I need to parse the above JSON and extract the parameters I need.
>> > I know we can load JSON in Apache Pig, but how do I extract only the
>> > required fields from it?
>> >
>> > From this JSON I only need
>> > fraction, destination, source
>> >
>> > Please suggest a way.
>> >
>> > --
>> > *Thanks & Regards *
>> >
>> >
>> > *Unmesha Sreeveni U.B*
>> > *Hadoop, Bigdata Developer*
>> > *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
>> > http://www.unmeshasreeveni.blogspot.in/
>> >
>>
