I got it. We can use Python and JRuby.

On Sun, Nov 13, 2016 at 12:35 PM, Ajay Tirpude <tirpudeaj...@gmail.com> wrote:
> Hi Satya,
>
> Thanks, I have already started checking the JSON SerDe. Let's see if it works.
> By the way, can we write UDFs in Python/Ruby?
>
> Regards,
> Ajay T
>
> On Sun, Nov 13, 2016 at 12:30 PM, Satya Harish Appana <
> satyaharish.app...@gmail.com> wrote:
>
>> You can use these:
>>
>> JSON SerDe: https://github.com/rcongiu/Hive-JSON-Serde
>>
>> or else you can write a Hive UDTF (e.g.
>> http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html)
>>
>> On Sun, Nov 13, 2016 at 12:22 PM, Ajay Tirpude <tirpudeaj...@gmail.com>
>> wrote:
>>
>>> Hi Dudu,
>>>
>>> I want to parse my JSON file and write the desired output, as pasted in
>>> the output section, to a CSV file. Currently I am able to achieve this
>>> using bash (the jq command), but that does not work for JSON files that
>>> are TBs in size. So I am looking for a solution in Pig or Hive.
>>>
>>> Regards,
>>> Ajay T
>>>
>>> On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu <dmarkov...@paypal.com>
>>> wrote:
>>>
>>>> And your issue/question is?
>>>>
>>>> From: Ajay Tirpude [mailto:tirpudeaj...@gmail.com]
>>>> Sent: Sunday, November 13, 2016 4:46 AM
>>>> To: user@hive.apache.org
>>>> Subject: Nested JSON Parsing
>>>>
>>>> Dear All,
>>>>
>>>> I am trying to parse the JSON file given below, and my intention is to
>>>> convert it into a CSV.
>>>>
>>>> {
>>>>   "devicetype": "SmartPhone",
>>>>   "uuid": "sg76fdhh7gfxhxfhgxf67x",
>>>>   "ts": {
>>>>     "date": "2016-03-23T10:58:34.660Z"
>>>>   },
>>>>   "events": [
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:37Z",
>>>>       "evt": "first",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6",
>>>>       "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",
>>>>       "eventid": "eytuy"
>>>>     },
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:35Z",
>>>>       "evt": "second",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>>     },
>>>>     {
>>>>       "timestamp": "2016-03-23T10:58:36Z",
>>>>       "evt": "third",
>>>>       "ad": "v6v75v88n98778mn",
>>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>>     }
>>>>   ],
>>>>   "adid": "v6v75v88n98778mn",
>>>>   "ad_tz": {
>>>>     "date": "2016-03-23T10:58:34.660Z"
>>>>   },
>>>>   "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>>> }
>>>>
>>>> There are a few conditions that I need to apply when I parse:
>>>>
>>>> 1. I want to get all the fields except timestamp inside the nested
>>>> events key.
>>>>
>>>> 2. I want to loop over the events key for each evt. In the above input
>>>> file there are three evts, but that number is not fixed in the actual
>>>> input file. There can be any number of evts, not just 3.
>>>>
>>>> 3. Not every evt block is similar. Each evt block can have different
>>>> extra fields, but we need to extract every key. If a key is missing from
>>>> one evt, then its value should be blank for that evt. For example, evt:
>>>> first has two extra key:value pairs, i.e., eventid and mtp, and these
>>>> values should be blank for the other evts.
>>>> Similarly, other evts can have their own key:value pairs, and those
>>>> key:values should be blank in the remaining evts.
>>>>
>>>> In the end, I want the output to look like this:
>>>>
>>>> devicetype | uuid | ts.date | events.evt | events.ad | events.tkey | events.mtp | events.eventid | adid | ad_tz.date | ua
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | first | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | Wed Mar 23 2016 19:04:22 GMT 0800 (PHT) | eytuy | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | second | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 |  |  | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | third | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 |  |  | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>>
>>>> Regards,
>>>>
>>>> Ajay T
>>>>
>>>
>>
>> --
>>
>> Regards,
>> Satya Harish Appana,
>> Software Development Engineer II,
>> Flipkart, Bangalore,
>> Ph:+91-9538797174.
>>
>
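
For reference, the three conditions in the original question (drop timestamp, one output row per evt, blank cells for keys an evt lacks) can be sketched in plain Python roughly as below. This is only an illustration under stated assumptions, not the solution anyone in the thread posted: it assumes one JSON object per input line, only one level of nesting outside events, and the function names event_rows and to_csv are made up for the example.

```python
import csv
import io
import json


def event_rows(record, skip=("timestamp",)):
    """Yield one flat dict per entry in record["events"]."""
    for event in record.get("events", []):
        row = {}
        for key, value in record.items():
            if key == "events":
                # The current event's fields take the place of the array.
                for name, field in event.items():
                    if name not in skip:
                        row["events." + name] = field
            elif isinstance(value, dict):
                # One level of nesting becomes a dotted name, e.g. ts.date.
                for name, field in value.items():
                    row[key + "." + name] = field
            else:
                row[key] = value
        yield row


def to_csv(json_lines):
    """Convert an iterable of JSON strings (one object each) to CSV text."""
    rows = [
        row
        for line in json_lines
        if line.strip()
        for row in event_rows(json.loads(line))
    ]
    # Columns are the union of all keys, in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    # restval="" leaves a blank cell where an event lacks a key.
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Note that this sketch keeps every row in memory in order to work out the union of columns, so it only suits small samples. For inputs in the TB range the column list would have to be fixed up front or collected in a separate pass. A record with an empty or missing events array produces no rows here.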