Hi Dudu,

I want to parse my JSON file and produce the CSV output shown in the output section of my original message below. I can currently do this in bash with the jq command, but that approach does not scale to JSON files that are terabytes in size, so I am looking for a solution in Pig or Hive.

Regards,
Ajay T

On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote:

> And your issue/question is?
>
> *From:* Ajay Tirpude [mailto:tirpudeaj...@gmail.com]
> *Sent:* Sunday, November 13, 2016 4:46 AM
> *To:* user@hive.apache.org
> *Subject:* Nested JSON Parsing
>
> Dear All,
>
> I am trying to parse the JSON file given below, and my intention is to
> convert it into a CSV.
>
> {
>   "devicetype": "SmartPhone",
>   "uuid": "sg76fdhh7gfxhxfhgxf67x",
>   "ts": {
>     "date": "2016-03-23T10:58:34.660Z"
>   },
>   "events": [
>     {
>       "timestamp": "2016-03-23T10:58:37Z",
>       "evt": "first",
>       "ad": "v6v75v88n98778mn",
>       "tkey": "ngbbc76fbc6fb6fb66fb6",
>       "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",
>       "eventid": "eytuy"
>     },
>     {
>       "timestamp": "2016-03-23T10:58:35Z",
>       "evt": "second",
>       "ad": "v6v75v88n98778mn",
>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>     },
>     {
>       "timestamp": "2016-03-23T10:58:36Z",
>       "evt": "third",
>       "ad": "v6v75v88n98778mn",
>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>     }
>   ],
>   "adid": "v6v75v88n98778mn",
>   "ad_tz": {
>     "date": "2016-03-23T10:58:34.660Z"
>   },
>   "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
> }
>
> There are a few conditions that I need to apply when parsing:
>
> 1. I want to get all the fields except timestamp inside the nested events key.
>
> 2. I want to loop over the events key and emit one row for each evt. The
>    input above has three evts, but that number is not fixed in the actual
>    input files. There can be any number of evts, not just 3.
>
> 3. Not every evt block is the same. Each evt block can carry different
>    extra fields, and we need to extract every key. If a key is missing
>    from one evt, its value should be blank for that evt. For example,
>    evt "first" has two extra key/value pairs, eventid and mtp, and these
>    values should be blank for the other evts. Similarly, other evts may
>    have their own key/value pairs, which should be blank in the evts that
>    lack them.
>
> In the end I want the output to look like this:
>
> devicetype | uuid | ts.date | events.evt | events.ad | events.tkey | events.mtp | events.eventid | adid | ad_tz.date | ua
> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | first | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | Wed Mar 23 2016 19:04:22 GMT 0800 (PHT) | eytuy | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | second | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | | | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | third | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | | | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>
> Regards,
> Ajay T
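
For reference, the three conditions in the quoted message (drop events.timestamp, emit one row per element of events, leave missing keys blank) can be sketched in Python as below. This is only an illustration of the intended transformation, not a Hive or Pig solution. It assumes one JSON document per input line and uses the fixed column list from the desired output; the names `COLUMNS`, `flatten_record`, and `records_to_csv` are hypothetical and exist only for this example.

```python
import csv
import io
import json

# Column order copied from the desired output in the post. Real data may
# carry other event keys; with this fixed list they would be dropped, which
# is a simplifying assumption of this sketch.
COLUMNS = [
    "devicetype", "uuid", "ts.date",
    "events.evt", "events.ad", "events.tkey", "events.mtp", "events.eventid",
    "adid", "ad_tz.date", "ua",
]


def flatten_record(record):
    """Yield one flat dict per element of record["events"]."""
    base = {
        "devicetype": record.get("devicetype", ""),
        "uuid": record.get("uuid", ""),
        "ts.date": record.get("ts", {}).get("date", ""),
        "adid": record.get("adid", ""),
        "ad_tz.date": record.get("ad_tz", {}).get("date", ""),
        "ua": record.get("ua", ""),
    }
    for event in record.get("events", []):
        row = dict(base)
        for key, value in event.items():
            if key != "timestamp":  # condition 1: skip events.timestamp
                row["events." + key] = value
        yield row


def records_to_csv(lines):
    """Convert an iterable of JSON lines into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        restval="",             # condition 3: missing keys become blank
        extrasaction="ignore",  # keys outside COLUMNS are silently dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerows(flatten_record(json.loads(line)))
    return out.getvalue()
```

Calling `records_to_csv(open("input.json"))` on a file with one record per line would return the CSV text, with fields such as ua quoted because they contain commas. Per-record logic of this kind could in principle be run as a streaming script from Hive (TRANSFORM) or Pig (STREAM), though that has not been tried here.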