Try
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
This should allow access to nested objects as nested maps
($0#'level1'#'level2'#'level3' …)
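
Applied to your script, that would look roughly like the sketch below. It is untested and assumes the elephant-bird jar you register actually ships the -nestedLoad option (I have not checked whether 2.2.3 does); the jar names and input path are copied from your message. The flag is passed bare here, without '=true'; whether that suffix is what breaks your version is a guess on my part.

```pig
-- Jars as in the original script; swap in a newer elephant-bird build
-- if this one turns out not to support -nestedLoad (assumption).
register elephant-bird-2.2.3.jar;
register json-simple-1.1.jar;

-- Pass the flag bare, without '=true'.
nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- Each '#' dereferences one level of the nested map.
tester = FOREACH nestobject GENERATE
    json#'event' AS event,
    json#'uid' AS uid,
    json#'data'#'expired_reason' AS reason;
DUMP tester;
```

If 'data' is missing from a record, the nested lookup should simply come back null rather than fail.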
David
On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara
<[email protected]> wrote:
> I'm also experiencing problems working with JSON objects in Pig.
>
> I have managed to load in a log file in JSON format but only query the top
> level objects. Whenever I try to call anything that is nested it fails.
>
> -- Register JARS
> register elephant-bird-2.2.3.jar;
> register json-simple-1.1.jar;
>
> -- Load data
> nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
> USING
> com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true')
> AS (json:map[]);
> DUMP nestobject;
>
> -- Example query
> tester = FOREACH nestobject GENERATE json#'event',json#'uid',
> json#'data'#'expired_reason' as reason;
> DUMP tester;
>
> The above fails ...
>
> Does anyone have any ideas?
>
> Thanks
>
> Sax
>
> On 20 November 2012 07:22, Deepak Tiwari <[email protected]> wrote:
>
>> I also ran into the same dilemma. Here is something that I found easier and
>> that works for me. I compiled some sources from http://www.json.org/java/
>>
>>
>> import java.io.IOException;
>> import java.io.UnsupportedEncodingException;
>> import java.util.List;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.Tuple;
>> import org.apache.pig.data.TupleFactory;
>> import org.json.JSONArray;
>> import org.json.JSONException;
>> import org.json.JSONObject;
>>
>>
>> public class JsonParser extends EvalFunc<Tuple> {
>>     @Override
>>     public Tuple exec(Tuple input) throws IOException {
>>         TupleFactory tf = TupleFactory.getInstance();
>>         Tuple t = tf.newTuple();
>>
>>         // Parse the first field as JSON and append the extracted text.
>>         if (input.get(0) != null) {
>>             String inString = (String) input.get(0);
>>             try {
>>                 JSONObject jsn = new JSONObject(inString);
>>                 t.append(getJsonArr(jsn));
>>             } catch (JSONException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>         return t;
>>     }
>>
>>     // Concatenates the "text" field of every object in the "jsonKey" array.
>>     private String getJsonArr(JSONObject jsn) {
>>         String jsnArrVal = "";
>>
>>         try {
>>             if (!jsn.has("jsonKey"))
>>                 return null;
>>             JSONArray jTagArray = jsn.getJSONArray("jsonKey");
>>             for (int i = 0; i < jTagArray.length(); i++) {
>>                 JSONObject hst = jTagArray.getJSONObject(i);
>>                 // Prepend this entry's "text" value to the accumulated string.
>>                 jsnArrVal = hst.getString("text") + jsnArrVal;
>>             }
>>         } catch (JSONException e) {
>>             e.printStackTrace();
>>         }
>>         return jsnArrVal;
>>     }
>> }
>>
>>
>> On Mon, Nov 19, 2012 at 11:35 AM, Russell Jurney
>> <[email protected]> wrote:
>>
>>> Ok, it's even worse. My data is a big array.
>>>
>>> Am I being negative in saying that JSON and Pig is like a nightmare?
>>>
>>>
>>> On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[email protected]> wrote:
>>>
>>>> Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
>>>> schema from a record. This is what I was looking for. Looks like I have to
>>>> write that myself.
>>>>
>>>> And yes, I understand the tradeoffs in doing so. Assuming a sample is the
>>>> overall schema is a big assumption.
>>>>
>>>>
>>>>
>>>> On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[email protected]> wrote:
>>>>
>>>>> Talking to myself... never mind, guava and json-simple are included with
>>>>> Pig.
>>>>>
>>>>>
>>>>> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[email protected]> wrote:
>>>>>
>>>>>> Got it building. Are google collections and json-simple external deps?
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> It seems that everyone can build elephant-bird but me:
>>>>>>> https://github.com/kevinweil/elephant-bird/issues/272
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I don't think you really need to build it.
>>>>>>>> You can find it in any Maven repository.
>>>>>>>>
>>>>>>>> Arian Rodrigo Pasquali
>>>>>>>> FEUP, SAPO Labs
>>>>>>>> http://www.arianpasquali.com
>>>>>>>> twitter @arianpasquali
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2012/11/18 Arian Pasquali <[email protected]>
>>>>>>>>
>>>>>>>>> You don't need to build it either.
>>>>>>>>> Just download those two jars I used in my example.
>>>>>>>>>
>>>>>>>>> Arian
>>>>>>>>>
>>>>>>>>> On Sunday, November 18, 2012, Russell Jurney wrote:
>>>>>>>>>
>>>>>>>>>> Thanks - looks like I don't have to specify the schema, which is good.
>>>>>>>>>>
>>>>>>>>>> I'll try and build elephant-bird.
>>>>>>>>>>
>>>>>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>>>>>
>>>>>>>>>> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> keep calm
>>>>>>>>>>> and use elephant-bird
>>>>>>>>>>> https://github.com/kevinweil/elephant-bird
>>>>>>>>>>> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I posted an example here yesterday of how to load tweets in JSON.
>>>>>>>>>>> Here it goes again. I hope it helps.
>>>>>>>>>>>
>>>>>>>>>>> register 'elephant-bird-core-3.0.0.jar'
>>>>>>>>>>> register 'elephant-bird-pig-3.0.0.jar'
>>>>>>>>>>> register 'google-collections-1.0.jar'
>>>>>>>>>>> register 'json-simple-1.1.jar'
>>>>>>>>>>>
>>>>>>>>>>> json_lines = LOAD
>>>>>>>>>>> '/twitter_data/tweets/stream/v1/json/2012_10_10/08' USING
>>>>>>>>>>> com.twitter.elephantbird.pig.load.JsonLoader();
>>>>>>>>>>>
>>>>>>>>>>> geo_tweets = FOREACH json_lines GENERATE (CHARARRAY) $0#'id' AS
>>>>>>>>>>> id, (CHARARRAY) $0#'geoLocation' AS geoLocation;
>>>>>>>>>>>
>>>>>>>>>>> only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
>>>>>>>>>>> store only_not_nulls into '/twitter_data/results/geo_tweets';
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Arian Rodrigo Pasquali
>>>>>>>>>>> FEUP, SAPO Labs
>>>>>>>>>>> http://www.arianpasquali.com
>>>>>>>>>>> twitter @arianpasquali
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2012/11/18 Dan Young <[email protected]>
>>>>>>>>>>>
>>>>>>>>>>>> Not sure if this helps, but in 0.11 I've been using this on EMR for
>>>>>>>>>>>> some of our JSON data....
>>>>>>>>>>>>
>>>>>>>>>>>> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
>>>>>>>>>>>> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Dano
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have some JSON data with a uniform schema. I want to load it in Pig.
>>>>>>>>>>>>> JsonStorage doesn't work, because the data has no schema.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How can I load JSON data in Pig?
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>>
> --
> *Saxifrage Cucvara*
> Senior Data Analyst