Try

com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
This should allow access to nested objects as nested maps
($0#'level1'#'level2'#'level3' …)
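
To make the chained lookup concrete, here is a rough Java sketch of what `#` does on nested maps, assuming the loader hands back nested java.util.Map values. NestedLookup and lookup are hypothetical names for illustration only, not part of Pig or elephant-bird. The field names are borrowed from the script quoted below, and the values are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Pig or elephant-bird: it mimics a chained
// lookup such as $0#'data'#'expired_reason' on nested maps.
final class NestedLookup {

    // Walk the keys one level at a time. Return null as soon as a level is
    // absent or is not a map, mirroring how a Pig map lookup yields null
    // for a key that is not there.
    static Object lookup(Map<String, Object> record, String... keys) {
        Object current = record;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Sample record; the values are invented for illustration.
        Map<String, Object> data = new HashMap<>();
        data.put("expired_reason", "timeout");

        Map<String, Object> json = new HashMap<>();
        json.put("event", "session_end");
        json.put("data", data);

        System.out.println(lookup(json, "event"));                  // session_end
        System.out.println(lookup(json, "data", "expired_reason")); // timeout
        System.out.println(lookup(json, "data", "missing"));        // null
    }
}
```

If the nested value comes back as a plain string rather than a map, the second-level lookup returns null, which is one thing to check when a nested query yields nothing.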

David

On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara 
<[email protected]> wrote:

> I'm also experiencing problems working with JSON objects in Pig.
> 
> I have managed to load a log file in JSON format, but I can only query the
> top-level objects. Whenever I try to access anything nested, it fails.
> 
> -- Register JARS
> register elephant-bird-2.2.3.jar;
> register json-simple-1.1.jar;
> 
> -- Load data
> nestobject = LOAD '/Users/Path/GoogleDrive/test.json'
>        USING
> com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true')
>        AS (json:map[]);
> DUMP nestobject;
> 
> -- Example query
> tester = FOREACH nestobject GENERATE json#'event',json#'uid',
> json#'data'#'expired_reason' as reason;
> DUMP tester;
> 
> The above fails ...
> 
> Does anyone have any ideas?
> 
> Thanks
> 
> Sax
> 
> On 20 November 2012 07:22, Deepak Tiwari <[email protected]> wrote:
> 
>> I also ran into the same dilemma. Here is something that I found easier and
>> that works for me. I compiled some sources from http://www.json.org/java/
>> 
>> 
>> import java.io.IOException;
>> import java.io.UnsupportedEncodingException;
>> import java.util.List;
>> 
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.Tuple;
>> import org.apache.pig.data.TupleFactory;
>> import org.json.JSONArray;
>> import org.json.JSONException;
>> import org.json.JSONObject;
>> 
>> 
>> public class JsonParser extends EvalFunc<Tuple> {
>>    @Override
>>    public Tuple exec(Tuple input) throws IOException {
>>        TupleFactory tf = TupleFactory.getInstance();
>>        Tuple t = tf.newTuple();
>> 
>> 
>>        if (input.get(0) != null) {
>>            String inString = (String) input.get(0);
>>            try {
>>                JSONObject jsn = new JSONObject(inString);
>>                t.append(getJsonArr(jsn));
>>            } catch (JSONException e) {
>>                e.printStackTrace();
>>            }
>>        }
>>        return t;
>>    }
>> 
>>    private String getJsonArr(JSONObject jsn) {
>>        String jsnArrVal = "";
>> 
>>        try {
>>            if (!jsn.has("jsonKey"))
>>                return null;
>>            JSONArray jTagArray = jsn.getJSONArray("jsonKey");
>>            for (int i=0; i<jTagArray.length(); i++){
>>                JSONObject hst = jTagArray.getJSONObject(i);
>>                // Assign to the outer variable; redeclaring it here would not compile.
>>                jsnArrVal = hst.getString("text") + jsnArrVal;
>>            }
>>        } catch (JSONException e) {
>>            // TODO Auto-generated catch block
>>            e.printStackTrace();
>>        }
>>        return jsnArrVal;
>>    }
>> }
>> 
>> 
>> On Mon, Nov 19, 2012 at 11:35 AM, Russell Jurney
>> <[email protected]> wrote:
>> 
>>> Ok, it's even worse. My data is a big array.
>>> 
>>> Am I being negative in saying that JSON and Pig is like a nightmare?
>>> 
>>> 
>>> On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[email protected]> wrote:
>>> 
>>>> Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
>>>> schema from a record. This is what I was looking for. Looks like I have to
>>>> write that myself.
>>>> 
>>>> And yes, I understand the tradeoffs in doing so. Assuming a sample is the
>>>> overall schema is a big assumption.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[email protected]> wrote:
>>>> 
>>>>> Talking to myself... never mind, guava and json-simple are included with
>>>>> Pig.
>>>>> 
>>>>> 
>>>>> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[email protected]> wrote:
>>>>> 
>>>>>> Got it building. Are google collections and json-simple external deps?
>>>>>> 
>>>>>> 
>>>>>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> It seems that everyone can build elephant-bird but me:
>>>>>>> https://github.com/kevinweil/elephant-bird/issues/272
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> I don't think you really need to build it.
>>>>>>>> You can find it in any Maven repository.
>>>>>>>> 
>>>>>>>> Arian Rodrigo Pasquali
>>>>>>>> FEUP, SAPO Labs
>>>>>>>> http://www.arianpasquali.com
>>>>>>>> twitter @arianpasquali
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2012/11/18 Arian Pasquali <[email protected]>
>>>>>>>> 
>>>>>>>>> You don't need to build it either.
>>>>>>>>> Just download those two jars I used in my example.
>>>>>>>>> 
>>>>>>>>> Arian
>>>>>>>>> 
>>>>>>>>> On Sunday, November 18, 2012, Russell Jurney wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks - looks like I don't have to specify the schema, which is good.
>>>>>>>>>> 
>>>>>>>>>> I'll try and build elephant-bird.
>>>>>>>>>> 
>>>>>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> keep calm
>>>>>>>>>>> and use elephant-bird
>>>>>>>>>>> https://github.com/kevinweil/elephant-bird
>>>>>>>>>>> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I posted an example here yesterday of how to load tweets in JSON.
>>>>>>>>>>> Here it goes again. I hope it helps.
>>>>>>>>>>> 
>>>>>>>>>>> register 'elephant-bird-core-3.0.0.jar'
>>>>>>>>>>>   register 'elephant-bird-pig-3.0.0.jar'
>>>>>>>>>>>   register 'google-collections-1.0.jar'
>>>>>>>>>>>   register 'json-simple-1.1.jar'
>>>>>>>>>>> 
>>>>>>>>>>>   json_lines = LOAD
>>>>>>>>>>> '/twitter_data/tweets/stream/v1/json/2012_10_10/08' USING
>>>>>>>>>>> com.twitter.elephantbird.pig.load.JsonLoader();
>>>>>>>>>>> 
>>>>>>>>>>>   geo_tweets = FOREACH json_lines GENERATE
>>>>>>>>>>>       (CHARARRAY) $0#'id' AS id,
>>>>>>>>>>>       (CHARARRAY) $0#'geoLocation' AS geoLocation;
>>>>>>>>>>> 
>>>>>>>>>>>   only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
>>>>>>>>>>>   store only_not_nulls into '/twitter_data/results/geo_tweets';
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2012/11/18 Dan Young <[email protected]>
>>>>>>>>>>> 
>>>>>>>>>>>> Not sure if this helps, but in 0.11 I've been using this on EMR for
>>>>>>>>>>>> some of our JSON data....
>>>>>>>>>>>> 
>>>>>>>>>>>> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*'
>>>>>>>>>>>>     USING JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> 
>>>>>>>>>>>> Dano
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I have some JSON data with a uniform schema. I want to load it in Pig.
>>>>>>>>>>>>> JsonStorage doesn't work, because the data has no schema.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> How can I load JSON data in Pig?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>> 
> 
> 
> 
> -- 
> *Saxifrage Cucvara*
> Senior Data Analyst
> 
> *JBA Online Consultancy*
> 
> E: [email protected]
> M: +61 424 622 534
> W: www.jbadigital.com
> A:  Level 6, 69 Reservoir Street, Surry Hills NSW 2010
> 
