OK, it's even worse. My data is a big array. Am I being negative in saying that JSON and Pig together are a nightmare?
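"A big array" here presumably means the file holds one top-level JSON array rather than one JSON object per line. The loaders discussed in this thread appear to read input line by line, one record per line, so an array spanning the whole file would not split into records. A common workaround is to rewrite the file as newline-delimited JSON before loading it. The sketch below does this with Python's standard `json` module. The function name is hypothetical, and the sketch assumes the array fits in memory.

```python
import json
from typing import IO


def array_to_json_lines(src: IO[str], dst: IO[str]) -> int:
    """Rewrite one top-level JSON array as newline-delimited JSON (hypothetical helper).

    Each array element is written on its own line, the layout a line-oriented
    loader expects. Returns the number of records written.
    Assumption: the whole array fits in memory, because json.load parses it at once.
    """
    records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    count = 0
    for record in records:
        # With the default indent=None, json.dumps escapes newlines inside strings,
        # so every record stays on exactly one line.
        dst.write(json.dumps(record, separators=(",", ":")))
        dst.write("\n")
        count += 1
    return count
```

To use it, call the function with open file handles, for example `with open("big_array.json") as src, open("records.jsonl", "w") as dst: array_to_json_lines(src, dst)`. The file names are hypothetical. Then LOAD the output file with JsonLoader as before. For a file too large for memory, a streaming JSON parser would be needed instead of `json.load`.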
On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[email protected]> wrote:

Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the schema from a record, and schema inference is what I was looking for. Looks like I have to write that myself.

And yes, I understand the tradeoffs in doing so. Assuming a sample is the overall schema is a big assumption.

On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[email protected]> wrote:

Talking to myself... never mind, Guava and json-simple are included with Pig.

On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[email protected]> wrote:

Got it building. Are google-collections and json-simple external deps?

On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <[email protected]> wrote:

It seems that everyone can build elephant-bird but me:
https://github.com/kevinweil/elephant-bird/issues/272

On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <[email protected]> wrote:

I don't think you really need to build it. You can find it in any Maven repository.

Arian Rodrigo Pasquali
FEUP, SAPO Labs
http://www.arianpasquali.com
twitter @arianpasquali

2012/11/18 Arian Pasquali <[email protected]>

You don't need to build either one. Just download the two jars I used in my example.

Arian

On Sunday, November 18, 2012, Russell Jurney wrote:

Thanks - looks like I don't have to specify the schema, which is good.

I'll try and build elephant-bird.

On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[email protected]> wrote:

keep calm
and use elephant-bird
https://github.com/kevinweil/elephant-bird
<https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java>

I posted an example here yesterday of how to load tweets in JSON. Here it is again; I hope it helps.

    register 'elephant-bird-core-3.0.0.jar'
    register 'elephant-bird-pig-3.0.0.jar'
    register 'google-collections-1.0.jar'
    register 'json-simple-1.1.jar'

    json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
        USING com.twitter.elephantbird.pig.load.JsonLoader();

    geo_tweets = FOREACH json_lines GENERATE
        (CHARARRAY) $0#'id' AS id,
        (CHARARRAY) $0#'geoLocation' AS geoLocation;

    only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
    store only_not_nulls into '/twitter_data/results/geo_tweets';

2012/11/18 Dan Young <[email protected]>

Not sure if this helps, but in Pig 0.11 I've been using this on EMR for some of our JSON data...

    raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
        JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');

Regards,

Dano

On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[email protected]> wrote:

I have some JSON data with a uniform schema. I want to load it in Pig. JsonStorage doesn't work, because the data has no schema.

How can I load JSON data in Pig?

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
