Got it building. Are google collections and json-simple external deps?
On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <[email protected]> wrote:

> It seems that everyone can build elephant-bird but me:
> https://github.com/kevinweil/elephant-bird/issues/272
>
>
> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <[email protected]> wrote:
>
>> I don't think you really need to build it.
>> You can find it in any Maven repository.
>>
>> Arian Rodrigo Pasquali
>> FEUP, SAPO Labs
>> http://www.arianpasquali.com
>> twitter @arianpasquali
>>
>>
>> 2012/11/18 Arian Pasquali <[email protected]>
>>
>> > You don't need to build it either.
>> > Just download those two jars I used in my example.
>> >
>> > Arian
>> >
>> > On Sunday, November 18, 2012, Russell Jurney wrote:
>> >
>> >> Thanks - looks like I don't have to specify the schema, which is good.
>> >>
>> >> I'll try and build elephant-bird.
>> >>
>> >> Russell Jurney http://datasyndrome.com
>> >>
>> >> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[email protected]>
>> >> wrote:
>> >>
>> >> > keep calm
>> >> > and use elephant-bird
>> >> > https://github.com/kevinweil/elephant-bird
>> >> > <https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java>
>> >> >
>> >> >
>> >> > I posted an example here yesterday of how to load tweets in JSON.
>> >> > Here it goes again. I hope it helps.
>> >> >
>> >> > register 'elephant-bird-core-3.0.0.jar';
>> >> > register 'elephant-bird-pig-3.0.0.jar';
>> >> > register 'google-collections-1.0.jar';
>> >> > register 'json-simple-1.1.jar';
>> >> >
>> >> > json_lines = LOAD
>> >> > '/twitter_data/tweets/stream/v1/json/2012_10_10/08' USING
>> >> > com.twitter.elephantbird.pig.load.JsonLoader();
>> >> >
>> >> > geo_tweets = FOREACH json_lines GENERATE (CHARARRAY) $0#'id' AS
>> >> > id, (CHARARRAY) $0#'geoLocation' AS geoLocation;
>> >> >
>> >> > only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
>> >> > store only_not_nulls into '/twitter_data/results/geo_tweets';
>> >> >
>> >> >
>> >> > Arian Rodrigo Pasquali
>> >> > FEUP, SAPO Labs
>> >> > http://www.arianpasquali.com
>> >> > twitter @arianpasquali
>> >> >
>> >> >
>> >> > 2012/11/18 Dan Young <[email protected]>
>> >> >
>> >> >> Not sure if this helps, but in 0.11 I've been using this on EMR for
>> >> >> some of our JSON data....
>> >> >>
>> >> >> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
>> >> >> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >> Dano
>> >> >>
>> >> >>
>> >> >> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[email protected]> wrote:
>> >> >>
>> >> >>> I have some JSON data with a uniform schema. I want to load it in Pig.
>> >> >>> JsonStorage doesn't work, because the data has no schema.
>> >> >>>
>> >> >>> How can I load JSON data in Pig?
>> >> >>>
>> >> >>> --
>> >> >>> Russell Jurney twitter.com/rjurney [email protected]
>> >> >>> datasyndrome.com
>> >> >>>
>> >> >>
>> >> >
>> >
>> >
>> > --
>> > Sent from Gmail Mobile
>> >
>
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
