initial = LOAD 'some_file.lzo' USING
    com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

or

map = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;

On Tue, Sep 13, 2011 at 8:51 AM, Eli Finkelshteyn <[email protected]> wrote:

> Correction: I forgot to run the JsonStringToMap function when writing my
> last email. When I run that, I get the same error as before
> (org.apache.pig.data.DataByteArray cannot be cast to java.lang.String).
>
> My full workflow is as follows:
>
> initial = LOAD 'some_file.lzo' USING
>     com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>     AS (col1, col2, col3, json_data);
> map = FOREACH initial GENERATE
>     com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
> dump extracted;
>
> Any ideas?
>
> Eli
>
> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>> Well, it's not throwing me errors anymore. Now it's just discarding the
>> field. When I run it on two records where I've verified the field exists
>> in the JSON, I get:
>>
>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>
>> More specifically, my JSON is of the following form:
>>
>> {"foo":0,"bar":"hi"}
>>
>> On that, I'm running:
>>
>> initial = LOAD 'some_file.lzo' USING
>>     com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>     AS (col1, col2, col3, json_data);
>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>> dump extracted;
>>
>> Which gives me the above warning along with:
>>
>> ()
>> ()
>>
>> I also tried it without the cast to chararray, but received the same
>> results. Should I be casting json_data to some other data type when I
>> load it initially? It seems by default it's cast to a bytearray when I
>> describe initial. Would that be a problem?
>>
>> Thanks for all the help so far!
>>
>> Eli
>>
>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>> Ah yeah, that's my favorite thing about Pig maps (prior to Pig 0.9,
>>> theoretically). The values are bytearrays. You are probably trying to
>>> treat them as strings. You have to do stuff like this:
>>>
>>> x = foreach myrelation generate
>>>     (chararray) mymap#'foo' as foo,
>>>     (chararray) mymap#'bar' as bar;
>>>
>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[email protected]> wrote:
>>>> Hmmm, now it gets past my mention of the function, but when I run a
>>>> dump on the generated information, I get:
>>>>
>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>>> be cast to java.lang.String
>>>>
>>>> Thanks for all the help so far!
>>>>
>>>> Eli
>>>>
>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>> You also want json-simple-1.1.jar
>>>>>
>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <iefinkel@gmail.com> wrote:
>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar,
>>>>>> guava-*.jar, and piggybank.jar, and then trying to use that UDF, but
>>>>>> getting the following error:
>>>>>>
>>>>>> ERROR 2998: Unhandled internal error.
>>>>>> org/json/simple/parser/ParseException
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>>     at java.lang.Class.forName(Class.java:247)
>>>>>>     at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>     etc...
>>>>>>
>>>>>> Any ideas? I've verified that it recognizes the function itself, and
>>>>>> that the data it's running on is valid JSON. Not sure what else I can
>>>>>> check.
>>>>>>
>>>>>> Eli
>>>>>>
>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>> They derive from the same classes as far as lzo handling goes, so I
>>>>>>> suspect something's up with your environment or inputs if you get
>>>>>>> LzoTokenizedLoader to work, but LzoJsonStorage does not.
>>>>>>>
>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
>>>>>>>
>>>>>>> JsonLoader wouldn't work for you because it expects the complete
>>>>>>> input line to be json, not part of it. You want to load with
>>>>>>> LzoPigStorage, and then apply the JsonStringToMap udf to the third
>>>>>>> field.
>>>>>>>
>>>>>>> -D
>>>>>>>
>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <iefinkel@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm currently working on trying to load lzos that contain some JSON
>>>>>>>> elements.
>>>>>>>> These are of the form:
>>>>>>>>
>>>>>>>> item1   item2   {'thing1':'1','thing2':'2'}
>>>>>>>> item3   item4   {'thing3':'1','thing27':'2'}
>>>>>>>> item5   item6   {'thing5':'1','thing19':'2'}
>>>>>>>>
>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>> throwing me errors like:
>>>>>>>>
>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>>>>>>>> without native-hadoop
>>>>>>>>
>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill.
>>>>>>>> What should I do to go about loading these files? Does anyone have
>>>>>>>> any ideas?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Eli
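[Editor's note] Pulling the thread's resolution together, a minimal end-to-end script might look like the sketch below. It assumes the jars named in the thread are on hand; the jar filenames and the input path are illustrative, not exact versions.

```pig
-- Jars identified in the thread (filenames illustrative).
REGISTER hadoop-lzo.jar;
REGISTER elephant-bird.jar;
REGISTER guava.jar;
REGISTER piggybank.jar;
REGISTER json-simple-1.1.jar;  -- provides org.json.simple.parser.ParseException

-- Load with LzoPigStorage (LzoTokenizedLoader is deprecated), declaring the
-- JSON column as chararray so JsonStringToMap does not receive a bytearray.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Parse the JSON column into a Pig map.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;

-- Map values come back as bytearrays (pre-0.9 Pig), so cast on lookup.
extracted = FOREACH mapped GENERATE (chararray) mapped_json_data#'type' AS type;
DUMP extracted;
```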
