It's a straightforward fix by the way. Feel free to open a github issue for elephant-bird, or better yet, toss me a pull request :).
D

On Tue, Sep 13, 2011 at 9:31 AM, Eli Finkelshteyn <[email protected]> wrote:

> Sweet! Just got this working! For anyone with the same problem in the
> future: apparently JsonStringToMap() *does not* like bytearrays. If you
> simply cast your json as a chararray when you're loading, the error
> disappears!
>
> Eli
>
> On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
>
>> Correction: I forgot to run the JsonStringToMap function when writing my
>> last email. When I run that, I get the same error as before
>> (*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).
>>
>> My full workflow is as follows:
>>
>> initial = LOAD 'some_file.lzo' USING
>>     com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>     AS (col1, col2, col3, json_data);
>> map = FOREACH initial GENERATE
>>     com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
>>     AS mapped_json_data;
>> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
>> dump extracted;
>>
>> Any ideas?
>>
>> Eli
>>
>> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>>
>>> Well, it's not throwing me errors anymore. Now it's just discarding the
>>> field. When I run it on two records where I've verified a field exists
>>> in the json, I get:
>>>
>>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>>
>>> More specifically, my json is of the following form:
>>>
>>> {"foo":0,"bar":"hi"}
>>>
>>> On that, I'm running:
>>>
>>> initial = LOAD 'some_file.lzo' USING
>>>     com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>>     AS (col1, col2, col3, json_data);
>>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>>> dump extracted;
>>>
>>> Which gives me the above warning along with:
>>>
>>> ()
>>> ()
>>>
>>> I also tried it without the cast to chararray, but received the same
>>> results. Should I be casting json_data as some other data type when I
>>> load it initially?
>>> Seems by default it's cast to a bytearray when I describe initial.
>>> Would that be a problem?
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>>
>>>> Ah yeah, that's my favorite thing about Pig maps (prior to Pig 0.9,
>>>> theoretically). The values are bytearrays. You are probably trying to
>>>> treat them as strings. You have to do stuff like this:
>>>>
>>>> x = foreach myrelation generate
>>>>     (chararray) mymap#'foo' as foo,
>>>>     (chararray) mymap#'bar' as bar;
>>>>
>>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[email protected]> wrote:
>>>>
>>>>> Hmmm, now it gets past my mention of the function, but when I run a
>>>>> dump on the generated information, I get:
>>>>>
>>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>>> java.lang.ClassCastException: *org.apache.pig.data.DataByteArray
>>>>> cannot be cast to java.lang.String*
>>>>>
>>>>> Thanks for all the help so far!
>>>>>
>>>>> Eli
>>>>>
>>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>>
>>>>>> You also want json-simple-1.1.jar
>>>>>>
>>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[email protected]> wrote:
>>>>>>
>>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar,
>>>>>>> guava-*.jar, and piggybank.jar, and then trying to use that UDF,
>>>>>>> but getting the following error:
>>>>>>>
>>>>>>> ERROR 2998: Unhandled internal error.
>>>>>>> org/json/simple/parser/ParseException
>>>>>>>
>>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>>>     at java.lang.Class.forName(Class.java:247)
>>>>>>>     at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>>     at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>>     etc...
>>>>>>>
>>>>>>> Any ideas? I've verified that it recognizes the function itself,
>>>>>>> and that the data it's running on is valid json. Not sure what
>>>>>>> else I can check.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>>
>>>>>>>> They derive from the same classes as far as lzo handling goes, so
>>>>>>>> I suspect something's up with your environment or inputs if you
>>>>>>>> get LzoTokenizedLoader to work, but LzoJsonStorage does not.
>>>>>>>>
>>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use
>>>>>>>> LzoPigStorage.
>>>>>>>>
>>>>>>>> JsonLoader wouldn't work for you because it expects the complete
>>>>>>>> input line to be json, not part of it. You want to load with
>>>>>>>> LzoPigStorage, and then apply the JsonStringToMap udf to the third
>>>>>>>> field.
>>>>>>>>
>>>>>>>> -D
>>>>>>>>
>>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm currently working on trying to load lzos that contain some
>>>>>>>>> JSON elements.
>>>>>>>>> This is of the form:
>>>>>>>>>
>>>>>>>>> item1 item2 {'thing1':'1','thing2':'2'}
>>>>>>>>> item3 item4 {'thing3':'1','thing27':'2'}
>>>>>>>>> item5 item6 {'thing5':'1','thing19':'2'}
>>>>>>>>>
>>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>>> throwing me errors like:
>>>>>>>>>
>>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load
>>>>>>>>> native-lzo without native-hadoop
>>>>>>>>>
>>>>>>>>> This is despite the fact that I can load normal lzos just fine
>>>>>>>>> using LzoTokenizedLoader('\\t'). So, now I'm at a bit of a
>>>>>>>>> standstill. What should I do to go about loading these files?
>>>>>>>>> Does anyone have any ideas?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Eli
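[Editor's note: for anyone landing on this thread later, here is the full pipeline the thread converges on, as a sketch. The file name, column names, and the 'type' key are placeholders taken from the examples above; the REGISTER paths are assumptions and will vary with your versions and layout. Not verified against any particular elephant-bird release.]

```pig
-- Register the jars the thread identifies as required (paths are assumptions).
REGISTER 'hadoop-lzo.jar';
REGISTER 'elephant-bird.jar';
REGISTER 'guava.jar';
REGISTER 'piggybank.jar';
REGISTER 'json-simple-1.1.jar';  -- needed by JsonStringToMap at runtime

-- Load tab-separated LZO data, declaring the JSON column as chararray in
-- the schema: JsonStringToMap does not accept the default bytearray.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Parse the JSON string into a Pig map.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;

-- Map values come back as bytearrays, so cast each value you extract.
extracted = FOREACH mapped GENERATE
    (chararray) mapped_json_data#'type' AS type;

dump extracted;
```

The two casts are doing different jobs: the schema cast at LOAD time keeps JsonStringToMap from seeing a DataByteArray, and the `(chararray)` cast on each map lookup converts the bytearray values the map hands back.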
