It's a straightforward fix, by the way. Feel free to open a GitHub issue for
elephant-bird, or better yet, toss me a pull request :).

D

On Tue, Sep 13, 2011 at 9:31 AM, Eli Finkelshteyn <[email protected]> wrote:

> Sweet! Just got this working! For anyone with the same problem in the
> future: apparently JsonStringToMap() *does not* like bytearrays. If you
> simply cast your json as a chararray when you're loading, the error
> disappears!
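>
> A concrete sketch of that fix (aliases are illustrative; the file and
> column names are taken from the script quoted below in the thread):
>
> ```pig
> -- Declare json_data as chararray in the LOAD schema, so that
> -- JsonStringToMap never receives a DataByteArray.
> initial = LOAD 'some_file.lzo'
>     USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>     AS (col1, col2, col3, json_data:chararray);
> mapped = FOREACH initial GENERATE
>     com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS json_map;
> extracted = FOREACH mapped GENERATE (chararray) json_map#'type' AS type;
> dump extracted;
> ```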
>
> Eli
>
>
> On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
>
>> Correction: I forgot to run the JsonStringToMap function when writing my
>> last email; when I run that, I get the same error as before
>> (org.apache.pig.data.DataByteArray cannot be cast to
>> java.lang.String).
>>
>> My full workflow is as follows:
>>
>> initial = LOAD 'some_file.lzo' USING
>> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>> AS (col1, col2, col3, json_data);
>> map = FOREACH initial GENERATE
>> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
>> AS mapped_json_data;
>> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
>> dump extracted;
>>
>> Any ideas?
>>
>> Eli
>>
>> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>>
>>> Well, it's not throwing me errors anymore. Now it's just discarding the
>>> field. When I run it on two records where I've verified a field exists in
>>> the json, I get:
>>>
>>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>>
>>> More specifically, my json is of the following form:
>>>
>>> {"foo":0,"bar":"hi"}
>>>
>>> On that, I'm running:
>>>
>>> initial = LOAD 'some_file.lzo' USING
>>> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>> AS (col1, col2, col3, json_data);
>>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>>> dump extracted;
>>>
>>> Which gives me the above warning along with:
>>>
>>> ()
>>> ()
>>>
>>> I also tried it without the cast to chararray, but received the same
>>> results. Should I be casting json_data as some other data type when I load
>>> it initially? Seems by default it's cast to a bytearray when I describe
>>> initial. Would that be a problem?
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>>
>>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>>
>>>> Ah yeah, that's my favorite thing about Pig maps (prior to Pig 0.9,
>>>> theoretically).
>>>> The values are bytearrays. You are probably trying to treat them as
>>>> strings.
>>>>  You have to do stuff like this:
>>>>
>>>> x = foreach myrelation generate
>>>>   (chararray) mymap#'foo' as foo,
>>>>   (chararray) mymap#'bar' as bar;
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[email protected]>
>>>> wrote:
>>>>
>>>>> Hmmm, now it gets past my mention of the function, but when I run a
>>>>> dump on the generated information, I get:
>>>>>
>>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray
>>>>> cannot be cast to java.lang.String
>>>>>
>>>>> Thanks for all the help so far!
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>>
>>>>>> You also want json-simple-1.1.jar
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>>>> following error:
>>>>>>>
>>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>>
>>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>>        at java.lang.Class.forName0(Native Method)
>>>>>>>        at java.lang.Class.forName(Class.java:247)
>>>>>>>        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>>        etc...
>>>>>>>
>>>>>>> Any ideas? I've verified that it recognizes the function itself, and
>>>>>>> that
>>>>>>> the data it's running on is valid json. Not sure what else I can
>>>>>>> check.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>>
>>>>>>>> They derive from the same classes as far as lzo handling goes, so I
>>>>>>>> suspect something's up with your environment or inputs if you get
>>>>>>>> LzoTokenizedLoader to work, but LzoJsonStorage does not.
>>>>>>>>
>>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use
>>>>>>>> LzoPigStorage.
>>>>>>>>
>>>>>>>> JsonLoader wouldn't work for you because it expects the complete
>>>>>>>> input
>>>>>>>> line
>>>>>>>> to be json, not part of it. You want to load with LzoPigStorage, and
>>>>>>>> then
>>>>>>>> apply the JsonStringToMap udf to the third field.
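>>>>>>>>
>>>>>>>> In sketch form (aliases are illustrative; this assumes the hadoop-lzo,
>>>>>>>> elephant-bird, piggybank, and json-simple-1.1 jars are registered, as
>>>>>>>> discussed below, and uses a key from the sample data):
>>>>>>>>
>>>>>>>> ```pig
>>>>>>>> -- Load tab-separated rows with LzoPigStorage, then parse the
>>>>>>>> -- third field with the JsonStringToMap UDF.
>>>>>>>> raw = LOAD 'some_file.lzo'
>>>>>>>>     USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>>>>>>>     AS (col1, col2, json_text);
>>>>>>>> parsed = FOREACH raw GENERATE
>>>>>>>>     com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_text) AS j;
>>>>>>>> things = FOREACH parsed GENERATE (chararray) j#'thing1' AS thing1;
>>>>>>>> ```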
>>>>>>>>
>>>>>>>> -D
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm currently working on trying to load lzos that contain some JSON
>>>>>>>>> elements. This is of the form:
>>>>>>>>>
>>>>>>>>> item1    item2    {'thing1':'1','thing2':'2'}
>>>>>>>>> item3    item4    {'thing3':'1','thing27':'2'}
>>>>>>>>> item5    item6    {'thing5':'1','thing19':'2'}
>>>>>>>>>
>>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>>> throwing me errors like:
>>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>>>>>>>>> without native-hadoop
>>>>>>>>>
>>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What
>>>>>>>>> should I do to go about loading these files? Does anyone have any ideas?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Eli
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>
>>
>
