initial = LOAD 'some_file.lzo' USING
com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
AS (col1, col2, col3, json_data:chararray);

or

map = FOREACH initial GENERATE
com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;


extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
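
Putting the two together, a complete script would look something like this (a
sketch assuming the tab-separated layout from this thread; 'type' stands in
for whatever key you're after, and I've named the middle alias 'mapped' since
map is also a Pig type name):

initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);
-- parse the json column into a Pig map
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
-- map values come back as bytearrays, so cast at the point of use
extracted = FOREACH mapped GENERATE (chararray) mapped_json_data#'type' AS type;
dump extracted;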


On Tue, Sep 13, 2011 at 8:51 AM, Eli Finkelshteyn <[email protected]> wrote:

> Correction: I forgot to run the JsonStringToMap function when writing my
> last email. When I run that, I get the same error as before
> (org.apache.pig.data.DataByteArray cannot be cast to java.lang.String).
>
> My full workflow is as follows:
>
>
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
> AS (col1, col2, col3, json_data);
> map = FOREACH initial GENERATE
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
> dump extracted;
>
> Any ideas?
>
> Eli
>
>
> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>
>> Well, it's not throwing me errors anymore. Now it's just discarding the
>> field. When I run it on two records where I've verified a field exists in
>> the json, I get:
>>
>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>
>> More specifically, my json is of the following form:
>>
>> {"foo":0,"bar":"hi"}
>>
>> On that, I'm running:
>>
>> initial = LOAD 'some_file.lzo' USING
>> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>> AS (col1, col2, col3, json_data);
>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>> dump extracted;
>>
>> Which gives me the above warning along with:
>>
>> ()
>> ()
>>
>> I also tried it without the cast to chararray, but received the same
>> results. Should I be casting json_data to some other data type when I load
>> it initially? It seems to default to a bytearray when I DESCRIBE initial.
>> Would that be a problem?
>>
>> Thanks for all the help so far!
>>
>> Eli
>>
>>
>>
>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>
>>> Ah yeah, that's my favorite thing about Pig maps (prior to Pig 0.9,
>>> theoretically): the values are bytearrays. You are probably trying to
>>> treat them as strings. You have to do stuff like this:
>>>
>>> x = foreach myrelation generate
>>>   (chararray) mymap#'foo' as foo,
>>>   (chararray) mymap#'bar' as bar;
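>>>
>>> (The cast has to happen at each point of use; the map's values themselves
>>> stay untyped, so Pig otherwise has no way to know they're strings.)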
>>>
>>>
>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[email protected]> wrote:
>>>
>>>> Hmmm, now it gets past my mention of the function, but when I run a dump
>>>> on the generated data, I get:
>>>>
>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>>> be cast to java.lang.String
>>>>
>>>> Thanks for all the help so far!
>>>>
>>>> Eli
>>>>
>>>>
>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>
>>>>  You also want json-simple-1.1.jar
>>>>>
>>>>>
>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <iefinkel@gmail.com> wrote:
>>>>>
>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>>> following error:
>>>>>>
>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>        at java.lang.Class.forName0(Native Method)
>>>>>>        at java.lang.Class.forName(Class.java:247)
>>>>>>        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>        etc...
>>>>>>
>>>>>> Any ideas? I've verified that it recognizes the function itself, and
>>>>>> that the data it's running on is valid json. Not sure what else I can check.
>>>>>>
>>>>>> Eli
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>
>>>>>>> They derive from the same classes as far as lzo handling goes, so I
>>>>>>> suspect something's up with your environment or inputs if you get
>>>>>>> LzoTokenizedLoader to work but LzoJsonLoader does not.
>>>>>>>
>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
>>>>>>>
>>>>>>> JsonLoader wouldn't work for you because it expects the complete input
>>>>>>> line to be json, not part of it. You want to load with LzoPigStorage,
>>>>>>> and then apply the JsonStringToMap udf to the third field.
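>>>>>>>
>>>>>>> Something like this (untested sketch; the aliases and input path are
>>>>>>> placeholders, and the json is assumed to be the third tab-separated
>>>>>>> field):
>>>>>>>
>>>>>>> raw = LOAD 'input.lzo'
>>>>>>>     USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>>>>>>     AS (item_a, item_b, json_data:chararray);
>>>>>>> with_map = FOREACH raw GENERATE
>>>>>>>     com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS j;
>>>>>>> things = FOREACH with_map GENERATE (chararray) j#'thing1' AS thing1;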
>>>>>>>
>>>>>>> -D
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <iefinkel@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm currently trying to load lzos that contain some JSON elements.
>>>>>>>> This is of the form:
>>>>>>>>
>>>>>>>> item1    item2    {'thing1':'1','thing2':'2'}
>>>>>>>> item3    item4    {'thing3':'1','thing27':'2'}
>>>>>>>> item5    item6    {'thing5':'1','thing19':'2'}
>>>>>>>>
>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>> throwing me errors like:
>>>>>>>>
>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>>>>>>>> without native-hadoop
>>>>>>>>
>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What
>>>>>>>> should I do to go about loading these files? Does anyone have any ideas?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Eli
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>
