Haha, yeah; that. I literally just got it to work when you emailed.
Thanks for all the help, Dmitriy!
Eli
On 9/13/11 12:30 PM, Dmitriy Ryaboy wrote:
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);
or
map = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
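The same pattern extends to pulling several keys out of one map in a single pass. A rough sketch, reusing the aliases above and the 'foo'/'bar' keys from the sample JSON further down the thread (each value still needs its own cast, since the map values come back untyped):

-- sketch only: key names are taken from the sample data in this thread
fields = FOREACH map GENERATE
    (chararray) mapped_json_data#'type' AS type,
    (chararray) mapped_json_data#'foo' AS foo,
    (chararray) mapped_json_data#'bar' AS bar;
dump fields;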
On Tue, Sep 13, 2011 at 8:51 AM, Eli Finkelshteyn <[email protected]> wrote:
Correction: I forgot to run the JsonStringToMap function when writing my last email. When I do run it, I get the same error as before (org.apache.pig.data.DataByteArray cannot be cast to java.lang.String).
My full workflow is as follows:
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data);
map = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
dump extracted;
Any ideas?
Eli
On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
Well, it's not throwing me errors anymore. Now it's just discarding the
field. When I run it on two records where I've verified a field exists in
the json, I get:
Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
More specifically, my json is of the following form:
{"foo":0,"bar":"hi"}
On that, I'm running:
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data);
extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
dump extracted;
Which gives me the above warning along with:
()
()
I also tried it without the cast to chararray, but got the same results. Should I be casting json_data to some other data type when I load it initially? It seems to default to a bytearray when I describe initial. Would that be a problem?
Thanks for all the help so far!
Eli
On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
Ah, yeah, that's my favorite thing about Pig maps (prior to Pig 0.9, theoretically).
The values are bytearrays, and you are probably trying to treat them as strings.
You have to do stuff like this:
x = foreach myrelation generate
(chararray) mymap#'foo' as foo,
(chararray) mymap#'bar' as bar;
On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[email protected]> wrote:
Hmmm, now it gets past the line where I call the function, but when I run a dump on the generated relation, I get:

2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
Thanks for all the help so far!
Eli
On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
You also want json-simple-1.1.jar
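In grunt that just means one more REGISTER line. Roughly (the paths and jar names here are placeholders; point them at whatever versions you actually have):

-- placeholder paths; substitute the actual jar locations/versions
REGISTER /path/to/hadoop-lzo.jar;
REGISTER /path/to/elephant-bird.jar;
REGISTER /path/to/guava.jar;
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/json-simple-1.1.jar;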
On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[email protected]> wrote:
Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar, and piggybank.jar, and then trying to use that UDF, but getting the following error:
ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
    at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
etc...
Any ideas? I've verified that it recognizes the function itself, and that the data it's running on is valid JSON. Not sure what else I can check.
Eli
On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
They derive from the same classes as far as LZO handling goes, so I suspect something's up with your environment or inputs if you get LzoTokenizedLoader to work but LzoJsonStorage does not.
Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
JsonLoader wouldn't work for you because it expects the complete input line to be JSON, not part of it. You want to load with LzoPigStorage, and then apply the JsonStringToMap udf to the third field.
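As a rough sketch against the three-column sample below (the aliases and column names are made up, and the JSON column is declared as chararray so the UDF gets a string rather than a bytearray):

-- sketch only: load tab-separated LZO data, then turn the third column into a map
raw = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, json_str:chararray);
with_map = FOREACH raw GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_str) AS json_map;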
-D
On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <[email protected]> wrote:
Hi,
I'm currently trying to load LZO files that contain some JSON elements. The data is of the form:
item1 item2 {'thing1':'1','thing2':'2'}
item3 item4 {'thing3':'1','thing27':'2'}
item5 item6 {'thing5':'1','thing19':'2'}
I was thinking I could use LzoJsonLoader for this, but it keeps throwing me errors like:

ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
This is despite the fact that I can load normal lzos just fine using LzoTokenizedLoader('\\t'). So now I'm at a bit of a standstill. What should I do to go about loading these files? Does anyone have any ideas?
Cheers,
Eli