Joe, we'd be happy to take a pull request that addresses this cast
exception and maybe increments a counter.

On Mon, Apr 9, 2012 at 2:27 PM, Joe Crobak <[email protected]> wrote:
> Hi Norbert,
>
> In some cases, I actually get a ClassCastException, which I guess is the
> eventual cause of the job failures:
>
> java.lang.ClassCastException: java.lang.Long cannot be cast to
> org.json.simple.JSONObject
>        at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.parseStringToMap(JsonStringToMap.java:52)
>        at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.exec(JsonStringToMap.java:42)
>        at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.exec(JsonStringToMap.java:22)
>
> (Note that I switched back to the 2.1.11 tag, so the stack trace
> corresponds to
> https://github.com/kevinweil/elephant-bird/blob/b300849f6d014aaac520e385a34aa37adb53b5fa/src/java/com/twitter/elephantbird/pig/piggybank/JsonStringToMap.java)
>
> I've put together a dummy heuristic to skip lines that don't match
> ^\\{.*\\}$ and this seems to get me past the CCE.
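For reference, the "dummy heuristic" above amounts to something like this (illustrative Java, not the actual patch; the same pattern could equally sit in a Pig FILTER before the UDF call):

```java
import java.util.regex.Pattern;

// Illustrative version of the heuristic: only hand a line to the JSON parser
// if it at least looks like a top-level JSON object.
public class JsonObjectLineGuard {
    private static final Pattern JSON_OBJECT_LINE = Pattern.compile("^\\{.*\\}$");

    public static boolean looksLikeJsonObject(String line) {
        // trim() is a small extra robustness step beyond the original regex
        return line != null && JSON_OBJECT_LINE.matcher(line.trim()).matches();
    }
}
```

This doesn't validate the JSON, of course — it just cheaply rejects scalars, empty lines, and binary garbage before the parser (and the cast) ever see them.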
>
> Thanks for the info, though, I clearly missed the logging that you pointed
> out.
>
> Joe
>
>
>
> On Mon, Apr 9, 2012 at 4:36 PM, Norbert Burger <[email protected]> wrote:
>
>> So in this case, it seems like JsonStringToMap is properly catching the
>> parse exception; in fact, it's the catch clause of the UDF that's
>> generating the "Could not json-decode string" message in your task tracker
>> logs.
>>
>> Take a look at line 63 here:
>>
>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/piggybank/JsonStringToMap.java
>>
>> When a parse exception happens, the UDF returns a null.  Are you filtering
>> out nulls before trying to project?
>>
>> Norbert
>>
>> On Mon, Apr 9, 2012 at 3:41 PM, Joe Crobak <[email protected]> wrote:
>>
>> > So it turns out our uncompressed data contains corrupted rows. Is there an
>> > easy way to wrap the JsonStringToMap UDF to catch exceptions on
>> > unparsable lines and just skip them?
>> >
>> > On Thu, Apr 5, 2012 at 11:44 AM, Joe Crobak <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm using Pig 0.9.2 on CDH3u3 with a snapshot build of Elephant Bird in
>> > > order to get JSON parsing. I have an incredibly unusual error that I see
>> > > with certain gzip-compressed files. It's probably easiest to show you a
>> > > Pig session:
>> > >
>> > > grunt> register '/home/joe/elephant-bird-2.1.12-SNAPSHOT.jar';
>> > > grunt> register '/home/joe/json-simple-1.1.jar';
>> > > grunt> apiHits = LOAD '/user/joe/path/to/part-r-00000.gz' USING
>> > > TextLoader() as (line: chararray);
>> > > grunt> X = FOREACH apiHits GENERATE line,
>> > > com.twitter.elephantbird.pig.piggybank.JsonStringToMap(line) as json;
>> > > grunt> Y = LIMIT X 2;
>> > > grunt> dump Y;
>> > > (succeeds, and I get what I expect).
>> > >
>> > > Now, if I try to do a projection using the json field, I get the following:
>> > >
>> > > grunt> A = FILTER X BY
>> > > >>   json#'logtype' == 'foo'
>> > > >>   OR json#'consumer' == 'foo1'
>> > > >>   OR json#'consumer' == 'foo2'
>> > > >>   OR json#'consumer' == 'foo3'
>> > > >>   OR json#'consumer' == 'foo4'
>> > > >>   ;
>> > > grunt> B = LIMIT A 2;
>> > > grunt> dump B;
>> > >
>> > > ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR:
>> > > java.lang.Long cannot be cast to org.json.simple.JSONObject
>> > >
>> > > And in the task tracker logs, the stack trace suggests that the JSON UDF
>> > > is seeing compressed data [1]. Does anyone have any ideas how to debug
>> > > this, or guesses as to what the problem is? Can I somehow determine
>> > > whether Hadoop is actually decompressing the data or not?
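One quick diagnostic for that last question (my suggestion, not from the thread): gzip streams start with the magic bytes 0x1f 0x8b, so if the records reaching the UDF begin with those bytes, Hadoop is passing the data through undecompressed.

```java
// Hypothetical diagnostic: gzip members begin with the magic bytes 0x1f 0x8b.
// If a "line" reaching the UDF starts with them, the input was not decompressed.
public class GzipCheck {
    public static boolean looksGzipped(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xff) == 0x1f
            && (data[1] & 0xff) == 0x8b;
    }
}
```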
>> > >
>> > > Thanks!
>> > > Joe
>> > >
>> > > [1]
>> > >
>> > > 2012-04-05 14:39:20,211 WARN
>> > > com.twitter.elephantbird.pig.piggybank.JsonStringToMap: Could not
>> > > json-decode string:  � ���
>> > > Unexpected character ( ) at position 0.
>> > >       at org.json.simple.parser.Yylex.yylex(Unknown Source)
>> > >       at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
>> > >       at org.json.simple.parser.JSONParser.parse(Unknown Source)
>> > >       at org.json.simple.parser.JSONParser.parse(Unknown Source)
>> > >       at org.json.simple.parser.JSONParser.parse(Unknown Source)
>> > >       at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.parseStringToMap(JsonStringToMap.java:63)
>> > >       at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.exec(JsonStringToMap.java:53)
>> > >       at com.twitter.elephantbird.pig.piggybank.JsonStringToMap.exec(JsonStringToMap.java:25)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:299)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:332)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>> > >       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>> > >       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
>> > >       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>> > >       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> > >       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > >       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>> > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>> > >       at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>> > >       at java.security.AccessController.doPrivileged(Native Method)
>> > >       at javax.security.auth.Subject.doAs(Subject.java:396)
>> > >       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>> > >       at org.apache.hadoop.mapred.Child.main(Child.java:264)
>> > >
>> > >
>> > >
>> >
>>