Do you have a sample of the input data? It looks like the JSON is malformed, or at least not laid out the way the loader expects.

The json parser is croaking:
Could not json-decode string: "type": "FeatureCollection",
Unexpected token COLON(:) at position 7.
at org.json.simple.parser.JSONParser.parse(Unknown Source)

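Is your JSON multi-line (pretty-printed) instead of a single record per line? Each failed string in the log is one line of the file, which suggests the loader is treating every line as a separate record.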
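
If the file does turn out to be pretty-printed, one option is to rewrite it as one record per line before loading it. Below is a minimal sketch in Python, assuming the input is a GeoJSON FeatureCollection small enough to fit in memory. The function name and the file names are hypothetical and are not part of elephant-bird:

```python
import json


def geojson_to_lines(src_path, dst_path):
    """Write each feature of a GeoJSON FeatureCollection as one JSON record per line.

    Returns the number of records written.
    """
    with open(src_path) as src:
        # Parses the whole (possibly pretty-printed) file in one go.
        collection = json.load(src)

    count = 0
    with open(dst_path, "w") as dst:
        for feature in collection.get("features", []):
            # json.dumps without indent emits no raw newlines,
            # so each feature ends up on exactly one line.
            dst.write(json.dumps(feature) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python flatten_geojson.py in.geojson out.json
    if len(sys.argv) == 3:
        written = geojson_to_lines(sys.argv[1], sys.argv[2])
        print("wrote %d records" % written)
```

With a file like that, each record would be a single feature rather than the whole collection, so the script would presumably look up json#'properties' on each record instead of going through json#'features'.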

D

On Wed, Apr 18, 2012 at 7:34 PM, Fabio Souto Moure
<[email protected]> wrote:
> Hi,
>
> I'm using Pig 0.9.2 with the JsonLoader included in elephant-bird 2.2.2 to
> process GeoJSON data (Flickr shapefiles:
> http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/
> ).
> But I'm unable to parse nested JSON with this code:
>
> ##################
> REGISTER /Users/fabio/bigdata/pig/elephant-bird/build/elephant-bird-2.2.0.jar;
>
> raw_data = LOAD 'file:/Users/fabio/bigdata/flickr_shapes/flickr_shapes_continents.geojson'
>     USING com.twitter.elephantbird.pig.load.JsonLoader() as (json: map[]);
> features = foreach raw_data generate json#'features'#'properties' as h;
> b = foreach features generate flatten(h) as h;
> c = foreach b generate h#'place_id' as h;
> dump c;
> #################
>
> I'm getting the following error:
>
>
> 2012-04-19 04:26:46,984 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: {
> Unexpected token END OF FILE at position 1.
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,986 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "type": "FeatureCollection",
> Unexpected token COLON(:) at position 7.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,987 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "name": "Flickr Shapes Public Dataset 2.0 - Continents",
> Unexpected token COLON(:) at position 7.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,988 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "description": "To the extent possible under law, Flickr has waived
> all copyright and related or neighboring rights to the Flickr Shapes Public
> Dataset, Version 2.0. This work is published from the United States. While
> you are under no obligation to do so, wherever possible it would be
> extra-super-duper-awesome if you would attribute Flickr.com when using the
> dataset. Thanks!",
> Unexpected token COLON(:) at position 14.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,989 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "license": "http://creativecommons.org/publicdomain/zero/1.0/",
> Unexpected token COLON(:) at position 10.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,997 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "features": [
> Unexpected token COLON(:) at position 11.
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,034 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: {
> Unexpected token END OF FILE at position 3.
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,035 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "type": "Feature",
> Unexpected token COLON(:) at position 9.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,035 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "id": 24865670,
> Unexpected token COLON(:) at position 7.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,040 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "properties": {
> Unexpected token COLON(:) at position 15.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,041 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "woe_id": 24865670,
> Unexpected token COLON(:) at position 12.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,042 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "place_id": "lSYmioybBZTDvHjQsQ",
> Unexpected token COLON(:) at position 14.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,043 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "place_type": "continent",
> Unexpected token COLON(:) at position 16.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,043 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "place_type_id": 29,
> Unexpected token COLON(:) at position 19.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,044 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "label": "Africa",
> Unexpected token COLON(:) at position 11.
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,045 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: },
> Unexpected token RIGHT BRACE(}) at position 3.
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>  at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at
> com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>  at
> com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>  at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:47,046 [Thread-6] WARN
>  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string: "geometry":
> Unexpected token COLON(:) at position 13.
> ....
>
>
>
> Can anybody help me?
> Thanks
> Fabio
