Do you have a sample of the input data? It looks like the JSON is malformed. The JSON parser is croaking:

Could not json-decode string: "type": "FeatureCollection",
Unexpected token COLON(:) at position 7.
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
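A minimal Python sketch (not elephant-bird's or json-simple's code) of what happens when a pretty-printed GeoJSON document is decoded one physical line at a time, which is roughly what a line-oriented loader does. Every fragment fails on its own, much like the warnings quoted further down, while the same text decodes fine as a single document. The excerpt is a hypothetical, abbreviated stand-in for the Flickr file, and Python's `json` module words its errors differently from json-simple.

```python
import json

# Hypothetical, abbreviated stand-in for a pretty-printed GeoJSON file,
# modelled on the fragments quoted in the warnings.
PRETTY = """{
  "type": "FeatureCollection",
  "name": "Flickr Shapes Public Dataset 2.0 - Continents",
  "features": []
}"""


def decode_line_by_line(text):
    """Decode each physical line as if it were a complete JSON record,
    roughly what a one-record-per-line reader does."""
    results = []
    for line in text.splitlines():
        try:
            json.loads(line)
            results.append((line, "ok"))
        except json.JSONDecodeError as err:
            results.append((line, "error: " + err.msg))
    return results


if __name__ == "__main__":
    for line, outcome in decode_line_by_line(PRETTY):
        print(f"{line!r:60} -> {outcome}")
    # Parsed as one document, the same text is valid JSON.
    print(json.loads(PRETTY)["type"])
```

A compact single-line record such as `{"type": "Feature", "id": 24865670}` decodes without error under the same per-line treatment.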
Is your JSON multi-line (pretty-printed) instead of a single record per line?

D

On Wed, Apr 18, 2012 at 7:34 PM, Fabio Souto Moure <[email protected]> wrote:
> Hi,
>
> I'm using pig 0.9.2 with the JsonLoader included in elephant-bird 2.2.2 to
> process geojson data (Flickr shapefiles:
> http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/).
> But I'm unable to parse nested json with this code:
>
> ##################
> REGISTER /Users/fabio/bigdata/pig/elephant-bird/build/elephant-bird-2.2.0.jar;
>
> raw_data = LOAD 'file:/Users/fabio/bigdata/flickr_shapes/flickr_shapes_continents.geojson'
>     USING com.twitter.elephantbird.pig.load.JsonLoader() as (json: map[]);
> features = foreach raw_data generate json#'features'#'properties' as h;
> b = foreach features generate flatten(h) as h;
> c = foreach b generate h#'place_id' as h;
> dump c;
> #################
>
> I'm getting the following error:
>
> 2012-04-19 04:26:46,984 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: {
> Unexpected token END OF FILE at position 1.
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>     at com.twitter.elephantbird.pig.load.JsonLoader.parseStringToTuple(JsonLoader.java:160)
>     at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:131)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 2012-04-19 04:26:46,986 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "type": "FeatureCollection",
> Unexpected token COLON(:) at position 7.
> 2012-04-19 04:26:46,987 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "name": "Flickr Shapes Public Dataset 2.0 - Continents",
> Unexpected token COLON(:) at position 7.
> 2012-04-19 04:26:46,988 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "description": "To the extent possible under law, Flickr has waived all copyright and related or neighboring rights to the Flickr Shapes Public Dataset, Version 2.0. This work is published from the United States. While you are under no obligation to do so, wherever possible it would be extra-super-duper-awesome if you would attribute Flickr.com when using the dataset. Thanks!",
> Unexpected token COLON(:) at position 14.
> 2012-04-19 04:26:46,989 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "license": "http://creativecommons.org/publicdomain/zero/1.0/",
> Unexpected token COLON(:) at position 10.
> 2012-04-19 04:26:46,997 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "features": [
> Unexpected token COLON(:) at position 11.
> 2012-04-19 04:26:47,034 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: {
> Unexpected token END OF FILE at position 3.
> 2012-04-19 04:26:47,035 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "type": "Feature",
> Unexpected token COLON(:) at position 9.
> 2012-04-19 04:26:47,035 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "id": 24865670,
> Unexpected token COLON(:) at position 7.
> 2012-04-19 04:26:47,040 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "properties": {
> Unexpected token COLON(:) at position 15.
> 2012-04-19 04:26:47,041 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "woe_id": 24865670,
> Unexpected token COLON(:) at position 12.
> 2012-04-19 04:26:47,042 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "place_id": "lSYmioybBZTDvHjQsQ",
> Unexpected token COLON(:) at position 14.
> 2012-04-19 04:26:47,043 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "place_type": "continent",
> Unexpected token COLON(:) at position 16.
> 2012-04-19 04:26:47,043 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "place_type_id": 29,
> Unexpected token COLON(:) at position 19.
> 2012-04-19 04:26:47,044 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "label": "Africa",
> Unexpected token COLON(:) at position 11.
> 2012-04-19 04:26:47,045 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: },
> Unexpected token RIGHT BRACE(}) at position 3.
> 2012-04-19 04:26:47,046 [Thread-6] WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string: "geometry":
> Unexpected token COLON(:) at position 13.
> ....
>
> Can anybody help me?
> Thanks
> Fabio
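If the file is indeed pretty-printed, one workaround is to rewrite it as one Feature per line before loading it. Below is a minimal preprocessing sketch in Python, run outside Pig and not part of elephant-bird, assuming the whole FeatureCollection fits in memory; the function names, script name and output file name are hypothetical.

```python
import json
import sys


def feature_collection_to_lines(collection_text):
    """Split a (possibly pretty-printed) GeoJSON FeatureCollection into
    compact JSON strings, one per Feature."""
    collection = json.loads(collection_text)
    if not isinstance(collection, dict) or collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # json.dumps without indent never emits a raw newline (newlines inside
    # strings are escaped), so each Feature fits on a single line.
    return [json.dumps(feature, separators=(",", ":"))
            for feature in collection["features"]]


def convert(src_path, dst_path):
    """Read the FeatureCollection at src_path and write one Feature per line."""
    with open(src_path, encoding="utf-8") as src:
        lines = feature_collection_to_lines(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            dst.write(line + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python geojson_to_lines.py input.geojson output.json
    convert(sys.argv[1], sys.argv[2])
```

Each output line is then a single Feature object with the `type`, `id`, `properties` and `geometry` keys seen in the warnings above, so the LOAD would point at the converted file and the `json#'features'` lookup would no longer be needed. How this JsonLoader version exposes the nested `properties` object to Pig is a separate question worth checking.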
