I've also had poor experiences getting the default JSON loaders to work. I would highly recommend writing your own JsonLoader UDF that extends LoadFunc rather than, say, importing Twitter's Elephant Bird. A couple of ideas here:
- Use TextLoader <https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/TextLoader.java> as an example of how the LoadFunc <https://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup> abstract class is implemented well
- With unit tests, you can verify that the JSON parsing is performed exactly as desired
- Also look into PigStorage <https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/PigStorage.java> to get an idea of how to extend FileInputLoadFunc and implement StoreFuncInterface well

The learning exercise on its own is the valuable part here imho. It'll allow for more agile development on future projects, at the one-time cost of a few days of research and development.

Hope this helps.
-Dan

On Fri, Aug 30, 2013 at 10:03 AM, Zhu Wayne <[email protected]> wrote:

> try twitter's jsonloader.
>
> On Fri, Aug 30, 2013 at 2:20 AM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > Hi,
> >
> > There are different json loaders available, but none of them worked for me
> > when I had to deal with json. I ended up loading the file as text file,
> > reading one line at a time and then I parsed json inside my UDF with a json
> > java library
> >
> > Best Regards,
> > Ruslan
> >
> > On Fri, Aug 30, 2013 at 2:53 AM, jamal sasha <[email protected]> wrote:
> >
> > > Umm.. I am trying .. but somehow i am not able to get my head around this:
> > > a = load 'sample_json.json' using
> > > JsonLoader('id:chararray,categories:[chararray], hostt:{ (variable_a:
> > > {(first:int,last:int)})}, ns:[chararray],rep:chararray ');
> > >
> > > But i get this error:
> > > org.codehaus.jackson.JsonParseException: Unexpected character ('D' (code 68)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
> > > at [Source: java.io.ByteArrayInputStream@4795b8e9; line: 1, column: 50]
> > > at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
> > > at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
> > > at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
> > > at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:1582)
> > > at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:386)
> > > at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:173)
> > > at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> > > at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> > > at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > >
> > > On Thu, Aug 29, 2013 at 3:22 PM, Shahab Yunus <[email protected]> wrote:
> > >
> > > > Have you seen these?
> > > >
> > > > http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/JsonStorage.html
> > > >
> > > > http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Thu, Aug 29, 2013 at 6:19 PM, jamal sasha <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have json file in following format:
> > > > > { "_id" : "foo.com", "categories" : [], "h1" : { "bar==" : { "first" : 1281916800, "last" : 1316995200 }, "foo==" : { "first" : 1281916800, "last" : 1316995200 } }, "name2" : [ "foobarl.com", "foobar2.com" ], "rep" : null }
> > > > > So, how do i parse this json in pig..
> > > > >
> > > > > also, the categories and rep can have some char in it..and might not be
> > > > > always empty.
> > > > >
> > > > > Thanks
>
> --
> Wayne Zhu
> 847-282-0596 (Google Voice)
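
P.S. To make the "load as text, parse one line at a time" approach Ruslan describes a bit more concrete, and to show why unit tests help, here is a rough sketch rather than a definitive implementation. JsonLineParser is a hypothetical name of my own, and the hand-rolled parser is only a stand-in so the example runs on a plain JDK; in a real UDF I would assume you call a proper JSON library (Jackson, for example) from getNext() in your LoadFunc subclass instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Pig); a minimal, lenient stand-in for a real JSON library.
final class JsonLineParser {
    private final String s;
    private int pos = 0;
    private JsonLineParser(String s) { this.s = s; }

    // Returns a Map, List, String, Long, Double, Boolean or null; throws IllegalArgumentException on bad input.
    static Object parse(String line) {
        JsonLineParser p = new JsonLineParser(line);
        Object value = p.value();
        p.skipWs();
        if (p.pos != line.length()) throw p.error("trailing characters");
        return value;
    }

    private Object value() {
        skipWs();
        char c = peek();
        if (c == '{') return object();
        if (c == '[') return array();
        if (c == '"') return string();
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (c == '-' || Character.isDigit(c)) return number();
        throw error(pos < s.length() ? "unexpected character '" + c + "'" : "unexpected end of input");
    }

    private Map<String, Object> object() {
        Map<String, Object> map = new LinkedHashMap<>();
        pos++; // consume '{'
        skipWs();
        if (peek() == '}') { pos++; return map; }
        while (true) {
            skipWs();
            if (peek() != '"') throw error("expected a string key");
            String key = string();
            skipWs();
            if (next() != ':') throw error("expected ':'");
            map.put(key, value());
            skipWs();
            char c = next();
            if (c == '}') return map;
            if (c != ',') throw error("expected ',' or '}'");
        }
    }

    private List<Object> array() {
        List<Object> list = new ArrayList<>();
        pos++; // consume '['
        skipWs();
        if (peek() == ']') { pos++; return list; }
        while (true) {
            list.add(value());
            skipWs();
            char c = next();
            if (c == ']') return list;
            if (c != ',') throw error("expected ',' or ']'");
        }
    }

    private String string() {
        StringBuilder sb = new StringBuilder();
        pos++; // consume opening quote
        while (true) {
            char c = next();
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = next();
            int i = "ntrbf".indexOf(e);
            if (i >= 0) sb.append("\n\t\r\b\f".charAt(i));
            else if (e == 'u') {
                sb.append((char) Integer.parseInt(s.substring(pos, Math.min(pos + 4, s.length())), 16));
                pos += 4;
            } else sb.append(e); // lenient: covers \" \\ \/ and unknown escapes
        }
    }

    private Number number() {
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
        String text = s.substring(start, pos);
        return text.matches("-?\\d+") ? (Number) Long.valueOf(text) : (Number) Double.valueOf(text);
    }
    private void skipWs() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }
    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }
    private char next() {
        if (pos >= s.length()) throw error("unexpected end of input");
        return s.charAt(pos++);
    }
    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at column " + pos);
    }
}
```

In a loader, getNext() would read the next line from the RecordReader handed to prepareToRead(), pass it to something like JsonLineParser.parse(line), and map the resulting fields onto a Tuple. Because the parsing lives in one plain method, it can be exercised from an ordinary unit test with sample lines such as the record quoted above, without starting a Pig job.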
