For what it's worth, I have one as well. This one uses Jackson to parse everything.
https://github.com/xstevens/akela/blob/master/src/java/com/mozilla/pig/eval/json/JsonMap.java

On 4/19/11 11:55 AM, Dmitriy Ryaboy wrote:
> YES :)
>
> On Tue, Apr 19, 2011 at 11:49 AM, John Hui <[email protected]> wrote:
>
>> I have a JSON library and pig script working. Should I just contribute it
>> instead of reinventing the wheel?
>>
>> John
>>
>> On Tue, Apr 19, 2011 at 2:44 PM, Daniel Eklund <[email protected]> wrote:
>>
>>> Bill, thanks...
>>>
>>> so that is a confirmation... people have rolled their own, and it's not in
>>> piggybank.
>>> I would absolutely be willing to work with you to get a contribution going,
>>> but (as a warning) I am extremely new to Pig.
>>>
>>> I was looking at this:
>>> http://wiki.apache.org/pig/UDFManual
>>> to get my mind wrapped around the framework. And I also discovered this
>>>
>>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/piggybank/JsonStringToMap.java
>>> ( I am assuming this was the UDF you mentioned that inspired you)...
>>>
>>> A quick question about the UDF's registered at the top of a pig script:
>>>
>>> does
>>> REGISTER myJar.jar
>>> distribute the jar across HDFS (like a Hadoop job jar) so that the
>>> distribution of the code to the cluster nodes is transparent?
>>> In other words, do we NOT have to distribute myJar.jar to each node on the
>>> cluster.
>>>
>>> thanks more,
>>> daniel
>>>
>>>
>>>
>>> On Tue, Apr 19, 2011 at 1:57 PM, Bill Graham <[email protected]> wrote:
>>>> We're doing the same thing using a JsonToMap UDF followed by a
>>>> MapToBag UDF. The former was similarly inspired by the elephant bird
>>>> JSONLoader. I'd be glad to collaborate on a contribution if you'd
>>>> like.
>>>>
>>>> Here's what our scripts look like:
>>>>
>>>> define mapToBag cnwk.hadoop.mapreduce.pig.udf.MapToBag();
>>>> define jsonToMap cnwk.hadoop.mapreduce.pig.udf.JsonToMap();
>>>> define concat org.apache.pig.builtin.StringConcat();
>>>>
>>>> raw = LOAD 'hbase://user_info'
>>>>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('events:*')
>>>>     AS (events_map:map[]);
>>>>
>>>> -- Convert our maps to bags so we can flatten them out
>>>> B = FOREACH raw GENERATE mapToBag(events_map) AS event_bag;
>>>>
>>>> C = FOREACH B GENERATE FLATTEN(event_bag) AS (event_k:chararray,
>>>>     event_v:chararray);
>>>>
>>>> -- Convert the JSON events into maps
>>>> D = FOREACH C GENERATE social_k, jsonToMap(event_v) AS event_map:map[];
>>>>
>>>> -- Example showing how to filter on a given field
>>>> E = FILTER D BY (event_map#'levt.astid' IS NOT NULL AND
>>>>     event_map#'levt.asid' IS NOT NULL);
>>>>
>>>> -- Example showing how to pull data out of a map
>>>> F = FOREACH E GENERATE event_map#'levt.asid' AS asid,
>>>>     event_map#'levt.astid' AS astid;
>>>>
>>>>
>>>> thanks,
>>>> Bill
>>>>
>>>> On Tue, Apr 19, 2011 at 10:08 AM, Daniel Eklund <[email protected]>
>>>> wrote:
>>>>> I noticed that there is a Pig JSON Loader (which might or might not be in
>>>>> piggbank).
>>>>> Could anyone confirm the existence or absence of a JSONToTuple UDF? (not a
>>>>> loader)
>>>>>
>>>>> I am inspired by the UDF mentioned on Slide 23 here:
>>>>> http://www.slideshare.net/danharvey/hbase-at-mendeley
>>>>>
>>>>> doc = FOREACH rawdocs GENERATE DocumentProtobufBytesToTuple(protodoc) as
>>>>> DOC;
>>>>>
>>>>> My desire is to store a raw JSON doc in a cell in HBase and run pig queries
>>>>> against the tuples generated by the UDF.
>>>>> I used the HBase Loader already to get the cell-data, and now I need a
>>>>> JSON-deserializer.
>>>>>
>>>>> I would be willing to roll my own, (and contribute), but I figure I'd see if
>>>>> there was anything out there first.
>>>>>
>>>>> thanks,
>>>>> daniel
>>>>>
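
For anyone who wants to try a JSON-to-map UDF like this from a script, here is a rough usage sketch. The class name is taken from the path of the JsonMap.java link at the top of this message. The jar name and the input path are placeholders, and the sketch assumes the UDF takes a single chararray and returns a map, so check the source before relying on it.

```pig
-- Placeholder jar name (an assumption): build the jar from the linked repository.
REGISTER akela.jar;

-- Class name inferred from the path of the linked JsonMap.java file.
DEFINE JsonMap com.mozilla.pig.eval.json.JsonMap();

-- Hypothetical input: one JSON document per line of a text file.
raw = LOAD 'events.json' USING TextLoader() AS (json:chararray);

-- Assumes the UDF accepts a chararray and returns a map.
parsed = FOREACH raw GENERATE JsonMap(json) AS event_map:map[];

-- Pull one field out of the map, in the same style as the quoted script.
asids = FOREACH parsed GENERATE event_map#'levt.asid' AS asid;
```

The field name 'levt.asid' is borrowed from the quoted script purely for illustration; substitute whatever keys your JSON documents contain.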
