Hey John,

If you take a look at mine, it explicitly checks for Lists and converts them to DataBags. I ran into the same list-within-a-list issue with our data. That said, I won't claim that it'll work for all data.
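
A minimal sketch of that List-to-bag conversion, assuming the JSON has already been parsed into plain `java.util` Lists and Maps. This is not the loader mentioned above. The `Bag` and `Tuple` classes and the `convert` method are hypothetical stand-ins for Pig's `DataBag` and `Tuple`, used so the example runs without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonToPigSketch {
    // Hypothetical stand-in for Pig's Tuple, reduced to a single field.
    static final class Tuple {
        final Object field;
        Tuple(Object field) { this.field = field; }
    }

    // Hypothetical stand-in for Pig's DataBag: a collection of tuples.
    static final class Bag {
        final List<Tuple> tuples = new ArrayList<>();
    }

    // Recursively convert a parsed JSON value. Every List becomes a Bag
    // with one single-field Tuple per element, so a list within a list
    // becomes a bag within a bag.
    static Object convert(Object value) {
        if (value instanceof List) {
            Bag bag = new Bag();
            for (Object element : (List<?>) value) {
                bag.tuples.add(new Tuple(convert(element)));
            }
            return bag;
        }
        if (value instanceof Map) {
            // JSON objects keep their keys; only the values are converted.
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(e.getKey()), convert(e.getValue()));
            }
            return out;
        }
        // Strings, numbers, booleans and null pass through unchanged.
        return value;
    }
}
```

In a real UDF or loader, the stand-ins would presumably be replaced by bags and tuples obtained from Pig's `BagFactory` and `TupleFactory`. Whether wrapping each element in a one-field tuple suits your data depends on your JSON layout.
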
Cheers,
-Xavier

On 4/19/11 12:02 PM, John Hui wrote:
> I'll post my solution in a few hours =)
>
> On Tue, Apr 19, 2011 at 3:02 PM, John Hui wrote:
>
>> I don't think one parser will work for every case. It really depends on
>> your data, since there might be a list within a list.
>>
>> But pick any one as a starting point and customize it for your own JSON
>> data format.
>>
>>
>> On Tue, Apr 19, 2011 at 3:00 PM, Alan Gates wrote:
>>
>>> On Apr 19, 2011, at 11:44 AM, Daniel Eklund wrote:
>>>
>>> <snip>
>>>> A quick question about the UDFs registered at the top of a pig script:
>>>>
>>>> does
>>>> REGISTER myJar.jar
>>>> distribute the jar across HDFS (like a Hadoop job jar) so that the
>>>> distribution of the code to the cluster nodes is transparent?
>>>> In other words, do we NOT have to distribute myJar.jar to each node on
>>>> the cluster?
>>>>
>>> Pig takes care of getting myJar.jar to the task nodes; you do not have to
>>> worry about it.
>>>
>>> Alan.
