Really cool. Let me take a look when I have some "downtime". If that's the case, Xavier's parser is much better than mine.
Who wants to take the lead in adding this to the piggybank? I am sure this would make a very useful "storage" utility.

John

On Tue, Apr 19, 2011 at 3:09 PM, Xavier Stevens <[email protected]> wrote:

> Hey John,
>
> If you take a look at mine it looks explicitly for Lists and converts
> them to DataBags. I ran into that issue with our data. That said I won't
> make any claims that it'll work for all data.
>
> Cheers,
>
> -Xavier
>
> On 4/19/11 12:02 PM, John Hui wrote:
> > I'll post my solution in a few hours =)
> >
> > On Tue, Apr 19, 2011 at 3:02 PM, John Hui <[email protected]> wrote:
> >
> >> I don't think one parser will work for all solution. It really depends on
> >> your data, since there might be a list within a list.
> >>
> >> But pick anyone as a starting point and customize it for your own json data
> >> format.
> >>
> >>
> >> On Tue, Apr 19, 2011 at 3:00 PM, Alan Gates <[email protected]> wrote:
> >>
> >>> On Apr 19, 2011, at 11:44 AM, Daniel Eklund wrote:
> >>>
> >>> <snip>
> >>>> A quick question about the UDF's registered at the top of a pig script:
> >>>>
> >>>> does
> >>>> REGISTER myJar.jar
> >>>> distribute the jar across HDFS (like a Hadoop job jar) so that the
> >>>> distribution of the code to the cluster nodes is transparent?
> >>>> In other words, do we NOT have to distribute myJar.jar to each node on
> >>>> the cluster.
> >>>>
> >>> Pig takes care of getting myJar.jar to the task nodes; you do not have to
> >>> worry about it.
> >>>
> >>> Alan.
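
For anyone skimming the thread, here is a rough illustration of the "list within a list" point. It is not Xavier's parser or anyone's actual code from this thread, only a minimal Python sketch under the assumption that a bag can be modelled as a list of single-field tuples. The function name to_bags and the sample record are made up for the example.

```python
import json


def to_bags(value):
    """Recursively convert JSON lists into bag-like lists of 1-field tuples.

    Hypothetical helper: plain Python lists and tuples stand in for
    Pig's bag and tuple types.
    """
    if isinstance(value, list):
        # Each element becomes a single-field tuple; a nested list
        # recurses, so it ends up as a bag inside a tuple inside a bag.
        return [(to_bags(item),) for item in value]
    if isinstance(value, dict):
        # Objects keep their keys; only their values are converted.
        return {key: to_bags(item) for key, item in value.items()}
    # Scalars (numbers, strings, booleans, null) pass through unchanged.
    return value


# Made-up sample record with a flat list and a list within a list.
record = json.loads('{"id": 7, "tags": ["a", "b"], "grid": [[1, 2], [3]]}')
converted = to_bags(record)
print(converted)
```

The recursion is what handles nested lists. A parser that only converts top-level lists would leave the inner ones as raw lists, which is one reason a single parser may not fit every data set.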
