FYI there's a ticket open already though it didn't see much action: https://issues.apache.org/jira/browse/PIG-1914
Perhaps the best thing would be to discuss implementation approaches, etc, there.

D

On Tue, Apr 19, 2011 at 12:11 PM, John Hui <[email protected]> wrote:

> Really, cool. Let me take a look when I have some "downtime". If that's
> the case, Xavier's parser is much better than mine.
>
> Who wants to take the lead in adding this to the piggybank, I am sure this
> makes for a very useful "storage" utility.
>
> John
>
> On Tue, Apr 19, 2011 at 3:09 PM, Xavier Stevens <[email protected]> wrote:
>
> > Hey John,
> >
> > If you take a look at mine it looks explicitly for Lists and converts
> > them to DataBags. I ran into that issue with our data. That said I won't
> > make any claims that it'll work for all data.
> >
> > Cheers,
> >
> > -Xavier
> >
> > On 4/19/11 12:02 PM, John Hui wrote:
> > > I'll post my solution in a few hours =)
> > >
> > > On Tue, Apr 19, 2011 at 3:02 PM, John Hui <[email protected]> wrote:
> > >
> > >> I don't think one parser will work for all solution. It really depends on
> > >> your data, since there might be a list within a list.
> > >>
> > >> But pick anyone as a starting point and customize it for your own json data
> > >> format.
> > >>
> > >>
> > >> On Tue, Apr 19, 2011 at 3:00 PM, Alan Gates <[email protected]> wrote:
> > >>
> > >>> On Apr 19, 2011, at 11:44 AM, Daniel Eklund wrote:
> > >>>
> > >>> <snip>
> > >>>> A quick question about the UDF's registered at the top of a pig script:
> > >>>>
> > >>>> does
> > >>>> REGISTER myJar.jar
> > >>>> distribute the jar across HDFS (like a Hadoop job jar) so that the
> > >>>> distribution of the code to the cluster nodes is transparent?
> > >>>> In other words, do we NOT have to distribute myJar.jar to each node on
> > >>>> the cluster.
> > >>>>
> > >>> Pig takes care of getting myJar.jar to the task nodes; you do not have to
> > >>> worry about it.
> > >>>
> > >>> Alan.
