Okay guys, some details after some digging. We've got this version of Pig from CDH2 installed:

hadoop-pig-0.5.0+11.1-1

The patches that they applied on top of 0.5.0 are listed here:

http://archive.cloudera.com/cdh/2/pig-0.5.0+11.1.CHANGES.txt

The patches listed there don't seem to deal with FLATTEN in any way.

Any suggestions?

On Fri, Apr 2, 2010 at 1:49 PM, hc busy <hc.b...@gmail.com> wrote:

> .... yeah, you have to implement outputSchema() method on the udf in order
> to make the content of the tuple visible... There's a nice example in the
> UDF Manual
>
> http://hadoop.apache.org/pig/docs/r0.6.0/udf.html
>
> search for 'package myudf' until u find it.
>
> On Fri, Apr 2, 2010 at 12:52 PM, Russell Jurney
> <russell.jur...@gmail.com> wrote:
>
>> Not sure if this is exactly the same, but when I've created tuples within
>> tuples in UDFs (to preserve order of pairs), from bag input, Pig has
>> allowed it - but I can't work with that data in subsequent steps.
>>
>> On Fri, Apr 2, 2010 at 12:37 PM, hc busy <hc.b...@gmail.com> wrote:
>>
>> > Yeah, I'm sure it has nested tuples. Pig doesn't natively support
>> > introduction of tuples
>> >
>> > h = foreach g generate ((x,y,z)), (x), ((((x))))
>> >
>> > doesn't work, but i have a udf that does that.... don't ask why...., and
>> > I've seen it print double pair of paren's when I took a dump.
>> >
>> > Our hadoop guys here says it's CDH2 and that the "upgrade" was just
>> > re-installation of CDH2... ("same jars") But certainly my script suddenly
>> > started doing weird things when it flattened that all the way through.
>> >
>> > I'd support the prior behavior as well, because that seems to match my
>> > reading of documentation on behavior of FLATTEN.
>> >
>> > Has anybody else had this problem with recent cloudera/pig versions?
>> >
>> > thnx!!
>> >
>> > On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.raha...@gmail.com>
>> > wrote:
>> >
>> > > Stupid question but are you sure your bag has the dual sets of
>> > > parentheses? (And if I may ask, why is that the case?)
>> > >
>> > > On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.raha...@gmail.com>
>> > > wrote:
>> > >
>> > > > If I'm not mistaken, the output is the expected behavior. Flatten
>> > > > should unnest bags. I'm assuming your statement is something like
>> > > > FOREACH ... GENERATE field1, field2, FLATTEN(bag1) which would
>> > > > 'duplicate' the first two fields of a tuple for every tuple in the
>> > > > nested bag.
>> > > >
>> > > > On Fri, Apr 2, 2010 at 2:02 PM, hc busy <hc.b...@gmail.com> wrote:
>> > > >
>> > > >> doh!!!! s/map/bag/g
>> > > >>
>> > > >> I seem to get maps and bags mixed up for some reason...
>> > > >>
>> > > >> Guys, I have a row containing a *bag*
>> > > >>
>> > > >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
>> > > >>
>> > > >> What is the expected behavior when I flatten on that bag? I had
>> > > >> expected it to result in
>> > > >>
>> > > >> 'id','data', (1,2)
>> > > >> 'id','data', (2,3)
>> > > >> 'id','data', (4,5)
>> > > >>
>> > > >> But it appears to me that the result of applying FLATTEN to that bag
>> > > >> is this instead:
>> > > >>
>> > > >> 'id','data', 1,2
>> > > >> 'id','data', 2,3
>> > > >> 'id','data', 4,5
>> > > >>
>> > > >> The latter is returned by the current cloudera's CDH2 and I've seen
>> > > >> the prior behavior on other versions of pig.
>> > > >>
>> > > >> Which is the correct behavior by design?
>> > > >>
>> > > >> What will pig 0.6 do when it is released?
>> > > >>
>> > > >> thanks!
>> > > >>
>> > > >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy <hc.b...@gmail.com> wrote:
>> > > >>
>> > > >> > Guys, I have a row containing a map
>> > > >> >
>> > > >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
>> > > >> >
>> > > >> > What is the expected behavior when I flatten on that bag? I had
>> > > >> > expected it to result in
>> > > >> >
>> > > >> > 'id','data', (1,2)
>> > > >> > 'id','data', (2,3)
>> > > >> > 'id','data', (4,5)
>> > > >> >
>> > > >> > But it appears to me that the result of applying FLATTEN to that
>> > > >> > bag is this instead:
>> > > >> >
>> > > >> > 'id','data', 1,2
>> > > >> > 'id','data', 2,3
>> > > >> > 'id','data', 4,5
>> > > >> >
>> > > >> > The latter is returned by the current cloudera's CDH2 and I've seen
>> > > >> > the prior behavior on other versions of pig.
>> > > >> >
>> > > >> > Which is the correct behavior by design?
>> > > >> >
>> > > >> > What will pig 0.6 do when it is released?
>> > > >> >
>> > > >> > thanks!
>> > > >
>> > > > --
>> > > > Zaki Rahaman
>> > >
>> > > --
>> > > Zaki Rahaman
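
For anyone trying to reproduce this, here is a rough Python model of the two FLATTEN results being compared in the thread. It is only a sketch and not Pig code: Python tuples stand in for Pig tuples, a list stands in for the bag, and the function names flatten_one_level and flatten_all_levels are made up for illustration. It says nothing about which behavior Pig implements internally; it just spells out the difference between removing one level of nesting and removing all of them.

```python
# Toy model of the two FLATTEN results discussed in this thread.
# Assumption: tuples model Pig tuples, a list models a Pig bag.
# The function names are hypothetical and are not part of Pig.

def flatten_one_level(row, bag_index):
    """Un-nest the bag, removing only the bag and the outermost tuple
    of each element (the result hc busy expected)."""
    prefix = row[:bag_index]
    suffix = row[bag_index + 1:]
    return [prefix + element + suffix for element in row[bag_index]]


def flatten_all_levels(row, bag_index):
    """Un-nest the bag and keep unwrapping nested tuples until only
    scalar fields remain (the result reported on the CDH2 install)."""
    def unwrap(value):
        if isinstance(value, tuple):
            fields = ()
            for field in value:
                fields += unwrap(field)
            return fields
        return (value,)

    prefix = row[:bag_index]
    suffix = row[bag_index + 1:]
    return [prefix + unwrap(element) + suffix for element in row[bag_index]]


# The row from the thread: 'id','data', {((1,2)), ((2,3)), ((4,5))}
row = ("id", "data", [((1, 2),), ((2, 3),), ((4, 5),)])

expected = flatten_one_level(row, 2)
observed = flatten_all_levels(row, 2)

for r in expected:
    print(r)   # rows of the form ('id', 'data', (1, 2))
for r in observed:
    print(r)   # rows of the form ('id', 'data', 1, 2)
```

If the one-level result is what the documentation describes (that is hc busy's reading above, and I have not verified it against the Pig source), then the CDH2 output corresponds to the second function.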