The hadoop version: hadoop-0.20-0.20.1+169.68-1
On Fri, Apr 2, 2010 at 2:33 PM, hc busy <hc.b...@gmail.com> wrote:

> Okay guys, some details after some digging. We've got this version of pig
> from CDH2 installed:
>
> hadoop-pig-0.5.0+11.1-1
>
> The patches that they applied on top of 0.5.0 are listed here:
>
> http://archive.cloudera.com/cdh/2/pig-0.5.0+11.1.CHANGES.txt
>
> The patches listed there don't seem to deal with FLATTEN in any way.
>
> Any suggestions?
>
> On Fri, Apr 2, 2010 at 1:49 PM, hc busy <hc.b...@gmail.com> wrote:
>
>> .... yeah, you have to implement the outputSchema() method on the udf in order
>> to make the content of the tuple visible... There's a nice example in the
>> UDF Manual
>>
>> http://hadoop.apache.org/pig/docs/r0.6.0/udf.html
>>
>> Search for 'package myudf' until you find it.
>>
>> On Fri, Apr 2, 2010 at 12:52 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>
>>> Not sure if this is exactly the same, but when I've created tuples within
>>> tuples in UDFs (to preserve order of pairs), from bag input, Pig has allowed
>>> it - but I can't work with that data in subsequent steps.
>>>
>>> On Fri, Apr 2, 2010 at 12:37 PM, hc busy <hc.b...@gmail.com> wrote:
>>>
>>> > Yeah, I'm sure it has nested tuples. Pig doesn't natively support
>>> > introduction of tuples
>>> >
>>> > h = foreach g generate ((x,y,z)), (x), ((((x))))
>>> >
>>> > doesn't work, but I have a udf that does that.... don't ask why...., and
>>> > I've seen it print a double pair of parens when I took a dump.
>>> >
>>> > Our hadoop guys here say it's CDH2 and that the "upgrade" was just a
>>> > re-installation of CDH2... ("same jars") But certainly my script suddenly
>>> > started doing weird things when it flattened that all the way through.
>>> >
>>> > I'd support the prior behavior as well, because that seems to match my
>>> > reading of the documentation on the behavior of FLATTEN.
>>> >
>>> > Has anybody else had this problem with recent cloudera/pig versions?
>>> >
>>> > thnx!!
>>> >
>>> > On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.raha...@gmail.com> wrote:
>>> >
>>> > > Stupid question, but are you sure your bag has the dual sets of parentheses?
>>> > > (And if I may ask, why is that the case?)
>>> > >
>>> > > On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.raha...@gmail.com> wrote:
>>> > >
>>> > > > If I'm not mistaken, the output is the expected behavior. Flatten should
>>> > > > unnest bags. I'm assuming your statement is something like FOREACH ...
>>> > > > GENERATE field1, field2, FLATTEN(bag1) which would 'duplicate' the first two
>>> > > > fields of a tuple for every tuple in the nested bag.
>>> > > >
>>> > > > On Fri, Apr 2, 2010 at 2:02 PM, hc busy <hc.b...@gmail.com> wrote:
>>> > > >
>>> > > >> doh!!!! s/map/bag/g
>>> > > >>
>>> > > >> I seem to get maps and bags mixed up for some reason...
>>> > > >>
>>> > > >> Guys, I have a row containing a *bag*
>>> > > >>
>>> > > >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
>>> > > >>
>>> > > >> What is the expected behavior when I flatten on that bag? I had expected it
>>> > > >> to result in
>>> > > >>
>>> > > >> 'id','data', (1,2)
>>> > > >> 'id','data', (2,3)
>>> > > >> 'id','data', (4,5)
>>> > > >>
>>> > > >> But it appears to me that the result of applying FLATTEN to that bag is
>>> > > >> this instead:
>>> > > >>
>>> > > >> 'id','data', 1,2
>>> > > >> 'id','data', 2,3
>>> > > >> 'id','data', 4,5
>>> > > >>
>>> > > >> The latter is returned by the current cloudera's CDH2 and I've seen the
>>> > > >> prior behavior on other versions of pig.
>>> > > >>
>>> > > >> Which is the correct behavior by design?
>>> > > >>
>>> > > >> What will pig 0.6 do when it is released?
>>> > > >>
>>> > > >> thanks!
>>> > > >>
>>> > > >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy <hc.b...@gmail.com> wrote:
>>> > > >>
>>> > > >> > Guys, I have a row containing a map
>>> > > >> >
>>> > > >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
>>> > > >> >
>>> > > >> > What is the expected behavior when I flatten on that bag? I had expected it
>>> > > >> > to result in
>>> > > >> >
>>> > > >> > 'id','data', (1,2)
>>> > > >> > 'id','data', (2,3)
>>> > > >> > 'id','data', (4,5)
>>> > > >> >
>>> > > >> > But it appears to me that the result of applying FLATTEN to that bag is
>>> > > >> > this instead:
>>> > > >> >
>>> > > >> > 'id','data', 1,2
>>> > > >> > 'id','data', 2,3
>>> > > >> > 'id','data', 4,5
>>> > > >> >
>>> > > >> > The latter is returned by the current cloudera's CDH2 and I've seen the
>>> > > >> > prior behavior on other versions of pig.
>>> > > >> >
>>> > > >> > Which is the correct behavior by design?
>>> > > >> >
>>> > > >> > What will pig 0.6 do when it is released?
>>> > > >> >
>>> > > >> > thanks!
>>> > > >
>>> > > > --
>>> > > > Zaki Rahaman
>>> > >
>>> > > --
>>> > > Zaki Rahaman
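
To make the two outcomes in the quoted question concrete, here is a small Python model of them. It is only a sketch of the semantics being discussed, not Pig code and not how Pig implements FLATTEN: the names `splice` and `flatten_bag` and the `levels` parameter are made up for illustration, and the bag is modeled as a plain Python list of tuples. With `levels=1` each bag element has only its outer tuple removed (the result the original poster expected); with `levels=2` the inner tuple is unwrapped as well (the result reported from the CDH2 install).

```python
def splice(element, levels):
    """Return the fields of one bag element after removing `levels` layers of tuple nesting.

    levels=1 removes only the element's own (outer) tuple.
    Each extra level also unwraps any field that is itself a tuple.
    """
    fields = list(element)
    for _ in range(levels - 1):
        unwrapped = []
        for field in fields:
            if isinstance(field, tuple):
                unwrapped.extend(field)
            else:
                unwrapped.append(field)
        fields = unwrapped
    return fields


def flatten_bag(prefix, bag, levels=1):
    """Emit one output row per bag element, repeating the prefix fields on every row."""
    return [tuple(prefix) + tuple(splice(element, levels)) for element in bag]


# The row from the question: 'id','data', {((1,2)), ((2,3)), ((4,5))}
# Each bag element is a one-field tuple whose single field is itself a tuple.
bag = [((1, 2),), ((2, 3),), ((4, 5),)]

one_level = flatten_bag(("id", "data"), bag, levels=1)
two_levels = flatten_bag(("id", "data"), bag, levels=2)

for row in one_level:
    print(row)
for row in two_levels:
    print(row)
```

Which of the two Pig is meant to produce by design is exactly the open question in this thread; the model just shows that the difference comes down to how many tuple layers get removed per bag element.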