Re: What should FLATTEN do?

hc busy Fri, 02 Apr 2010 14:34:23 -0700

The hadoop version:

hadoop-0.20-0.20.1+169.68-1


On Fri, Apr 2, 2010 at 2:33 PM, hc busy <hc.b...@gmail.com> wrote:

> Okay guys, some details after some digging. We've got this version of Pig
> from CDH2 installed:
>
> hadoop-pig-0.5.0+11.1-1
>
>
> The patches that they applied on top of 0.5.0 are listed here:
>
> http://archive.cloudera.com/cdh/2/pig-0.5.0+11.1.CHANGES.txt
>
> The patches listed there don't seem to deal with FLATTEN in any way.
>
> Any suggestions?
>
>
>
>
> On Fri, Apr 2, 2010 at 1:49 PM, hc busy <hc.b...@gmail.com> wrote:
>
>>
>> .... yeah, you have to implement the outputSchema() method on the UDF in order
>> to make the content of the tuple visible... There's a nice example in the
>> UDF Manual:
>>
>> http://hadoop.apache.org/pig/docs/r0.6.0/udf.html
>>
>> Search for 'package myudf' until you find it.
>>
>>
>>
>> On Fri, Apr 2, 2010 at 12:52 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>
>>> Not sure if this is exactly the same, but when I've created tuples within
>>> tuples in UDFs (to preserve order of pairs), from bag input, Pig has allowed
>>> it - but I can't work with that data in subsequent steps.
>>>
>>> On Fri, Apr 2, 2010 at 12:37 PM, hc busy <hc.b...@gmail.com> wrote:
>>>
>>> > Yeah, I'm sure it has nested tuples. Pig doesn't natively support
>>> > introducing tuples like this:
>>> >
>>> > h = foreach g generate ((x,y,z)), (x), ((((x))))
>>> >
>>> > That doesn't work, but I have a UDF that does that.... don't ask why...., and
>>> > I've seen it print a double pair of parens when I took a dump.
>>> >
>>> > Our hadoop guys here say it's CDH2 and that the "upgrade" was just a
>>> > re-installation of CDH2... ("same jars") But certainly my script suddenly
>>> > started doing weird things when it flattened that all the way through.
>>> >
>>> > I'd support the prior behavior as well, because that seems to match my
>>> > reading of the documentation on the behavior of FLATTEN.
>>> >
>>> >
>>> >
>>> > Has anybody else had this problem with recent cloudera/pig versions?
>>> >
>>> >
>>> > thnx!!
>>> >
>>> >
>>> > On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.raha...@gmail.com> wrote:
>>> >
>>> > > Stupid question but are you sure your bag has the dual sets of parentheses?
>>> > > (And if I may ask, why is that the case?)
>>> > >
>>> > > On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.raha...@gmail.com> wrote:
>>> > >
>>> > > > If I'm not mistaken, the output is the expected behavior. Flatten should
>>> > > > unnest bags. I'm assuming your statement is something like FOREACH ...
>>> > > > GENERATE field1, field2, FLATTEN(bag1) which would 'duplicate' the first two
>>> > > > fields of a tuple for every tuple in the nested bag.
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Fri, Apr 2, 2010 at 2:02 PM, hc busy <hc.b...@gmail.com> wrote:
>>> > > >
>>> > > >> doh!!!! s/map/bag/g
>>> > > >>
>>> > > >> I seem to get maps and bags mixed up for some reason...
>>> > > >>
>>> > > >> Guys, I have a row containing a *bag*
>>> > > >>
>>> > > >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
>>> > > >>
>>> > > >> What is the expected behavior when I flatten on that bag? I had expected it
>>> > > >> to result in
>>> > > >>
>>> > > >> 'id','data', (1,2)
>>> > > >> 'id','data', (2,3)
>>> > > >> 'id','data', (4,5)
>>> > > >>
>>> > > >>
>>> > > >> But it appears to me that the result of applying FLATTEN to that bag is this
>>> > > >> instead:
>>> > > >>
>>> > > >> 'id','data', 1,2
>>> > > >> 'id','data', 2,3
>>> > > >> 'id','data', 4,5
>>> > > >>
>>> > > >>
>>> > > >> The latter is returned by Cloudera's current CDH2, and I've seen the
>>> > > >> former behavior on other versions of Pig.
>>> > > >>
>>> > > >> Which is the correct behavior by design?
>>> > > >>
>>> > > >> What will pig 0.6 do when it is released?
>>> > > >>
>>> > > >> thanks!
>>> > > >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy <hc.b...@gmail.com> wrote:
>>> > > >>
>>> > > >> > Guys, I have a row containing a map
>>> > > >> >
>>> > > >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
>>> > > >> >
>>> > > >> > What is the expected behavior when I flatten on that bag? I had expected it
>>> > > >> > to result in
>>> > > >> >
>>> > > >> > 'id','data', (1,2)
>>> > > >> > 'id','data', (2,3)
>>> > > >> > 'id','data', (4,5)
>>> > > >> >
>>> > > >> >
>>> > > >> > But it appears to me that the result of applying FLATTEN to that bag is
>>> > > >> > this instead:
>>> > > >> >
>>> > > >> > 'id','data', 1,2
>>> > > >> > 'id','data', 2,3
>>> > > >> > 'id','data', 4,5
>>> > > >> >
>>> > > >> >
>>> > > >> > The latter is returned by Cloudera's current CDH2, and I've seen the
>>> > > >> > former behavior on other versions of Pig.
>>> > > >> >
>>> > > >> > Which is the correct behavior by design?
>>> > > >> >
>>> > > >> > What will pig 0.6 do when it is released?
>>> > > >> >
>>> > > >> > thanks!
>>> > > >> >
>>> > > >>
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Zaki Rahaman
>>> > > >
>>> > > >
>>> > >
>>> > >
>>> > > --
>>> > > Zaki Rahaman
>>> > >
>>> >
>>>
>>
>>
>
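
For reference, the one-level behavior that the original question expected can be modeled outside Pig. What follows is a minimal Python sketch, not Pig's implementation: it assumes a bag can be represented as a list of tuples, and that FLATTEN on a bag splices the fields of each bag element into the enclosing row, removing exactly one tuple layer. The name flatten_bag and the list-of-tuples representation are illustrative assumptions and are not part of Pig.

```python
def flatten_bag(row, bag_index):
    """Model one level of un-nesting for the bag stored at row[bag_index].

    Assumption: a bag is a list of tuples, and each bag element's fields are
    spliced into the output row in place of the bag. Nested tuples inside an
    element are left untouched.
    """
    prefix = row[:bag_index]
    suffix = row[bag_index + 1:]
    bag = row[bag_index]
    # One output row per bag element; tuple concatenation splices the fields.
    return [prefix + element + suffix for element in bag]


# The row from the thread: each bag element is a one-field tuple whose only
# field is itself a tuple, hence the double parentheses.
row = ('id', 'data', [((1, 2),), ((2, 3),), ((4, 5),)])

for out in flatten_bag(row, 2):
    print(out)
```

Under this model the inner pairs survive as tuples, so each output row has three fields. Getting the fully flattened rows reported on CDH2 would take a second application of the same un-nesting step to the remaining tuple.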
