Isn't FLATTEN similar to explode?

On Sun, Jan 29, 2012 at 5:46 PM, Stan Rosenberg <[email protected]> wrote:
> Hi Jonathan,
>
> What you recommended below is not quite right. The right solution
> would need to do something similar to 'explode'.
>
> Thanks,
>
> stan
>
> On Thu, Jan 26, 2012 at 3:04 PM, Jonathan Coveney <[email protected]>
> wrote:
> > I think this might give you what you want
> >
> > X = LOAD 'input.txt' using PigStorage(',') AS (id1:chararray,
> > id2:chararray, id3:chararray, id4:chararray, id5:chararray);
> > Y_0 = foreach X generate FLATTEN(TOBAG(*));
> > Y = filter Y_0 by $0 is not null;
> >
> > 2012/1/25 Prashant Kommireddi <[email protected]>
> >
> >> Sorry, I misunderstood your initial question. You would have to write a
> >> custom UDF to do this.
> >>
> >> Thanks,
> >> Prashant
> >>
> >> On Jan 25, 2012, at 7:32 PM, Stan Rosenberg
> >> <[email protected]> wrote:
> >>
> >> > To clarify, here is our input:
> >> >
> >> > X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
> >> > id3:chararray, id4:chararray, id5:chararray);
> >> >
> >> > We want to compute Y that consists of a single column denoting the set
> >> > of all (non-null) ids coming from X.
> >> >
> >> > stan
> >> >
> >> >
> >> > On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
> >> > <[email protected]> wrote:
> >> >> I don't see how flatten would help in this case.
> >> >>
> >> >> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> >> >> <[email protected]> wrote:
> >> >>> Hi Stan,
> >> >>>
> >> >>> Would using FLATTEN and then DISTINCT work?
> >> >>>
> >> >>> Thanks,
> >> >>> Prashant
> >> >>>
> >> >>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg
> >> >>> <[email protected]> wrote:
> >> >>>
> >> >>>> Hi Guys,
> >> >>>>
> >> >>>> I came across a use case that seems to require an 'explode' operation
> >> >>>> which to my knowledge is not currently available.
> >> >>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
> >> >>>> (x), (y), (z).
> >> >>>>
> >> >>>> E.g., consider a relation that contains an arbitrary number of
> >> >>>> different identifier columns, say,
> >> >>>> social security id, student id, etc. We want to compute the set of
> >> >>>> all distinct identifiers. Assume that the number of identifier
> >> >>>> columns is large and intermingled with other
> >> >>>> columns that should be projected out; this is to avoid a solution
> >> >>>> using 'SPLIT', e.g.
> >> >>>>
> >> >>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
> >> >>>> a relation, then the answer we want is
> >> >>>> Y={2,3,4,5}.
> >> >>>>
> >> >>>> Any suggestions?
> >> >>>>
> >> >>>> Thanks,
> >> >>>>
> >> >>>> stan
> >> >>>>
> >> >

--
"...:::Aniket:::... Quetzalco@tl"
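
For reference, here is a minimal sketch of the FLATTEN-based approach discussed in this thread. It assumes the five-column, comma-delimited 'input.txt' from the quoted messages. FLATTEN applied to a bag emits one output row per tuple in the bag, which is the 'explode' behaviour being asked about. Two things differ from the quoted snippet and are assumptions of this sketch: the id columns are listed explicitly in TOBAG instead of using TOBAG(*), so that any non-id columns would be left out, and a DISTINCT step is appended to get the set of ids. The alias Y_1 is illustrative, and the script has not been run against real data.

```pig
-- Load the five id columns (schema taken from the quoted messages).
X = LOAD 'input.txt' USING PigStorage(',')
    AS (id1:chararray, id2:chararray, id3:chararray,
        id4:chararray, id5:chararray);

-- TOBAG wraps each listed field in its own one-field tuple and puts
-- them in a bag; FLATTEN then emits one row per tuple in that bag.
Y_0 = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5));

-- Drop the rows that came from empty (null) id fields.
Y_1 = FILTER Y_0 BY $0 IS NOT NULL;

-- Keep each identifier once.
Y = DISTINCT Y_1;

DUMP Y;
```

Listing the columns by hand gets tedious when there are many id columns, which is the situation described in the original question, so this only illustrates the FLATTEN semantics and is not a general 'explode' over arbitrary columns.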
