Hi Jonathan,

What you recommended below is not quite right. The right solution would need to do something similar to 'explode'.

Thanks,

stan

On Thu, Jan 26, 2012 at 3:04 PM, Jonathan Coveney <[email protected]> wrote:
> I think this might give you what you want
>
> X = LOAD 'input.txt' using PigStorage(',') AS (id1:chararray,
> id2:chararray, id3:chararray, id4:chararray, id5:chararray);
> Y_0 = foreach X generate FLATTEN(TOBAG(*));
> Y = filter Y_0 by $0 is not null;
>
> 2012/1/25 Prashant Kommireddi <[email protected]>
>
>> Sorry I misunderstood your initial question. You would have to write a
>> custom UDF to do this.
>>
>> Thanks,
>> Prashant
>>
>> On Jan 25, 2012, at 7:32 PM, Stan Rosenberg
>> <[email protected]> wrote:
>>
>> > To clarify, here is our input:
>> >
>> > X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
>> > id3:chararray, id4:chararray, id5:chararray);
>> >
>> > We want to compute Y that consists of a single column denoting the set
>> > of all (non-null) ids coming from X.
>> >
>> > stan
>> >
>> >
>> > On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
>> > <[email protected]> wrote:
>> >> I don't see how flatten would help in this case.
>> >>
>> >> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
>> >> <[email protected]> wrote:
>> >>> Hi Stan,
>> >>>
>> >>> Would using FLATTEN and then DISTINCT work?
>> >>>
>> >>> Thanks,
>> >>> Prashant
>> >>>
>> >>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg
>> >>> <[email protected]> wrote:
>> >>>
>> >>>> Hi Guys,
>> >>>>
>> >>>> I came across a use case that seems to require an 'explode' operation
>> >>>> which to my knowledge is not currently available.
>> >>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>> >>>> (x), (y), (z).
>> >>>>
>> >>>> E.g., consider a relation that contains an arbitrary number of
>> >>>> different identifier columns, say,
>> >>>> social security id, student id, etc. We want to compute the set of
>> >>>> all distinct identifiers. Assume that the number of identifier
>> >>>> columns is large and intermingled with other
>> >>>> columns that should be projected out; this is to avoid a solution
>> >>>> using 'SPLIT', e.g.
>> >>>>
>> >>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>> >>>> a relation, then the answer we want is
>> >>>> Y={2,3,4,5}.
>> >>>>
>> >>>> Any suggestions?
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> stan
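
For reference, the use case in the original question can be approximated with built-ins when the identifier columns are known by name. This is only a sketch under assumptions: the schema (id1, descr, id2, id3) and the relation names ids and non_null are made up for illustration, and it assumes a Pig version that ships the TOBAG built-in. Unlike TOBAG(*), it passes only the id columns to TOBAG, so the other columns are projected out, and it adds DISTINCT to get a set.

```pig
-- Hypothetical schema: three id columns mixed with one non-id column.
X = LOAD 'input.txt' USING PigStorage(',')
    AS (id1:chararray, descr:chararray, id2:chararray, id3:chararray);

-- Put only the id columns into a bag, then flatten it into one row per id.
ids = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3)) AS id;

-- Drop missing ids, then deduplicate to get the set of identifiers.
non_null = FILTER ids BY id IS NOT NULL;
Y = DISTINCT non_null;
```

The id columns still have to be listed by hand here, so with a large or changing set of identifier columns a custom UDF, as suggested earlier in the thread, may remain the more practical route.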
