I think this might give you what you want:
X = LOAD 'input.txt' using PigStorage(',') AS (id1:chararray,
id2:chararray, id3:chararray, id4:chararray, id5:chararray);
Y_0 = foreach X generate FLATTEN(TOBAG(*));
-- drop the nulls, then de-duplicate to get the set of ids
Y_1 = filter Y_0 by $0 is not null;
Y = distinct Y_1;
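
If the id columns are intermingled with other columns, as in your original question, the same idea should still work by passing only the id columns to TOBAG instead of *. A rough, untested sketch; the 'name' and 'age' columns and the relation names are made up for illustration:

```pig
-- Hypothetical schema: two id columns mixed in with unrelated columns.
X = LOAD 'input.txt' USING PigStorage(',')
    AS (name:chararray, id1:chararray, age:int, id2:chararray);

-- Build a bag from the id columns only, then flatten it to one id per row.
ids = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2)) AS id;

-- Drop missing ids and de-duplicate to get the set of all ids.
non_null = FILTER ids BY id IS NOT NULL;
Y = DISTINCT non_null;
```

The id columns still have to be listed by name, so this is only practical if you can enumerate them.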
2012/1/25 Prashant Kommireddi <[email protected]>
> Sorry I misunderstood your initial question. You would have to write a
> custom UDF to do this.
>
> Thanks,
> Prashant
>
> On Jan 25, 2012, at 7:32 PM, Stan Rosenberg
> <[email protected]> wrote:
>
> > To clarify, here is our input:
> >
> > X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
> > id3:chararray, id4:chararray, id5:chararray);
> >
> > We want to compute Y that consists of a single column denoting the set
> > of all (non-null) ids coming from X.
> >
> > stan
> >
> >
> > On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
> > <[email protected]> wrote:
> >> I don't see how flatten would help in this case.
> >>
> >> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> >> <[email protected]> wrote:
> >>> Hi Stan,
> >>>
> >>> Would using FLATTEN and then DISTINCT work?
> >>>
> >>> Thanks,
> >>> Prashant
> >>>
> >>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
> >>> [email protected]> wrote:
> >>>
> >>>> Hi Guys,
> >>>>
> >>>> I came across a use case that seems to require an 'explode' operation
> >>>> which to my knowledge is not currently available.
> >>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
> >>>> (x), (y), (z).
> >>>>
> >>>> E.g., consider a relation that contains an arbitrary number of
> >>>> different identifier columns, say,
> >>>> social security id, student id, etc. We want to compute the set of
> >>>> all distinct identifiers. Assume that the number of identifier
> >>>> columns is large and intermingled with other
> >>>> columns that should be projected out; this is to avoid a solution
> >>>> using, e.g., 'SPLIT'.
> >>>>
> >>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
> >>>> a relation, then the answer we want is
> >>>> Y={2,3,4,5}.
> >>>>
> >>>> Any suggestions?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> stan
> >>>>
>