Sorry, I misunderstood your initial question. You would have to write a custom UDF to do this.
Thanks,
Prashant

On Jan 25, 2012, at 7:32 PM, Stan Rosenberg <[email protected]> wrote:

> To clarify, here is our input:
>
> X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
> id3:chararray, id4:chararray, id5:chararray);
>
> We want to compute Y that consists of a single column denoting the set
> of all (non-null) ids coming from X.
>
> stan
>
>
> On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
> <[email protected]> wrote:
>> I don't see how flatten would help in this case.
>>
>> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
>> <[email protected]> wrote:
>>> Hi Stan,
>>>
>>> Would using FLATTEN and then DISTINCT work?
>>>
>>> Thanks,
>>> Prashant
>>>
>>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>>> [email protected]> wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> I came across a use case that seems to require an 'explode' operation
>>>> which to my knowledge is not currently available.
>>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>>> (x), (y), (z).
>>>>
>>>> E.g., consider a relation that contains an arbitrary number of
>>>> different identifier columns, say,
>>>> social security id, student id, etc. We want to compute the set of
>>>> all distinct identifiers. Assume that the number of identifier
>>>> columns is large and intermingled with other
>>>> columns that should be projected out; this is to rule out a solution
>>>> using 'SPLIT', e.g.
>>>>
>>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>>> a relation, then the answer we want is
>>>> Y={2,3,4,5}.
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks,
>>>>
>>>> stan
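Since Pig UDFs are written in Java, here is a minimal plain-Java sketch of the logic such a UDF would need: "explode" a tuple (x,y,z) into (x), (y), (z), skip nulls, and keep the distinct ids. It deliberately does not use Pig's UDF classes, so it is not a drop-in UDF. The class ExplodeIds, the methods explode and distinctIds, and the idColumns parameter are all hypothetical names for illustration. Rows are modelled as lists of strings, matching the chararray id columns in the LOAD statement above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the explode + distinct logic; not Pig's UDF API.
public class ExplodeIds {

    // "Explode" one row: given (x, y, z), emit the single-field rows (x), (y), (z).
    // Null fields are skipped, matching the "non-null ids" requirement.
    static List<List<String>> explode(List<String> row) {
        List<List<String>> out = new ArrayList<>();
        for (String field : row) {
            if (field != null) {
                out.add(List.of(field));
            }
        }
        return out;
    }

    // Project the (hypothetical) id column positions out of each row, explode them,
    // and collect the distinct values. Columns not listed in idColumns are ignored.
    static Set<String> distinctIds(List<List<String>> rows, int[] idColumns) {
        Set<String> ids = new TreeSet<>(); // sorted as strings, duplicates dropped
        for (List<String> row : rows) {
            List<String> projected = new ArrayList<>();
            for (int col : idColumns) {
                projected.add(row.get(col));
            }
            for (List<String> single : explode(projected)) {
                ids.add(single.get(0));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Column 0 stands in for a non-id column; columns 1-3 hold ids (null = missing).
        List<List<String>> x = List.of(
                Arrays.asList("a", "2", "4", "3"),
                Arrays.asList("b", "2", null, "5"));
        System.out.println(distinctIds(x, new int[] {1, 2, 3})); // prints [2, 3, 4, 5]
    }
}
```

With the example relation from the thread, main prints [2, 3, 4, 5]. The TreeSet compares ids as strings, so "10" would sort before "2". Use a HashSet if order does not matter.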
