Sorry, I misunderstood your initial question. You would have to write a
custom UDF to do this.
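
For what it's worth, here is a minimal sketch of such a UDF. It is written
in Python on the assumption that it would be registered through Pig's Jython
UDF support and that the id columns are passed as separate arguments. The
name explode_ids and the output schema string are hypothetical, and the
fallback decorator is there only so the file also runs outside Pig.

```python
# Hypothetical UDF sketch: turn one row's id columns into a bag of 1-tuples.
try:
    outputSchema  # assumed to be supplied by Pig when registered via Jython
except NameError:
    def outputSchema(schema):
        # Stand-in so the file runs outside Pig; leaves the function unchanged.
        def wrap(func):
            return func
        return wrap


@outputSchema("ids:{t:(id:chararray)}")
def explode_ids(*ids):
    # Drop nulls (None) and emit each remaining id as its own one-field tuple.
    return [(i,) for i in ids if i is not None]
```

In the script itself, the returned bag would still need a FLATTEN followed
by DISTINCT to arrive at the single-column relation Y.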

Thanks,
Prashant

On Jan 25, 2012, at 7:32 PM, Stan Rosenberg
<[email protected]> wrote:

> To clarify, here is our input:
>
> X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
> id3:chararray, id4:chararray, id5:chararray);
>
> We want to compute Y that consists of a single column denoting the set
> of all (non-null) ids coming from X.
>
> stan
>
>
> On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
> <[email protected]> wrote:
>> I don't see how flatten would help in this case.
>>
>> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
>> <[email protected]> wrote:
>>> Hi Stan,
>>>
>>> Would using FLATTEN and then DISTINCT work?
>>>
>>> Thanks,
>>> Prashant
>>>
>>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>>> [email protected]> wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> I came across a use case that seems to require an 'explode' operation
>>>> which to my knowledge is not currently available.
>>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>>> (x), (y), (z).
>>>>
>>>> E.g., consider a relation that contains an arbitrary number of
>>>> different identifier columns, say,
>>>> social security id, student id, etc.  We want to compute the set of
>>>> all distinct identifiers.  Assume that the number of identifier
>>>> columns is large and intermingled with other
>>>> columns that should be projected out; this rules out a solution
>>>> that uses, e.g., 'SPLIT'.
>>>>
>>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>>> a relation, then the answer we want is
>>>> Y={2,3,4,5}.
>>>>
>>>> Any suggestions?
>>>>
>>>> Thanks,
>>>>
>>>> stan
>>>>
