Re: Merging multiple columns into 2 columns

Dmitriy Ryaboy Mon, 25 Jul 2011 12:51:35 -0700

Sounds like you need a UDF that takes the hash (map) Pig returns when you
load a whole column family and returns a bag of column names, with each
column name repeated as many times as the value of that column indicates.
You would then FLATTEN the result of this UDF, which gives you one
(rowkey, group_id) row per occurrence.
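
A minimal sketch of the expansion step described above, in plain Java so it
runs without Pig on the classpath. The class and method names
(ColumnExpander, expand) are hypothetical, and it assumes each column value
is a decimal count (a number, or anything whose toString() is one). In a real
UDF this loop would sit inside exec(Tuple) of a class extending
org.apache.pig.EvalFunc and add one tuple per repetition to a bag instead of
filling a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Pig, just the core logic of the UDF.
final class ColumnExpander {

    /**
     * Turns one row's column map (column name -> count) into a list in which
     * each column name appears as many times as its count.
     */
    static List<String> expand(Map<String, Object> columns) {
        List<String> out = new ArrayList<>();
        if (columns == null) {
            return out; // row had no columns in this family
        }
        for (Map.Entry<String, Object> e : columns.entrySet()) {
            Object value = e.getValue();
            if (value == null) {
                continue;
            }
            long count;
            try {
                // Assumption: the count is stored as a decimal string.
                count = Long.parseLong(value.toString().trim());
            } catch (NumberFormatException ex) {
                continue; // skip values that are not counts
            }
            // Zero or negative counts contribute nothing.
            for (long i = 0; i < count; i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

For the row dd53e23487da03fd02396306d248cda0 in the example below (counts 2,
1, 0), this yields group 1 twice and group 2 once; flattening that next to
the rowkey gives the three output rows you listed for that hash.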


D

On Mon, Jul 25, 2011 at 11:01 AM, Juan Martin Pampliega <
[email protected]> wrote:

> I have data in an HBase table stored in the following format:
>
> rowkey  group_id:1 group_id:2       ...  group_id:n
> 2fcab50712467eab4004583eb8fb7f89 1 0 1
> 085125e8f7cdc99fd91dbd7280373c5b 0 1 0
> dd53e23487da03fd02396306d248cda0 2 1 0
>
> where the column family group_id contains one column for each set of data
> and the number is the number of times that the hash is present in the set
> of data.
>
> I need to reformat the data and obtain the output in the following format:
>
> hash group_id
> 2fcab50712467eab4004583eb8fb7f89             1
> dd53e23487da03fd02396306d248cda0             1
> dd53e23487da03fd02396306d248cda0             1
> 085125e8f7cdc99fd91dbd7280373c5b             2
> dd53e23487da03fd02396306d248cda0             2
> ...
> 2fcab50712467eab4004583eb8fb7f89             n
>
> Any ideas on how to achieve this? I'm really at a loss here.
>
