Sounds like you need a UDF that takes the map (hash) Pig returns when you load a whole column family and returns a bag of column names, with each column name repeated as many times as that column's value indicates. You would then FLATTEN the result of this UDF, which gives you one output row per (rowkey, column name) pair.
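A minimal sketch of the logic such a UDF would implement, written as plain Python rather than as an actual Pig UDF. The names `expand_columns` and `flatten_row` are hypothetical, and the sketch assumes the column family arrives as a mapping from column name to a count that can be parsed as an integer. Registering it with Pig and declaring an output schema are not shown.

```python
def expand_columns(columns):
    """Return a list (the "bag") with each column name repeated count times.

    `columns` is assumed to map column name -> count; counts may arrive as
    strings, so they are converted with int().
    """
    bag = []
    # Sorting only makes the output deterministic; it orders names as strings.
    for name in sorted(columns):
        count = int(columns[name])
        # Zero (or negative) counts contribute nothing to the bag.
        bag.extend([name] * max(count, 0))
    return bag


def flatten_row(rowkey, columns):
    """Mimic FLATTEN: pair the rowkey with every element of the bag."""
    return [(rowkey, name) for name in expand_columns(columns)]


if __name__ == "__main__":
    # First two columns of the third sample row from the question.
    row = flatten_row("dd53e23487da03fd02396306d248cda0", {"1": "2", "2": "1"})
    for rowkey, group_id in row:
        print(rowkey, group_id)
```

With the sample row above, the rowkey comes out twice for group 1 and once for group 2, which matches the output format requested in the question.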
D

On Mon, Jul 25, 2011 at 11:01 AM, Juan Martin Pampliega <[email protected]> wrote:

> I have data in an HBase table stored in the following format:
>
> rowkey                            group_id:1  group_id:2  ...  group_id:n
> 2fcab50712467eab4004583eb8fb7f89  1           0                1
> 085125e8f7cdc99fd91dbd7280373c5b  0           1                0
> dd53e23487da03fd02396306d248cda0  2           1                0
>
> where the column family group_id contains one column for each set of data
> and the number is the number of times that the hash is present in the set
> of data.
>
> I need to reformat the data and obtain the output in the following format:
>
> hash                              group_id
> 2fcab50712467eab4004583eb8fb7f89  1
> dd53e23487da03fd02396306d248cda0  1
> dd53e23487da03fd02396306d248cda0  1
> 085125e8f7cdc99fd91dbd7280373c5b  2
> dd53e23487da03fd02396306d248cda0  2
> ...
> 2fcab50712467eab4004583eb8fb7f89  n
>
> Any ideas on how to achieve this? I'm really at a loss here.
