Hi Dmitriy,

Issue filed: https://issues.apache.org/jira/browse/PIG-1782

I meant to say columns in my previous message. It should read as "make
alteration of a column in a bag, but not specifying other columns in the
same bag". Let's assume PIG-1782 is addressed and CpuMetrics from the
PIG-1782 example contains 250 columns. The next line that I write would
look like this:

ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;

It would be nice if the statement could be written like this:

ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'), $1) as rowId, MIRROR($2..$250);

Is there something like this in Pig's built-in functions?

regards,
Eric

On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Hi Eric,
> Yes, we can certainly add the convention that a string without a ":" refers
> to a complete column family.
> It should be fairly straightforward. Step 1 is to open a ticket on the
> Jira, step 2 is to do it :).
>
> I am not sure what you mean by "make alteration of a tuple in a bag, but not
> specifying other tuples in the same bag" -- can you provide an example that
> illustrates what you want to do?
>
> Thanks,
> -Dmitriy
>
> On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[email protected]> wrote:
>
>> Hi,
>>
>> Consider this use case:
>>
>> There is a program that stores CPU usage metrics in an HBase table. This
>> HBase table has a column family called cpu, and individual CPU core
>> usage is stored in columns like cpu:user.0, cpu:user.1, etc. The
>> suffix number represents the unique CPU core id in the system.
>>
>> While it is possible to write a query like:
>>
>> SystemMetrics = load 'hbase://SystemMetrics' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
>> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
>> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
>> cpuCombined1:float ...
>> LoadAverage:float);
>>
>> to get a long list of columns to load, and to specify the same list in
>> a group by command like:
>>
>> CleanseBuffer = foreach SystemMetrics generate
>> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
>> cpuCombined1, ..., LoadAverage;
>>
>> The syntax works fine, but it would be nice to load all columns of a
>> given column family without specifying individual columns.
>>
>> i.e. SystemMetrics = load 'hbase://SystemMetrics' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
>> system');
>>
>> Is this syntax possible to implement in Pig?
>>
>> Second question: is it possible to make an alteration of a tuple in a
>> bag, without specifying the other tuples in the same bag?
>>
>> For tables with many columns, it would be nice if there were a shorthand
>> syntax to make Pig scripts shorter to write.
>> Any tips on making foreach and group by shorter? Thanks
>>
>> regards,
>> Eric
>>
>
