Ah, I see. There is no such function available right now.
There is some discussion of such a feature here:
https://issues.apache.org/jira/browse/PIG-1693

As you can see, there isn't yet a consensus on how such syntax would work. Feel free to weigh in.
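In the meantime, one possible workaround is to generate the long statement outside Pig and paste it into the script. A minimal Python sketch, assuming positional references and the foreach statement from your example; the helper names projection and concat_statement are made up for illustration and are not part of Pig:

```python
def projection(first, last):
    """Return '$first, ..., $last' as a comma-separated list of Pig positional references."""
    if first > last:
        raise ValueError("first must not exceed last")
    return ", ".join(f"${i}" for i in range(first, last + 1))


def concat_statement(alias, source, first, last):
    """Build the foreach statement from the thread with $first..$last spelled out."""
    return (
        f"{alias} = foreach {source} generate "
        f"CONCAT(CONCAT($0, '-'), $1) as rowId, {projection(first, last)};"
    )


if __name__ == "__main__":
    # Prints the full 250-column statement, ready to paste into a Pig script.
    print(concat_statement("ConcatBuffer", "CpuMetrics", 2, 250))
```

This only saves typing; Pig still sees every column listed explicitly, so it does not replace the syntax being discussed in the Jira.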
-Dmitriy

On Wed, Dec 29, 2010 at 9:12 PM, Eric Yang <[email protected]> wrote:

> Hi Dmitriy,
>
> Issue filed: https://issues.apache.org/jira/browse/PIG-1782
>
> I meant to say columns in my previous message. It should read as
> "Make alteration of a column in a bug, but not specifying other
> columns in the same bag".
>
> Let's assume PIG-1782 is address and CpuMetrics from PIG-1782 example
> should contains 250 columns.
> The next line that I write, would look like this:
>
> ConcatBuffer = foreach CpuMentrics generate CONCAT(CONCAT($0, '-'),
> $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
>
> It would be nice if the statement can be written like this:
>
> ConcatBuffer = foreach CpuMentrics generate CONCAT(CONCAT($0, '-'),
> $1) as rowID, MIRROR($2..$250);
>
> Is there something like this in pig built-in functions?
>
> regards,
> Eric
>
> On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[email protected]> wrote:
> > Hi Eric,
> > Yes, we can certainly add the convention that a string without a ":" refers
> > to a complete column family.
> > It should be fairly straightforward.. step 1 is to open a ticket on the
> > Jira, step to is to do it :).
> >
> > I am not sure what you mean by "make alteration of a tuple in a bag, but not
> > specifying other tuples in the same bag" -- can you provide an example that
> > illustrates what you want to do?
> >
> > Thanks,
> > -Dmitriy
> >
> > On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Consider this use case:
> >>
> >> There is a program store cpu usage metrics to a HBase table. This
> >> HBase table has a column family called cpu, and individual cpu core
> >> usage is stored in columns like, cpu:user.0, cpu:user.1 etc. The
> >> suffix number represent unique cpu core id in the system.
> >>
> >> While it is possible to write query like:
> >>
> >> SystemMetrics = load 'hbase://SystemMetrics' USING
> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
> >> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
> >> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
> >> cpuCombined1:float ... LoadAverage:float);
> >>
> >> To get a long list of columns to load and specify the same list in
> >> group by command like:
> >>
> >> CleanseBuffer = foreach SystemMetrics generate
> >> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
> >> cpuCombined1, ..., LoadAverage;
> >>
> >> The syntax works fine, but it would be nice to load all columns of a
> >> given column family without specifying individual columns.
> >>
> >> i.e. SystemMetrics = load 'hbase://SystemMetrics' USING
> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
> >> system');
> >>
> >> Is this syntax possible to implement in pig?
> >>
> >> Second question, is it possible to make alteration of a tuple in a
> >> bag, but not specifying other tuples in the same bag?
> >>
> >> For large column tables, it would be nice if there is short hand
> >> syntax to make pig syntax shorter to write.
> >> Any tip on making foreach and group by shorter? Thanks
> >>
> >> regards,
> >> Eric
> >>
> >
>
