Thanks for the pointer. :)

regards,
Eric

On Thu, Dec 30, 2010 at 2:15 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Ah, I see. There is no such function available right now.
> There is some discussion of such a feature here:
> https://issues.apache.org/jira/browse/PIG-1693
> As you can see, there isn't yet a consensus on how such syntax would work.
> Feel free to weigh in.
>
> -Dmitriy
>
> On Wed, Dec 29, 2010 at 9:12 PM, Eric Yang <[email protected]> wrote:
>
>> Hi Dmitriy,
>>
>> Issue filed: https://issues.apache.org/jira/browse/PIG-1782
>>
>> I meant to say columns in my previous message. It should read as
>> "Make alteration of a column in a bag, but not specifying other
>> columns in the same bag".
>>
>> Let's assume PIG-1782 is addressed and CpuMetrics from the PIG-1782
>> example contains 250 columns.
>> The next line that I write would look like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
>> $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
>>
>> It would be nice if the statement could be written like this:
>>
>> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
>> $1) as rowID, MIRROR($2..$250);
>>
>> Is there something like this in pig built-in functions?
>>
>> regards,
>> Eric
>>
>> On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[email protected]> wrote:
>> > Hi Eric,
>> > Yes, we can certainly add the convention that a string without a ":"
>> > refers to a complete column family.
>> > It should be fairly straightforward. Step one is to open a ticket on
>> > the Jira, step two is to do it :).
>> >
>> > I am not sure what you mean by "make alteration of a tuple in a bag,
>> > but not specifying other tuples in the same bag" -- can you provide an
>> > example that illustrates what you want to do?
>> >
>> > Thanks,
>> > -Dmitriy
>> >
>> > On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> Consider this use case:
>> >>
>> >> There is a program that stores CPU usage metrics in an HBase table.
>> >> This HBase table has a column family called cpu, and individual CPU
>> >> core usage is stored in columns like cpu:user.0, cpu:user.1, etc. The
>> >> suffix number represents the unique CPU core id in the system.
>> >>
>> >> It is possible to write a query like:
>> >>
>> >> SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
>> >> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
>> >> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
>> >> cpuCombined1:float ... LoadAverage:float);
>> >>
>> >> That requires a long list of columns to load, and the same list has
>> >> to be repeated in later commands like:
>> >>
>> >> CleanseBuffer = foreach SystemMetrics generate
>> >> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
>> >> cpuCombined1, ..., LoadAverage;
>> >>
>> >> The syntax works fine, but it would be nice to load all columns of a
>> >> given column family without specifying individual columns.
>> >>
>> >> i.e. SystemMetrics = load 'hbase://SystemMetrics' USING
>> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
>> >> system');
>> >>
>> >> Is this syntax possible to implement in pig?
>> >>
>> >> Second question: is it possible to make an alteration to a tuple in
>> >> a bag without specifying the other tuples in the same bag?
>> >>
>> >> For tables with many columns, it would be nice if there were a
>> >> shorthand syntax to make pig scripts shorter to write.
>> >> Any tips on making foreach and group by shorter? Thanks
>> >>
>> >> regards,
>> >> Eric
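
Since the thread says no built-in range function is available yet, one possible stopgap is to generate the long positional list outside of Pig and paste the result into the script. The sketch below is only an illustration of that idea, not something proposed in the thread: the helper names projection_list and concat_buffer_statement are hypothetical, and the relation name CpuMetrics and the rowId expression are copied from the example above.

```python
def projection_list(first: int, last: int) -> str:
    """Return the positional references $first..$last as 'a, b, c' text."""
    if first > last:
        raise ValueError("first must not exceed last")
    return ", ".join(f"${i}" for i in range(first, last + 1))


def concat_buffer_statement(last_column: int) -> str:
    """Build the foreach statement from the thread as a plain string.

    Hypothetical helper: rowId is built from $0 and $1, and columns
    $2..$last_column are passed through unchanged.
    """
    passthrough = projection_list(2, last_column)
    return (
        "ConcatBuffer = foreach CpuMetrics generate "
        "CONCAT(CONCAT($0, '-'), $1) as rowId, "
        f"{passthrough};"
    )


if __name__ == "__main__":
    # Print the full statement for a 250-column relation, ready to paste
    # into a Pig script.
    print(concat_buffer_statement(250))
```

This only saves typing; the generated statement is still the long form, so a native range syntax such as the one discussed in PIG-1693 would remain the cleaner answer.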
