Hi Eric,

Yes, we can certainly add the convention that a string without a ":" refers to a complete column family. It should be fairly straightforward: step 1 is to open a ticket in JIRA, step 2 is to do it :).
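Here is a minimal sketch of how the proposed convention could be parsed. It assumes the column argument stays a space-separated list, as in Eric's example. `ColumnSpec` and `parse` are hypothetical names and are not part of Pig's HBaseStorage. HBase family names cannot contain ':' but qualifiers can, so each entry is split on its first ':' only. An entry with no ':' is treated as a whole column family. A loader built on this would presumably pass whole-family entries to HBase's `Scan.addFamily` and the rest to `Scan.addColumn`. That wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig or HBaseStorage): parses a
 * space-separated column list such as "tags:cluster cpu system" under the
 * proposed convention that an entry without ':' names a whole column family.
 */
final class ColumnSpec {
    final String family;
    final String qualifier; // null means "every column in the family"

    ColumnSpec(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    boolean isWholeFamily() {
        return qualifier == null;
    }

    static List<ColumnSpec> parse(String columnList) {
        List<ColumnSpec> specs = new ArrayList<>();
        for (String token : columnList.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            int colon = token.indexOf(':');
            if (colon < 0) {
                // No ':' -> the entire column family, e.g. "cpu".
                specs.add(new ColumnSpec(token, null));
            } else {
                // Split on the first ':' only; qualifiers may themselves contain ':'.
                specs.add(new ColumnSpec(token.substring(0, colon), token.substring(colon + 1)));
            }
        }
        return specs;
    }

    @Override
    public String toString() {
        return isWholeFamily() ? family : family + ":" + qualifier;
    }
}
```

For example, `ColumnSpec.parse("tags:cluster cpu system")` yields three entries: the single column `tags:cluster` and the whole families `cpu` and `system`.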

I'm not sure what you mean by "make alteration of a tuple in a bag, but not specifying other tuples in the same bag". Can you provide an example that illustrates what you want to do?

Thanks,
-Dmitriy

On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang wrote:
> Hi,
>
> Consider this use case:
>
> A program stores CPU usage metrics in an HBase table. The table has a
> column family called cpu, and the usage of each CPU core is stored in
> columns like cpu:user.0, cpu:user.1, etc. The numeric suffix is the
> unique ID of the CPU core in the system.
>
> It is possible to write a query like:
>
> SystemMetrics = load 'hbase://SystemMetrics'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>         'tags:cluster cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1',
>         '-loadKey')
>     AS (rowKey: chararray, cluster: chararray, cpuCombined0: float,
>         cpuCombined1: float, ..., LoadAverage: float);
>
> but this requires listing every column to load, and then repeating the
> same list in later commands such as group by or:
>
> CleanseBuffer = foreach SystemMetrics generate
>     REGEX_EXTRACT($0, '^\\d+', 0) as time, cluster, cpuCombined0,
>     cpuCombined1, ..., LoadAverage;
>
> The syntax works fine, but it would be nice to load all columns of a
> given column family without specifying individual columns, i.e.:
>
> SystemMetrics = load 'hbase://SystemMetrics'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu system');
>
> Is this syntax possible to implement in Pig?
>
> Second question: is it possible to make alteration of a tuple in a
> bag, but not specifying other tuples in the same bag?
>
> For tables with a large number of columns, it would be nice to have a
> shorthand that makes Pig scripts shorter to write. Any tips on making
> foreach and group by statements shorter? Thanks
>
> regards,
> Eric
