Re: Syntax and HBaseStorage questions

Dmitriy Ryaboy Thu, 30 Dec 2010 02:16:04 -0800

Ah, I see. There is no such function available right now.
There is some discussion of such a feature here:
https://issues.apache.org/jira/browse/PIG-1693
As you can see, there isn't yet a consensus on how such syntax would work.
Feel free to weigh in.
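
In the meantime, one possible workaround is to generate the long projection
list outside of Pig and paste it into the script. The following is only a
minimal sketch, not something Pig provides: the helper names
projection_list and concat_statement are made up for illustration, and the
alias names are taken from the example quoted below.

```python
def projection_list(first, last):
    """Return '$first, ..., $last' as a comma-separated Pig positional projection."""
    if first > last:
        raise ValueError("first must not exceed last")
    return ", ".join(f"${i}" for i in range(first, last + 1))


def concat_statement(alias, source, first, last):
    """Build a foreach statement that joins $0 and $1 into rowId and passes
    the positional columns first..last through unchanged.

    Hypothetical helper: it only produces text; it does not talk to Pig.
    """
    return (
        f"{alias} = foreach {source} generate "
        f"CONCAT(CONCAT($0, '-'), $1) as rowId, "
        f"{projection_list(first, last)};"
    )


if __name__ == "__main__":
    # Prints the full statement with $2 through $250 written out.
    print(concat_statement("ConcatBuffer", "CpuMetrics", 2, 250))
```

The printed statement can then be copied into the Pig script in place of
the hand-written column list.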


-Dmitriy

On Wed, Dec 29, 2010 at 9:12 PM, Eric Yang <[email protected]> wrote:

> Hi Dmitriy,
>
> Issue filed: https://issues.apache.org/jira/browse/PIG-1782
>
> I meant to say columns in my previous message.  It should read as
> "Make alteration of a column in a bag, but not specifying other
> columns in the same bag".
>
> Let's assume PIG-1782 is addressed and that CpuMetrics from the
> PIG-1782 example contains 250 columns.
> The next line that I write would look like this:
>
> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
> $1) as rowId, $2, $3, $4, $5, $6, $7, $8, $9, $10, ... $250;
>
> It would be nice if the statement could be written like this:
>
> ConcatBuffer = foreach CpuMetrics generate CONCAT(CONCAT($0, '-'),
> $1) as rowID, MIRROR($2..$250);
>
> Is there something like this among Pig's built-in functions?
>
> regards,
> Eric
>
> On Wed, Dec 29, 2010 at 6:09 PM, Dmitriy Ryaboy <[email protected]>
> wrote:
> > Hi Eric,
> > Yes, we can certainly add the convention that a string without a ":"
> > refers to a complete column family.
> > It should be fairly straightforward. Step 1 is to open a ticket on the
> > Jira, step two is to do it :).
> >
> > I am not sure what you mean by "make alteration of a tuple in a bag, but
> > not specifying other tuples in the same bag" -- can you provide an
> > example that illustrates what you want to do?
> >
> > Thanks,
> > -Dmitriy
> >
> > On Tue, Dec 28, 2010 at 11:10 PM, Eric Yang <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Consider this use case:
> >>
> >> There is a program that stores CPU usage metrics in an HBase table.
> >> This HBase table has a column family called cpu, and individual CPU
> >> core usage is stored in columns like cpu:user.0, cpu:user.1, etc.  The
> >> numeric suffix represents the unique CPU core ID in the system.
> >>
> >> While it is possible to write a query like:
> >>
> >> SystemMetrics = load 'hbase://SystemMetrics' USING
> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster
> >> cpu:combined.0 cpu:combined.1 ... system:LoadAverage.1','-loadKey') AS
> >> (rowKey: chararray, cluster: chararray, cpuCombined0:float,
> >> cpuCombined1:float ... LoadAverage:float);
> >>
> >> this requires a long list of columns to load, and the same list must
> >> be specified again in a foreach or group by statement like:
> >>
> >> CleanseBuffer = foreach SystemMetrics generate
> >> REGEX_EXTRACT($0,'^\\d+',0) as time, cluster, cpuCombined0,
> >> cpuCombined1, ..., LoadAverage;
> >>
> >> The syntax works fine, but it would be nice to load all columns of a
> >> given column family without specifying individual columns.
> >>
> >> i.e. SystemMetrics = load 'hbase://SystemMetrics' USING
> >> org.apache.pig.backend.hadoop.hbase.HBaseStorage('tags:cluster cpu
> >> system');
> >>
> >> Is this syntax possible to implement in pig?
> >>
> >> Second question, is it possible to make alteration of a tuple in a
> >> bag, but not specifying other tuples in the same bag?
> >>
> >> For tables with many columns, it would be nice to have a shorthand
> >> syntax that makes Pig scripts shorter to write.
> >> Any tips on making foreach and group by shorter?  Thanks.
> >>
> >> regards,
> >> Eric
> >>
> >
>
