I don't think this is macro-able, Pradeep.  Each step updates a different
column.  For example, for FACT_TABLE3 we update 'col1' from DIMENSION1, for
FACT_TABLE5 we update 'col2' from DIMENSION2, and so on.
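
If we do end up generating the script, as Bertrand suggested, a rough Python
sketch could look like the one here.  It is only an illustration under some
assumptions: the function name generate_lookup_steps, the relation names
JOINED<n> and LOOKED_UP<n>, and the three-column example are made up, and
the emitted JOIN / FOREACH statements simply mirror the pattern in my
original script.

```python
def generate_lookup_steps(columns, lookups, source="FACT_TABLE1"):
    """Return Pig Latin statements that swap each listed column for its
    dimension value, one LEFT OUTER join per (column, dimension path) pair.

    All relation names produced here (DIMENSION<n>, JOINED<n>, LOOKED_UP<n>)
    are hypothetical and chosen only for this sketch.
    """
    lines = []
    current = source
    for i, (column, dim_path) in enumerate(lookups, start=1):
        dim = f"DIMENSION{i}"
        joined = f"JOINED{i}"
        result = f"LOOKED_UP{i}"
        # Project every column; only the looked-up one comes from the dimension.
        projection = ", ".join(
            f"{dim}::value AS {c}" if c == column else f"{current}::{c} AS {c}"
            for c in columns
        )
        lines.append(f"{dim} = LOAD '{dim_path}' AS (key, value);")
        lines.append(
            f"{joined} = JOIN {current} BY {column} LEFT OUTER, {dim} BY key;"
        )
        lines.append(f"{result} = FOREACH {joined} GENERATE {projection};")
        current = result
    return lines


if __name__ == "__main__":
    # Three columns instead of thirty, to keep the output readable.
    columns = [f"col{i}" for i in range(1, 4)]
    lookups = [("col1", "DIM1"), ("col2", "DIM2")]
    print("\n".join(generate_lookup_steps(columns, lookups)))
```

Adding a dimension would then mean adding one (column, path) pair to the
list instead of editing every GENERATE clause by hand.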

Feel free to correct me if I am wrong.  Thanks.

On Thu, Jul 18, 2013 at 8:25 AM, Pradeep Gollakota <[email protected]> wrote:

> Looks like this might be macroable. Not entirely sure how that can be done
> yet... but I'd look into that if I were you.
>
>
> On Thu, Jul 18, 2013 at 11:16 AM, Something Something <
> [email protected]> wrote:
>
> > Wow, Bertrand, on the Pig mailing list you're recommending not to use
> > Pig... LOL!  Jokes apart, I would think this would be a common use case
> > for Pig, no?  Generating a Pig script on the fly is a decent idea, but
> > we're hoping to avoid that - unless there's no other way.  Thanks for the
> > pointers.
> >
> >
> > On Thu, Jul 18, 2013 at 2:52 AM, Bertrand Dechoux <[email protected]> wrote:
> >
> > > I would say either generate the script using another language (e.g.
> > > Python) or use a true programming language with an API having the same
> > > level of abstraction (e.g. Java and Cascading).
> > >
> > > Bertrand
> > >
> > >
> > > On Thu, Jul 18, 2013 at 8:44 AM, Something Something <
> > > [email protected]> wrote:
> > >
> > > > There must be a better way to do this in Pig.  Here's what my script
> > > > looks like right now (I omitted some parts to save space, but you
> > > > will get the idea):
> > > >
> > > > FACT_TABLE = LOAD 'XYZ' as (col1: chararray,………. col30: chararray);
> > > >
> > > > FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…..
> > > > udf10(col30) as col30;
> > > >
> > > > DIMENSION1 = LOAD 'DIM1' as (key, value);
> > > >
> > > > FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;
> > > >
> > > > FACT_TABLE3 = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1,…….
> > > > FACT_TABLE1::col30 as col30;
> > > >
> > > > DIMENSION2 = LOAD 'DIM2' as (key, value);
> > > >
> > > > FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;
> > > >
> > > > FACT_TABLE5  = FOREACH FACT_TABLE4 GENERATE  FACT_TABLE3::col1 as
> > > > col1, DIMENSION2::value as col2,…….  FACT_TABLE3::col30 as col30;
> > > >
> > > > & so on!  There are 10 more such dimension tables to join.
> > > >
> > > > In short, each row of the fact table needs to be joined to a key
> > > > field on a dimension table to get its associated value.
> > > >
> > > > This is beginning to look ugly.  Plus, it's a maintenance nightmare
> > > > when it comes to adding new fields.  What's the best way to code this
> > > > in Pig?
> > > >
> > > > Thanks in advance.
> > > >
> > >
> >
>
