I don't think this is macro-able, Pradeep. A different column gets updated at every step. For example, for FACT_TABLE3 we update 'col1' from DIMENSION1, for FACT_TABLE5 we update 'col2' from DIMENSION2, and so on.
Feel free to correct me if I am wrong. Thanks.

On Thu, Jul 18, 2013 at 8:25 AM, Pradeep Gollakota <[email protected]> wrote:

> Looks like this might be macroable. Not entirely sure how that can be done
> yet... but I'd look into that if I were you.
>
> On Thu, Jul 18, 2013 at 11:16 AM, Something Something <[email protected]> wrote:
>
> > Wow, Bertrand, on the Pig mailing list you're recommending not to use
> > Pig... LOL! Jokes apart, I would think this would be a common use case for
> > Pig, no? Generating a Pig script on the fly is a decent idea, but we're
> > hoping to avoid that - unless there's no other way. Thanks for the
> > pointers.
> >
> > On Thu, Jul 18, 2013 at 2:52 AM, Bertrand Dechoux <[email protected]> wrote:
> >
> > > I would say either generate the script using another language (e.g. Python)
> > > or use a true programming language with an API having the same level of
> > > abstraction (e.g. Java and Cascading).
> > >
> > > Bertrand
> > >
> > > On Thu, Jul 18, 2013 at 8:44 AM, Something Something <[email protected]> wrote:
> > >
> > > > There must be a better way to do this in Pig. Here's what my script looks
> > > > like right now (I omitted some snippets to save space, but you will get
> > > > the idea):
> > > >
> > > > FACT_TABLE = LOAD 'XYZ' as (col1 :chararray,………. col30: chararray);
> > > >
> > > > FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…..
> > > > udf10(col30) as col30;
> > > >
> > > > DIMENSION1 = LOAD 'DIM1' as (key, value);
> > > >
> > > > FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;
> > > >
> > > > FACT_TABLE3 = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1,…….
> > > > FACT_TABLE1::col30 as col30;
> > > >
> > > > DIMENSION2 = LOAD 'DIM2' as (key, value);
> > > >
> > > > FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;
> > > >
> > > > FACT_TABLE5 = FOREACH FACT_TABLE4 GENERATE FACT_TABLE3::col1 as
> > > > col1, DIMENSION2::value as col2,……. FACT_TABLE3::col30 as col30;
> > > >
> > > > & so on! There are 10 more such dimension tables to join.
> > > >
> > > > In short, each row of the fact table needs to be joined to a key field on a
> > > > dimension table to get its associated value.
> > > >
> > > > This is beginning to look ugly. Plus it's a maintenance nightmare when it
> > > > comes to adding new fields. What's the best way to code this in Pig?
> > > >
> > > > Thanks in advance.
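
For reference, the script-generation idea Bertrand mentions could look roughly like the sketch below. It is only an illustration, not something anyone in the thread posted: the function name `generate_pig_script`, the relation names (`FACT_n`, `JOINED_n`, `DIM_n`) and the three-column example are all hypothetical, every column is assumed to be a chararray, and the UDF step (FACT_TABLE1 in the original script) is left out for brevity. The point is that the "which column gets replaced at which step" logic lives in one list, so adding a dimension or a field is a one-line change.

```python
def generate_pig_script(fact_path, columns, lookups):
    """Build a Pig Latin script that replaces fact columns with dimension values.

    fact_path -- path of the fact table to LOAD
    columns   -- ordered list of fact column names (all assumed chararray)
    lookups   -- ordered list of (column, dimension_path) pairs; each named
                 column is replaced by the 'value' field of that dimension

    All relation names emitted here (FACT_n, JOINED_n, DIM_n) are made up
    for this sketch.
    """
    for column, _ in lookups:
        if column not in columns:
            raise ValueError(f"unknown fact column: {column}")

    schema = ", ".join(f"{c}:chararray" for c in columns)
    lines = [f"FACT_0 = LOAD '{fact_path}' AS ({schema});"]

    current = "FACT_0"
    for i, (column, dim_path) in enumerate(lookups, start=1):
        dim, joined, result = f"DIM_{i}", f"JOINED_{i}", f"FACT_{i}"

        # Same three statements as in the hand-written script:
        # load the dimension, left-outer join on the key, re-project.
        lines.append(f"{dim} = LOAD '{dim_path}' AS (key, value);")
        lines.append(
            f"{joined} = JOIN {current} BY {column} LEFT OUTER, {dim} BY key;"
        )

        # Take the looked-up column from the dimension and carry every
        # other column through unchanged from the previous relation.
        projection = ", ".join(
            f"{dim}::value AS {c}" if c == column else f"{current}::{c} AS {c}"
            for c in columns
        )
        lines.append(f"{result} = FOREACH {joined} GENERATE {projection};")
        current = result

    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-column fact table with two dimension lookups.
    print(generate_pig_script(
        "XYZ",
        ["col1", "col2", "col3"],
        [("col1", "DIM1"), ("col2", "DIM2")],
    ))
```

The generated text would still need to be saved to a file and run with Pig as usual; this only removes the hand-maintained FOREACH lists.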
