I would say either generate the script using another language (e.g. Python), or use a general-purpose programming language with an API at the same level of abstraction (e.g. Java with Cascading).
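
To illustrate the first option, here is a minimal sketch of a Python generator for the repetitive JOIN/FOREACH pairs. It is not a tested pipeline: the function name `generate_pig_joins`, the aliases `JOINED<n>` and `RESOLVED<n>`, and the three-column list are made up for the example, and it assumes every dimension table has the (key, value) layout shown in your script.

```python
def generate_pig_joins(columns, dimensions, source="FACT_TABLE1"):
    """Return Pig Latin statements that replace each keyed column with its dimension value.

    columns    -- ordered list of column names in the fact relation
    dimensions -- ordered list of (column, dimension_path) pairs
    source     -- alias of the relation to start from (assumed to exist already)
    """
    lines = []
    current = source
    for i, (col, path) in enumerate(dimensions, start=1):
        # Hypothetical alias names; any unique names would do.
        dim = f"DIMENSION{i}"
        joined = f"JOINED{i}"
        result = f"RESOLVED{i}"

        lines.append(f"{dim} = LOAD '{path}' AS (key, value);")
        lines.append(f"{joined} = JOIN {current} BY {col} LEFT OUTER, {dim} BY key;")

        # Keep every column, but take the joined column from the dimension table.
        projected = []
        for c in columns:
            if c == col:
                projected.append(f"{dim}::value AS {c}")
            else:
                projected.append(f"{current}::{c} AS {c}")
        fields = ", ".join(projected)
        lines.append(f"{result} = FOREACH {joined} GENERATE {fields};")

        current = result
    return lines


if __name__ == "__main__":
    # Illustrative input only: three columns and two dimension tables.
    example_columns = ["col1", "col2", "col3"]
    example_dimensions = [("col1", "DIM1"), ("col2", "DIM2")]
    print("\n".join(generate_pig_joins(example_columns, example_dimensions)))
```

With this approach, adding a field means adding one entry to the column list, and adding a dimension means adding one pair to the dimension list; the generated statements follow the same pattern as the hand-written ones.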

Bertrand

On Thu, Jul 18, 2013 at 8:44 AM, Something Something <[email protected]> wrote:

> There must be a better way to do this in Pig. Here's how my script looks
> like right now: (omitted some snippet for saving space, but you will get
> the idea).
>
> FACT_TABLE = LOAD 'XYZ' as (col1 :chararray,………. col30: chararray);
>
> FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…..
> udf10(col30) as col30;
>
> DIMENSION1 = LOAD 'DIM1' as (key, value);
>
> FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;
>
> FACT_TABLE3 = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1,…….
> FACT_TABLE1::col30 as col30;
>
> DIMENSION2 = LOAD 'DIM2' as (key, value);
>
> FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;
>
> FACT_TABLE5 = FOREACH FACT_TABLE4 GENERATE FACT_TABLE3::col1 as
> col1, DIMENSION2::value as col2,……. FACT_TABLE3::col30 as col30;
>
> & so on! There are 10 more such dimension tables to join.
>
> In short, each row on the fact table needs to be joined to a key field on a
> dimension table to get it's associated value.
>
> This is beginning to look ugly. Plus it's maintenance nightmare when it
> comes to adding new fields. What's the best way to code this in Pig?
>
> Thanks in advance.
