There must be a better way to do this in Pig. Here's what my script looks like right now (I've omitted parts of it to save space, but you'll get the idea):
```
FACT_TABLE = LOAD 'XYZ' as (col1:chararray, ... col30:chararray);

FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2, ... udf10(col30) as col30;

DIMENSION1 = LOAD 'DIM1' as (key, value);
FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;
FACT_TABLE3 = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1, ... FACT_TABLE1::col30 as col30;

DIMENSION2 = LOAD 'DIM2' as (key, value);
FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;
FACT_TABLE5 = FOREACH FACT_TABLE4 GENERATE FACT_TABLE3::col1 as col1, DIMENSION2::value as col2, ... FACT_TABLE3::col30 as col30;
```

And so on! There are 10 more such dimension tables to join. In short, each row in the fact table needs to be joined to a key field in a dimension table to get its associated value.

This is beginning to look ugly. Plus, it's a maintenance nightmare when it comes to adding new fields. What's the best way to code this in Pig?

Thanks in advance.
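To make the data flow concrete, here is a small in-memory Python sketch (not Pig, and not a proposed Pig solution) of what the chain of LEFT OUTER joins computes. It assumes each dimension is a small key/value table with unique keys; with duplicate keys, a Pig JOIN would emit one output row per match instead. The names `DIM1`, `DIM2`, `LOOKUPS` and `resolve` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical sample dimensions, modelled as key -> value dicts
# (assumes keys are unique within each dimension).
DIM1: Dict[str, str] = {"a": "Alpha", "b": "Beta"}
DIM2: Dict[str, str] = {"x": "Ex-ray"}

# One entry per fact column that is resolved against a dimension.
# In this model, adding a new lookup is a one-line change here.
LOOKUPS: Dict[str, Dict[str, str]] = {
    "col1": DIM1,
    "col2": DIM2,
}


def resolve(rows: List[Row], lookups: Dict[str, Dict[str, str]]) -> List[Row]:
    """Replace each mapped column with its dimension value, mimicking a chain
    of LEFT OUTER joins: no fact row is dropped, unmatched keys become None,
    and unmapped columns (e.g. col30) pass through unchanged."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col, dim in lookups.items():
            # A missing or None key finds no match, like a null join key in Pig.
            new_row[col] = dim.get(row[col]) if row.get(col) is not None else None
        out.append(new_row)
    return out


if __name__ == "__main__":
    facts = [
        {"col1": "a", "col2": "x", "col30": "keep"},
        {"col1": "z", "col2": "x", "col30": "keep"},  # "z" has no match in DIM1
    ]
    for r in resolve(facts, LOOKUPS):
        print(r)
```

Keeping the column-to-dimension mapping in one place is exactly what the script above lacks: there, every new lookup costs a LOAD, a JOIN and a FOREACH that re-lists all 30 columns.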
