There must be a better way to do this in Pig.  Here's what my script looks
like right now (I've omitted some parts to save space, but you will get
the idea):

FACT_TABLE = LOAD 'XYZ'  as (col1 :chararray,………. col30: chararray);

FACT_TABLE1  = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…..
udf10(col30) as col30;

DIMENSION1 = LOAD 'DIM1' as (key, value);

FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;

FACT_TABLE3  = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1,…….
 FACT_TABLE1::col30 as col30;

DIMENSION2 = LOAD 'DIM2' as (key, value);

FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;

FACT_TABLE5  = FOREACH FACT_TABLE4 GENERATE  FACT_TABLE3::col1 as
col1, DIMENSION2::value as col2,…….  FACT_TABLE3::col30 as col30;

And so on!  There are 10 more such dimension tables to join.

In short, each row in the fact table needs to be joined to a key field in a
dimension table to get its associated value.
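
To make the intent concrete, here is the per-row logic sketched in Python
rather than Pig.  This is only an illustration of what the chain of joins is
meant to compute, not a proposed solution: the function name, the column
names in the mapping, and the sample data are all made up, and the real
tables have 30 columns and a dozen dimensions.

```python
# Hypothetical stand-ins for the dimension tables: one key -> value
# lookup per fact column that has to be translated.
DIMENSIONS = {
    "col1": {"k1": "v1", "k2": "v2"},   # plays the role of DIM1
    "col2": {"a": "alpha"},             # plays the role of DIM2
}


def resolve_row(row, dimensions):
    """Replace each mapped column's key with its dimension value.

    A key with no match becomes None, which mirrors the nulls that a
    LEFT OUTER join yields for unmatched fact rows. Columns without a
    dimension are passed through unchanged.
    """
    resolved = dict(row)
    for column, lookup in dimensions.items():
        resolved[column] = lookup.get(row[column])
    return resolved


fact_row = {"col1": "k1", "col2": "zzz", "col30": "x"}
print(resolve_row(fact_row, DIMENSIONS))
```

Adding a new dimension here means adding one entry to the mapping, which is
the kind of maintainability I'm hoping to get in the Pig script as well.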

This is beginning to look ugly.  Plus, it's a maintenance nightmare when it
comes to adding new fields.  What's the best way to code this in Pig?

Thanks in advance.
