I would either generate the script from another language (e.g. Python)
or switch to a general-purpose programming language with an API at the
same level of abstraction (e.g. Java with Cascading).
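
As a rough illustration of the first option, here is a minimal sketch of a Python generator that emits the repetitive LOAD / JOIN / FOREACH chain from a short description of the tables. The function name, the FACT_n / DIM_n / JOINED_n aliases, and the three-column example are all made up for illustration. It also assumes every dimension is a (key, value) file and every fact column is a chararray, as in the quoted script.

```python
def generate_pig_script(fact_path, columns, dimensions, udfs=None):
    """Return Pig Latin text that loads a fact table and left-joins dimensions.

    columns:    ordered list of fact column names
    dimensions: list of (fact_column, dimension_path) pairs; each join
                replaces that column with the dimension's value
    udfs:       optional {column: udf_name} applied before any join
    """
    udfs = udfs or {}
    lines = []

    # Load the fact table, typing every column as chararray (assumption).
    schema = ", ".join(f"{c}:chararray" for c in columns)
    lines.append(f"FACT_0 = LOAD '{fact_path}' AS ({schema});")

    # Apply the per-column UDFs in a single projection.
    projected = ", ".join(
        f"{udfs[c]}({c}) AS {c}" if c in udfs else c for c in columns
    )
    lines.append(f"FACT_1 = FOREACH FACT_0 GENERATE {projected};")

    # One LOAD / JOIN / FOREACH triple per dimension table.
    current = "FACT_1"
    for i, (col, path) in enumerate(dimensions, start=1):
        dim, joined, nxt = f"DIM_{i}", f"JOINED_{i}", f"FACT_{i + 1}"
        lines.append(f"{dim} = LOAD '{path}' AS (key, value);")
        lines.append(
            f"{joined} = JOIN {current} BY {col} LEFT OUTER, {dim} BY key;"
        )
        # Swap in the looked-up value for the joined column, keep the rest.
        fields = ", ".join(
            f"{dim}::value AS {c}" if c == col else f"{current}::{c} AS {c}"
            for c in columns
        )
        lines.append(f"{nxt} = FOREACH {joined} GENERATE {fields};")
        current = nxt

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical three-column fact table with two dimensions.
    print(generate_pig_script(
        "XYZ",
        ["col1", "col2", "col3"],
        [("col1", "DIM1"), ("col2", "DIM2")],
        udfs={"col2": "udf1"},
    ))
```

With this approach, adding a field or a dimension means editing one list entry rather than every FOREACH in the chain; the generated text can be written to a file and run with Pig as usual.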

Bertrand


On Thu, Jul 18, 2013 at 8:44 AM, Something Something <
[email protected]> wrote:

> There must be a better way to do this in Pig.  Here's what my script looks
> like right now (I omitted some snippets to save space, but you will get
> the idea):
>
> FACT_TABLE = LOAD 'XYZ'  as (col1 :chararray,………. col30: chararray);
>
> FACT_TABLE1  = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…..
> udf10(col30) as col30;
>
> DIMENSION1 = LOAD 'DIM1' as (key, value);
>
> FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY key;
>
> FACT_TABLE3  = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as col1,…….
>  FACT_TABLE1::col30 as col30;
>
> DIMENSION2 = LOAD 'DIM2' as (key, value);
>
> FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY key;
>
> FACT_TABLE5  = FOREACH FACT_TABLE4 GENERATE  FACT_TABLE3::col1 as
> col1, DIMENSION2::value as col2,…….  FACT_TABLE3::col30 as col30;
>
> & so on!  There are 10 more such dimension tables to join.
>
> In short, each row of the fact table needs to be joined to a key field on a
> dimension table to get its associated value.
>
> This is beginning to look ugly.  Plus, it's a maintenance nightmare when it
> comes to adding new fields.  What's the best way to code this in Pig?
>
> Thanks in advance.
>
