It seems like your fact table and its corresponding dimension tables follow a traditional data warehousing star topology relational diagram. I would have to ask what is the purpose/deliverable of coalescing your star topology into a 1NF of fields in Pig? In this case, I'd agree with Pradeep; that the use of MR for queries against the warehouse using Hive might be more appropriate. If you really do need the coalesced 1NF of your warehouse, I'd recommend a parent bash/perl/php script to dynamically generate a pig script, with your table names and paths as parameter inputs to the script.
-Dan On Fri, Jul 19, 2013 at 10:10 AM, Sajid Raza <[email protected]> wrote: > Could you potentially store all of your facts in one table, join it agains > your dimension table, and filter as needed? > > On Jul 18, 2013, at 10:51 AM, Pradeep Gollakota <[email protected]> > wrote: > > > Unfortunately I can't think of any good way of doing this (other than > what > > Bertrand suggested with using a different language to generate the > script). > > > > I'd also recommend Hive... it may be easier to do this in Hive since you > > have SQL like syntax. (Haven't used Hive, but it looks like this type of > > thing would be far more natural in Hive) > > > > > > On Thu, Jul 18, 2013 at 12:09 PM, Something Something < > > [email protected]> wrote: > > > >> I don't think this is macro-able, Pradeep. Every step of the way a > >> different column gets updated. For example, for FACT_TABLE3 we update > >> 'col1' from DIMENSION1, for FACT_TABLE5 we update 'col2' from > DIMENSION2 & > >> so on. > >> > >> Feel free to correct me if I am wrong. Thanks. > >> > >> > >> > >> > >> > >> On Thu, Jul 18, 2013 at 8:25 AM, Pradeep Gollakota < > [email protected] > >>> wrote: > >> > >>> Looks like this might be macroable. Not entirely sure how that can be > >> done > >>> yet... but I'd look into that if I were you. > >>> > >>> > >>> On Thu, Jul 18, 2013 at 11:16 AM, Something Something < > >>> [email protected]> wrote: > >>> > >>>> Wow, Bertrand, on the Pig mailing list you're recommending not to use > >>>> Pig... LOL! Jokes apart, I would think this would be a common use > case > >>> for > >>>> Pig, no? Generating a Pig script on the fly is a decent idea, but > >> we're > >>>> hoping to avoid that - unless there's no other way. Thanks for the > >>>> pointers. > >>>> > >>>> > >>>> On Thu, Jul 18, 2013 at 2:52 AM, Bertrand Dechoux <[email protected] > >>>>> wrote: > >>>> > >>>>> I would say either generate the script using another language (eg > >>> Python) > >>>>> or use a true programming language with an API having the same level > >> of > >>>>> abstraction (eg Java and Cascading). > >>>>> > >>>>> Bertrand > >>>>> > >>>>> > >>>>> On Thu, Jul 18, 2013 at 8:44 AM, Something Something < > >>>>> [email protected]> wrote: > >>>>> > >>>>>> There must be a better way to do this in Pig. Here's how my script > >>>> looks > >>>>>> like right now: (omitted some snippet for saving space, but you > >> will > >>>> get > >>>>>> the idea). > >>>>>> > >>>>>> FACT_TABLE = LOAD 'XYZ' as (col1 :chararray,………. col30: > >> chararray); > >>>>>> > >>>>>> FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as > >>> col2,….. > >>>>>> udf10(col30) as col30; > >>>>>> > >>>>>> DIMENSION1 = LOAD 'DIM1' as (key, value); > >>>>>> > >>>>>> FACT_TABLE2 = JOIN FACT_TABLE1 BY col1 LEFT OUTER, DIMENSION1 BY > >> key; > >>>>>> > >>>>>> FACT_TABLE3 = FOREACH FACT_TABLE2 GENERATE DIMENSION1::value as > >>>> col1,……. > >>>>>> FACT_TABLE1::col30 as col30; > >>>>>> > >>>>>> DIMENSION2 = LOAD 'DIM2' as (key, value); > >>>>>> > >>>>>> FACT_TABLE4 = JOIN FACT_TABLE3 BY col2 LEFT OUTER, DIMENSION2 BY > >> key; > >>>>>> > >>>>>> FACT_TABLE5 = FOREACH FACT_TABLE4 GENERATE FACT_TABLE3::col1 as > >>>>>> col1, DIMENSION2::value as col2,……. FACT_TABLE3::col30 as col30; > >>>>>> > >>>>>> & so on! There are 10 more such dimension tables to join. > >>>>>> > >>>>>> In short, each row on the fact table needs to be joined to a key > >>> field > >>>>> on a > >>>>>> dimension table to get it's associated value. > >>>>>> > >>>>>> This is beginning to look ugly. Plus it's maintenance nightmare > >> when > >>>> it > >>>>>> comes to adding new fields. What's the best way to code this in > >> Pig? > >>>>>> > >>>>>> Thanks in advance. > >>>>>> > >>>>> > >>>> > >>> > >> >
