Hi Rodrigo,

Thanks for the suggestion, though I don't see how the MultiStorage UDF
helps here.

> Register UDFs etc
> A = LOAD....
> B = LOAD....
> C = LOAD....
>
> -- do lots of transformations with A and B and C get intermediate result
> INTER_RES
> result1 = FOREACH (GROUP INTER_RES BY (...
> STORE result1 INTO '....
> result2 = FOREACH (GROUP INTER_RES BY (...
> STORE result2 INTO '....
> result3 = FOREACH (GROUP INTER_RES BY (...
> STORE result3 INTO '....
> result4 = FOREACH (GROUP INTER_RES BY (...
> STORE result4 INTO '....
> ...
> ...
>

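The different projections (groupings) are not part of the intermediate
result INTER_RES; they are applied to it afterwards, one per STORE. As far
as I understand it, MultiStorage splits a single relation into several
outputs by the value of a field, whereas I need several different
aggregations of the same relation.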
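
To make the workaround from my first mail concrete, here is a rough sketch
of the STORE-then-LOAD variant. The paths, schema, FILTER and grouping keys
are all made up for illustration; the FILTER only stands in for the
expensive transformations.

```pig
-- Sketch only: paths, schema and grouping keys are hypothetical.
raw = LOAD 'input/events' USING PigStorage('\t')
      AS (user:chararray, channel:chararray, secs:long);

-- Stand-in for the expensive transformations that produce INTER_RES.
inter_res = FILTER raw BY secs > 0;

-- Materialise the intermediate once ...
STORE inter_res INTO 'tmp/inter_res' USING PigStorage('\t');

-- ... and read it back, so every GROUP BY starts from the stored copy.
inter = LOAD 'tmp/inter_res' USING PigStorage('\t')
        AS (user:chararray, channel:chararray, secs:long);

result1 = FOREACH (GROUP inter BY user)
          GENERATE group AS user, SUM(inter.secs) AS total_secs;
STORE result1 INTO 'out/by_user';

result2 = FOREACH (GROUP inter BY channel)
          GENERATE group AS channel, COUNT(inter) AS n;
STORE result2 INTO 'out/by_channel';
```

The extra STORE/LOAD round trip in the middle is exactly what I would like
to get rid of.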

Cheers,
-Marco

On Thu, Jan 8, 2015 at 12:04 PM, Rodrigo Ferreira <web...@gmail.com> wrote:

> Marco,
>
> check out this UDF:
>
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
>
> I think it can get the job done without having to group everything.
>
> Cheers,
> Rodrigo
>
> 2015-01-08 7:27 GMT-02:00 Marco Cadetg <ma...@zattoo.com>:
>
> > Hi there,
> >
> > I have a big Pig script which first generates an expensive intermediate
> > result, on which I then run multiple GROUP BY statements and multiple
> > STOREs. Something like this:
> >
> > Register UDFs etc
> > A = LOAD....
> > B = LOAD....
> > C = LOAD....
> >
> > -- do lots of transformations with A and B and C get intermediate result
> > INTER_RES
> > result1 = FOREACH (GROUP INTER_RES BY (...
> > STORE result1 INTO '....
> > result2 = FOREACH (GROUP INTER_RES BY (...
> > STORE result2 INTO '....
> > result3 = FOREACH (GROUP INTER_RES BY (...
> > STORE result3 INTO '....
> > result4 = FOREACH (GROUP INTER_RES BY (...
> > STORE result4 INTO '....
> > ...
> > ...
> >
> > Note the results which get stored are independent of each other, meaning
> > they are not used as input for anything further down and do not alter
> > INTER_RES.
> >
> > Am I correct that pig would only need to LOAD A, B and C once? From what I
> > can see in the command line output, it looks like the expensive intermediate
> > is computed again for each store. I've done a quick test, and if I STORE the
> > intermediate and LOAD it back, it seems to be faster. Is there a way to avoid
> > storing the expensive intermediate?
> >
> > Cheers,
> > -Marco
> >
>
