Re: Statistics optimizer

Dmitriy Ryaboy Thu, 14 Oct 2010 11:32:51 -0700

As long as the loader can provide statistics (be it the Howl loader or
something else), we can use stats to make optimizations. There is actually a
(very simplistic) example of this already in 0.8: it will estimate the
number of reducers based on input file size if parallelism is not specified
for join/group/etc. operations. It's not using LoadStatistics afaik; that
should be fixed.
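
To make the reducer estimate concrete, here is a minimal sketch of a size-based heuristic of the kind described above. It is an illustration only, not Pig's actual code: the function name, the bytes-per-reducer figure, and the reducer cap are all assumptions chosen for the example.

```python
# Hypothetical defaults for illustration; not taken from the message above.
BYTES_PER_REDUCER = 1_000_000_000  # assumed target input size per reducer
MAX_REDUCERS = 999                 # assumed upper bound on reducers


def estimate_reducers(input_sizes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Estimate a reducer count from the sizes (in bytes) of the input files.

    Intended for the case where the script gives no explicit parallelism.
    """
    total = sum(input_sizes)
    # Integer ceiling division: one reducer per bytes_per_reducer of input.
    wanted = (total + bytes_per_reducer - 1) // bytes_per_reducer
    # Always use at least one reducer, and never exceed the cap.
    return max(1, min(max_reducers, wanted))
```

With these assumed settings, 2.5 GB of input would get three reducers, and very large inputs would be clamped at the cap. A loader that reports statistics could supply `input_sizes` directly instead of the optimizer inspecting files itself.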


Ashutosh and I did some work around this about a year ago (on top of the
visitor madness) that was never really presentable. I would be happy to dig
up the code and see if any of it is still salvageable.
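
For illustration, the two statistics-based decisions raised in the quoted message below (use a replicated join when one input is small enough, use a merge join when the inputs are already sorted) could be sketched roughly as follows. Everything here is hypothetical: the `InputStats` record, the `choose_join` function, and the size threshold are assumptions made for the example, not an existing Pig API or the code mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputStats:
    """Hypothetical per-input statistics a loader might report."""
    size_bytes: Optional[int] = None   # None means the size is unknown
    sorted_on_join_key: bool = False


def choose_join(left, right, replicated_max_bytes=100 * 1024 * 1024):
    """Pick a join strategy from loader statistics.

    replicated_max_bytes is an assumed threshold, not a real setting.
    """
    known_sizes = [s.size_bytes for s in (left, right)
                   if s.size_bytes is not None]
    # A small input can be shipped to every task and joined map-side.
    if known_sizes and min(known_sizes) <= replicated_max_bytes:
        return "replicated"
    # Inputs already sorted on the join key can be merged without a shuffle.
    if left.sorted_on_join_key and right.sorted_on_join_key:
        return "merge"
    # Without usable statistics, fall back to the regular join.
    return "default"
```

The fallback branch matters: when a loader reports nothing, the sketch behaves exactly as a rule-based plan would.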

-D

On Thu, Oct 14, 2010 at 11:01 AM, Alan Gates <[email protected]> wrote:

> AFAIK no one is working on that currently.  Our next thoughts on optimizer
> improvement were to start using the new optimizer framework in the MR
> optimizer so we can bring some order to the madness of visitors that is the
> MR optimizer.  I think Thejas plans on starting work on that in 0.9.
>
> In the long run many people have discussed having a cost-based optimizer,
> but I have not seen any proposals of how it should work.  With the advent of
> Howl it would seem that at least basic statistics-based decisions could be
> made in the optimizer:  is one of the files small enough to use a replicated
> join in this case?  Is the file already sorted so we can use a merge join?
>
> It would be great if you're interested in working in this area.  The best
> way to start is to file a JIRA or start a wiki page with the general
> approach and design information.
>
> Alan.
>
>
> On Oct 13, 2010, at 7:46 PM, Renato Marroquín Mogrovejo wrote:
>
>> Hey everyone!
>> The Pig Journal page (http://wiki.apache.org/pig/PigJournal) says
>> something about getting statistics for Pig's optimizer. Is there any work
>> being done on that?
>> Or are there any other plans to improve the optimizer? I mean, right now
>> it is a rule-based one; are there plans to change it to a cost-based one?
>> Any opinions or comments are highly appreciated. Thanks!
>>
>>
>> Renato M.
>>
>
>
