Re: Statistics optimizer

Alan Gates Thu, 14 Oct 2010 11:02:29 -0700

AFAIK no one is working on that currently. Our next thoughts on optimizer improvement were to start using the new optimizer framework in the MR optimizer so we can bring some order to the madness of visitors that is the MR optimizer. I think Thejas plans on starting work on that in 0.9.

In the long run many people have discussed having a cost-based optimizer, but I have not seen any proposals of how it should work. With the advent of Howl it would seem that at least basic statistics-based decisions could be made in the optimizer: is one of the files small enough to use a replicated join in this case? Is the file already sorted so we can use a merge join?
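
To make those two questions concrete, here is a minimal Python sketch of a statistics-based join choice. It is only an illustration, not Pig or Howl code: the InputStats record, the choose_join function, and the 100 MB threshold are all hypothetical, and it assumes the metadata layer can report each input's size and sort key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputStats:
    """Hypothetical per-input statistics a metadata service might expose."""
    size_bytes: int
    sorted_on: Optional[str] = None  # column the file is sorted on, if any


# Assumed threshold for "small enough to replicate"; not a real Pig setting.
REPLICATED_LIMIT_BYTES = 100 * 1024 * 1024


def choose_join(left: InputStats, right: InputStats, join_key: str,
                limit: int = REPLICATED_LIMIT_BYTES) -> str:
    """Pick a join strategy from basic statistics alone."""
    # Question 1: is one of the files small enough for a replicated join?
    if min(left.size_bytes, right.size_bytes) <= limit:
        return "replicated"
    # Question 2: are the inputs already sorted on the join key?
    if left.sorted_on == join_key and right.sorted_on == join_key:
        return "merge"
    # Otherwise fall back to the regular join.
    return "default"


if __name__ == "__main__":
    big = InputStats(size_bytes=50 * 1024**3, sorted_on="user_id")
    small = InputStats(size_bytes=10 * 1024**2)
    print(choose_join(big, small, "user_id"))
```

A rule like this needs only file sizes and sort order, which is much less than a full cost model would require.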

It would be great if you're interested in working in this area. The best way to start is to file a JIRA or start a wiki page with general approach and design information.


Alan.

On Oct 13, 2010, at 7:46 PM, Renato Marroquín Mogrovejo wrote:

Hey everyone!
The Pig Journal page (http://wiki.apache.org/pig/PigJournal) says
something about getting statistics for Pig's optimizer. Is there any work
being done on that?
Or are there any other plans to improve the optimizer? I mean, right now it is a
rule-based one; are there plans to change it to a cost-based one?
Any opinions or comments are highly appreciated. Thanks!


Renato M.
