At a high level, we are implementing a framework for propagating
statistics between Pig operators, and using those statistics to make
moderately intelligent decisions about which join types to use
(unless the user specifies them). We do this in a fairly
brute-force manner: we generate all alternative plans (that part is
not working well right now, see subject), cost them, and choose
the global minimum. There is some pruning, but not as much
as in something like System R. As for relation order inside a given
join, we set that deterministically after choosing the join type, since
Pig has specific preferences for where the largest relation should go
for a given join type. Once we have join type selection working, other
optimizations can be added -- the tricky part is making sure the
costing functions can't produce drastically wrong results.
All the work is happening at the logical layer, between the rule-based
optimizer and LogToPhysTranslator.
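
To make the approach above a bit more concrete, here is a rough Python
sketch of the enumerate-cost-pick-minimum loop. It is not our actual
code (that lives in Pig's Java logical layer). Every name in it
(Stats, join_cost, propagate, order_inputs, best_plan, MEMORY_LIMIT)
is made up for illustration, the cost formulas are toy numbers, and
the ordering rule (largest relation first for a replicated join, last
otherwise) is an assumption about Pig's preferences rather than
something the optimizer is guaranteed to do.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical budget for the in-memory side of a replicated join.
MEMORY_LIMIT = 100
JOIN_TYPES = ("hash", "replicated")


@dataclass(frozen=True)
class Stats:
    rows: int
    size: int  # estimated size in arbitrary units


def join_cost(join_type, left, right):
    """Toy cost model; the constants are illustrative only."""
    total = left.size + right.size
    if join_type == "replicated":
        # Only feasible if the smaller input fits in memory.
        fits = min(left.size, right.size) <= MEMORY_LIMIT
        return total if fits else float("inf")
    # "hash": charge extra for the shuffle.
    return 3 * total


def propagate(left, right):
    """Toy statistics propagation: assume a key/foreign-key join."""
    return Stats(max(left.rows, right.rows), max(left.size, right.size))


def order_inputs(join_type, left, right):
    """Fix relation order after the join type is chosen (assumed rule)."""
    big, small = (left, right) if left.size >= right.size else (right, left)
    return (big, small) if join_type == "replicated" else (small, big)


def best_plan(relations):
    """Enumerate every join-type assignment for a chain of joins and
    return (cost, join_types) for the cheapest one. No pruning."""
    best = (float("inf"), None)
    for choice in product(JOIN_TYPES, repeat=len(relations) - 1):
        acc, cost = relations[0], 0
        for join_type, rel in zip(choice, relations[1:]):
            cost += join_cost(join_type, acc, rel)
            acc = propagate(acc, rel)
        if cost < best[0]:
            best = (cost, choice)
    return best
```

Calling best_plan on a list of Stats objects returns the cheapest total
cost together with one join type per join in the chain. The search is
exponential in the number of joins, which is why pruning matters, and
any error in join_cost or propagate feeds straight into the choice.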
2009/11/5 RichardGUO Fei <gladiato...@hotmail.com>:
> I am also doing a cost-based optimizer. So I am interested in knowing some of
> the specs that you are after.