Our initial survey of the related literature showed that a CBO usually
sits between the logical and physical layers. (In fact, the famous
Cascades paper advocates removing the distinction between logical and
physical operators altogether and using "is_logical" and "is_physical"
flags instead, meaning an operator can be one, both, or neither.)

The reasoning is that you cannot properly determine the cost of a plan
if you don't know the physical "properties" of the operators that
implement it. An optimizer that works at the logical layer would by
definition create the same plan whether in local or mapreduce mode
(since such differences are abstracted away from it). This is clearly
incorrect, as the environments in which these plans are executed have
drastically different properties. Working at the physical layer
lets us stay close to the iron and adjust based on the specifics of
the execution environment.
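
To make the point concrete, here is a minimal sketch in Python. It is
not Pig code: the EnvProfile class, the two join cost formulas, and
every constant in it are made up for illustration. It only shows that
the same logical join can get a different cheapest physical strategy
once environment-specific costs are plugged in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvProfile:
    """Hypothetical per-environment cost constants (arbitrary units)."""
    startup_cost: float           # fixed overhead of launching the job
    scan_cost_per_mb: float       # cost of reading one MB of input
    shuffle_cost_per_mb: float    # cost of repartitioning one MB
    broadcast_cost_per_mb: float  # cost of shipping one MB to one task
    tasks: int                    # number of parallel tasks


# Illustrative numbers only; they are not measurements of anything.
LOCAL = EnvProfile(0.0, 1.0, 0.2, 0.05, tasks=1)
MAPREDUCE = EnvProfile(500.0, 0.1, 0.5, 0.3, tasks=50)


def shuffle_join_cost(env, left_mb, right_mb):
    """Scan both inputs and repartition both of them on the join key."""
    total = left_mb + right_mb
    return (env.startup_cost
            + total * env.scan_cost_per_mb
            + total * env.shuffle_cost_per_mb)


def replicated_join_cost(env, left_mb, right_mb):
    """Scan both inputs and ship the smaller one to every task."""
    total = left_mb + right_mb
    small = min(left_mb, right_mb)
    return (env.startup_cost
            + total * env.scan_cost_per_mb
            + small * env.tasks * env.broadcast_cost_per_mb)


def cheapest_join(env, left_mb, right_mb):
    """Return the name of the cheaper physical join for this environment."""
    costs = {
        "shuffle": shuffle_join_cost(env, left_mb, right_mb),
        "replicated": replicated_join_cost(env, left_mb, right_mb),
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for name, env in (("local", LOCAL), ("mapreduce", MAPREDUCE)):
        print(name, cheapest_join(env, 1000, 100))
```

With these invented constants, the one logical join of a 1000 MB and a
100 MB input comes out as a replicated join in the local profile and a
shuffle join in the mapreduce profile, which is exactly the kind of
difference a purely logical optimizer cannot see.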

Certainly one can posit a framework for a CBO that would set up the
necessary interfaces and plumbing for optimizing in any execution
mode, and invoke the proper implementations at run time; we are not
discounting that possibility (we haven't gotten quite that far in the
design, to be honest). But we feel that the implementations have to
be execution-mode-specific.
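
A rough sketch of what such a framework could look like, again in
Python and again purely hypothetical (the CostEstimator interface, the
two implementations, the estimator_for lookup, and the constants are
assumptions for illustration, not part of Pig or of our design doc):
the optimizer codes against one interface, and the mode-specific
implementation is picked at run time.

```python
from abc import ABC, abstractmethod


class CostEstimator(ABC):
    """Mode-independent interface the optimizer would program against."""

    @abstractmethod
    def estimate(self, input_mb: float) -> float:
        """Return an estimated cost (arbitrary units) for input_mb of data."""


class LocalCostEstimator(CostEstimator):
    """Hypothetical local-mode costing: no job startup, slower per MB."""

    def estimate(self, input_mb: float) -> float:
        return input_mb * 1.0


class MapReduceCostEstimator(CostEstimator):
    """Hypothetical mapreduce costing: fixed job startup, cheaper per MB."""

    JOB_STARTUP = 500.0  # made-up constant

    def estimate(self, input_mb: float) -> float:
        return self.JOB_STARTUP + input_mb * 0.1


# Registry mapping an execution mode name to its implementation.
_ESTIMATORS = {
    "local": LocalCostEstimator,
    "mapreduce": MapReduceCostEstimator,
}


def estimator_for(exec_mode: str) -> CostEstimator:
    """Pick the mode-specific implementation at run time."""
    try:
        return _ESTIMATORS[exec_mode]()
    except KeyError:
        raise ValueError(f"no cost estimator for mode {exec_mode!r}") from None


if __name__ == "__main__":
    for mode in ("local", "mapreduce"):
        print(mode, estimator_for(mode).estimate(100.0))
```

The plumbing (the interface and the lookup) is shared, but everything
that actually produces a number lives in a mode-specific class, which
is the split we have in mind.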


On Tue, Sep 1, 2009 at 6:26 PM, Jianyong Dai<jiany...@yahoo-inc.com> wrote:
> I am still reading but one interesting question is why you decide to put CBO
> in physical layer?
> Dmitriy Ryaboy wrote:
>> Whoops :-)
>> Here's the Google doc:
>> http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en
>> -Dmitriy
>> On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan<s...@yahoo-inc.com>
>> wrote:
>>> Dmitriy and Gang,
>>> The mailing list does not allow attachments. Can you post it on a
>>> website and just send the URL ?
>>> Thanks,
>>> Santhosh
>>> -----Original Message-----
>>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>>> Sent: Tuesday, September 01, 2009 9:48 AM
>>> To: pig-dev@hadoop.apache.org
>>> Subject: Request for feedback: cost-based optimizer
>>> Hi everyone,
>>> Attached is a (very) preliminary document outlining a rough design we
>>> are proposing for a cost-based optimizer for Pig.
>>> This is being done as a capstone project by three CMU Master's students
>>> (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
>>> necessarily meant for immediate incorporation into the Pig codebase,
>>> although it would be nice if it, or parts of it, are found to be useful
>>> in the mainline.
>>> We would love to get some feedback from the developer community
>>> regarding the ideas expressed in the document, any concerns about the
>>> design, suggestions for improvement, etc.
>>> Thanks,
>>> Dmitriy, Ashutosh, Tejal
