Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-04-06 Thread Josh Berkus
On 3/30/12 7:29 AM, Tom Lane wrote:
 Arun Chaitanya chaitan64a...@gmail.com writes:
 The link to the paper is
 http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf
 
 Given the authorship of that paper, I'd have to wonder whether Microsoft
 has filed for any patents regarding these ideas.

Feh, too bad Jim Grey isn't around anymore; I could just ask.

Is there some smaller query optimization project which Arun could
reasonably tackle?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-03-30 Thread Arun Chaitanya
Hi,

I wanted to take up this as a GSOC 2012 project.

SQL supports nested queries. When the inner query contains a
correlation variable the present optimizer takes an iterative
execution plan. If the inner query scans over a relation, the
iterative plan chosen can be sub-optimal.

The goal of this project is to enable De-correlation for all possible
cases. The iterative execution plan can be converted to a set oriented
plan by taking a join over the base relations involved. Sufficient
work has already been done in this area and has been implemented in
SQL Server.

The changes required to incorporate the above mentioned strategy is in
rewriting phase to the best of my knowledge. The key idea is to
introduce the APPLY operator in the raw parse tree. In the above
mentioned Papers, the author has mentioned the process of removing the
apply. The author has proposed a set of rules which will allow us to
achieve the goal. The present postgresql optimizer community has done
some work in these lines for simple subqueries involving =,  ,  in
the predicates [ I observed it by seeing the result of EXPLAIN for
relevant queries ]. The optimization is not done for subqueries
containing aggregate queries and existential and containment queries.

An example query from TPCH benchmark discussed by the author:

select c_custkey
from customer
where 100  (select sum(o_totalprice)
from orders
where o_custkey = c_custkey)

In the above case, c_custkey is a correlation variable (parameterized)
coming from the outer query. Hence in the present system, the inner
query is executed as many times as the tuples in the customer
relation. As the subQuery involves a scan over orders relation, the
total I/O cost involved is pretty high.

Using the transformation proposed by the author, the query can be re-written as

select c_custkey
from customer left outer join
orders on o_custkey = c custkey
group by c_custkey
having 100  sum(o_totalprice)

This allows the optimizer to chose a better plan for left outer join
and avoid iterative execution. The idea here is not to get a rewritten
query as output but to generate a plan tree that does the same as the
above query.

Regards,
Arun

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-03-30 Thread Robert Haas
On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya chaitan64a...@gmail.com wrote:
 I wanted to take up this as a GSOC 2012 project.

This would be a great query planner optimization but the chances of
getting it done in one summer as a GSoC project seem to me to be nil.
You've never had a patch accepted before; picking a massive
reorganization of the query planner as your first project is not going
to work out well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-03-30 Thread Arun Chaitanya
Thanks a lot Heikki.
I have already posted an example in the mail.
The link to the paper is
http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf

Hope this helps,
Arun

On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 (off-list)

 You'll want to post a link to the paper, otherwise people who are not GSoC
 mentors will have no idea what you're talking about ;-). Posting an example
 query and access plans would illustrate the point, too.


 On 30.03.2012 14:33, Arun Chaitanya wrote:

 Hi,

 I wanted to take up this as a GSOC 2012 project.

 SQL supports nested queries. When the inner query contains a
 correlation variable the present optimizer takes an iterative
 execution plan. If the inner query scans over a relation, the
 iterative plan chosen can be sub-optimal.

 The goal of this project is to enable De-correlation for all possible
 cases. The iterative execution plan can be converted to a set oriented
 plan by taking a join over the base relations involved. Sufficient
 work has already been done in this area and has been implemented in
 SQL Server.

 The changes required to incorporate the above mentioned strategy is in
 rewriting phase to the best of my knowledge. The key idea is to
 introduce the APPLY operator in the raw parse tree. In the above
 mentioned Papers, the author has mentioned the process of removing the
 apply. The author has proposed a set of rules which will allow us to
 achieve the goal. The present postgresql optimizer community has done
 some work in these lines for simple subqueries involving =,  ,  in
 the predicates [ I observed it by seeing the result of EXPLAIN for
 relevant queries ]. The optimization is not done for subqueries
 containing aggregate queries and existential and containment queries.

 An example query from TPCH benchmark discussed by the author:

 select c_custkey
 from customer
 where 100  (select sum(o_totalprice)
 from orders
 where o_custkey = c_custkey)

 In the above case, c_custkey is a correlation variable (parameterized)
 coming from the outer query. Hence in the present system, the inner
 query is executed as many times as the tuples in the customer
 relation. As the subQuery involves a scan over orders relation, the
 total I/O cost involved is pretty high.

 Using the transformation proposed by the author, the query can be
 re-written as

 select c_custkey
 from customer left outer join
 orders on o_custkey = c custkey
 group by c_custkey
 having 100  sum(o_totalprice)

 This allows the optimizer to chose a better plan for left outer join
 and avoid iterative execution. The idea here is not to get a rewritten
 query as output but to generate a plan tree that does the same as the
 above query.

 Regards,
 Arun



 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-03-30 Thread Arun Chaitanya
Thanks Robert,

Yes. I think I am being over ambitious as I never had any Open Source
development experience.
Anyways, please go through the idea. I have posted the link to the
paper in on of the replies.

Please, suggest me other options which I can take up as a GSOC 2012 project.

On Fri, Mar 30, 2012 at 5:34 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya chaitan64a...@gmail.com 
 wrote:
 I wanted to take up this as a GSOC 2012 project.

 This would be a great query planner optimization but the chances of
 getting it done in one summer as a GSoC project seem to me to be nil.
 You've never had a patch accepted before; picking a massive
 reorganization of the query planner as your first project is not going
 to work out well.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

2012-03-30 Thread Tom Lane
Arun Chaitanya chaitan64a...@gmail.com writes:
 The link to the paper is
 http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf

Given the authorship of that paper, I'd have to wonder whether Microsoft
has filed for any patents regarding these ideas.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers