Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
On 3/30/12 7:29 AM, Tom Lane wrote: Arun Chaitanya chaitan64a...@gmail.com writes: The link to the paper is http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf Given the authorship of that paper, I'd have to wonder whether Microsoft has filed for any patents regarding these ideas. Feh, too bad Jim Grey isn't around anymore; I could just ask. Is there some smaller query optimization project which Arun could reasonably tackle? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Hi, I wanted to take up this as a GSOC 2012 project. SQL supports nested queries. When the inner query contains a correlation variable the present optimizer takes an iterative execution plan. If the inner query scans over a relation, the iterative plan chosen can be sub-optimal. The goal of this project is to enable De-correlation for all possible cases. The iterative execution plan can be converted to a set oriented plan by taking a join over the base relations involved. Sufficient work has already been done in this area and has been implemented in SQL Server. The changes required to incorporate the above mentioned strategy is in rewriting phase to the best of my knowledge. The key idea is to introduce the APPLY operator in the raw parse tree. In the above mentioned Papers, the author has mentioned the process of removing the apply. The author has proposed a set of rules which will allow us to achieve the goal. The present postgresql optimizer community has done some work in these lines for simple subqueries involving =, , in the predicates [ I observed it by seeing the result of EXPLAIN for relevant queries ]. The optimization is not done for subqueries containing aggregate queries and existential and containment queries. An example query from TPCH benchmark discussed by the author: select c_custkey from customer where 100 (select sum(o_totalprice) from orders where o_custkey = c_custkey) In the above case, c_custkey is a correlation variable (parameterized) coming from the outer query. Hence in the present system, the inner query is executed as many times as the tuples in the customer relation. As the subQuery involves a scan over orders relation, the total I/O cost involved is pretty high. Using the transformation proposed by the author, the query can be re-written as select c_custkey from customer left outer join orders on o_custkey = c custkey group by c_custkey having 100 sum(o_totalprice) This allows the optimizer to chose a better plan for left outer join and avoid iterative execution. The idea here is not to get a rewritten query as output but to generate a plan tree that does the same as the above query. Regards, Arun -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya chaitan64a...@gmail.com wrote: I wanted to take up this as a GSOC 2012 project. This would be a great query planner optimization but the chances of getting it done in one summer as a GSoC project seem to me to be nil. You've never had a patch accepted before; picking a massive reorganization of the query planner as your first project is not going to work out well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Thanks a lot Heikki. I have already posted an example in the mail. The link to the paper is http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf Hope this helps, Arun On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: (off-list) You'll want to post a link to the paper, otherwise people who are not GSoC mentors will have no idea what you're talking about ;-). Posting an example query and access plans would illustrate the point, too. On 30.03.2012 14:33, Arun Chaitanya wrote: Hi, I wanted to take up this as a GSOC 2012 project. SQL supports nested queries. When the inner query contains a correlation variable the present optimizer takes an iterative execution plan. If the inner query scans over a relation, the iterative plan chosen can be sub-optimal. The goal of this project is to enable De-correlation for all possible cases. The iterative execution plan can be converted to a set oriented plan by taking a join over the base relations involved. Sufficient work has already been done in this area and has been implemented in SQL Server. The changes required to incorporate the above mentioned strategy is in rewriting phase to the best of my knowledge. The key idea is to introduce the APPLY operator in the raw parse tree. In the above mentioned Papers, the author has mentioned the process of removing the apply. The author has proposed a set of rules which will allow us to achieve the goal. The present postgresql optimizer community has done some work in these lines for simple subqueries involving =, , in the predicates [ I observed it by seeing the result of EXPLAIN for relevant queries ]. The optimization is not done for subqueries containing aggregate queries and existential and containment queries. An example query from TPCH benchmark discussed by the author: select c_custkey from customer where 100 (select sum(o_totalprice) from orders where o_custkey = c_custkey) In the above case, c_custkey is a correlation variable (parameterized) coming from the outer query. Hence in the present system, the inner query is executed as many times as the tuples in the customer relation. As the subQuery involves a scan over orders relation, the total I/O cost involved is pretty high. Using the transformation proposed by the author, the query can be re-written as select c_custkey from customer left outer join orders on o_custkey = c custkey group by c_custkey having 100 sum(o_totalprice) This allows the optimizer to chose a better plan for left outer join and avoid iterative execution. The idea here is not to get a rewritten query as output but to generate a plan tree that does the same as the above query. Regards, Arun -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Thanks Robert, Yes. I think I am being over ambitious as I never had any Open Source development experience. Anyways, please go through the idea. I have posted the link to the paper in on of the replies. Please, suggest me other options which I can take up as a GSOC 2012 project. On Fri, Mar 30, 2012 at 5:34 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya chaitan64a...@gmail.com wrote: I wanted to take up this as a GSOC 2012 project. This would be a great query planner optimization but the chances of getting it done in one summer as a GSoC project seem to me to be nil. You've never had a patch accepted before; picking a massive reorganization of the query planner as your first project is not going to work out well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Arun Chaitanya chaitan64a...@gmail.com writes: The link to the paper is http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf Given the authorship of that paper, I'd have to wonder whether Microsoft has filed for any patents regarding these ideas. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers