Re: [HACKERS] nested loop semijoin estimates

2015-06-06 Thread Tomas Vondra
FWIW, I've repeated the TPC-DS tests on a much larger data set (50GB) today, and I see that (a) 3f59be836c555fa679bbe0ec76de50a8b5cb23e0 (ANTI/SEMI join costing) changes nothing - there are some small cost changes, but only in plans involving semi/anti-joins (which is expected).

Re: [HACKERS] nested loop semijoin estimates

2015-06-05 Thread Robert Haas
On Tue, Jun 2, 2015 at 10:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: What it seems like we should do, if we want to back-patch this, is apply it without the add_path_precheck changes. Then as an independent HEAD-only patch, change add_path_precheck so that it's behaving as designed. It looks

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: On 06/02/15 23:27, Tom Lane wrote: Do we have instructions around here anyplace on how to set up/use TPC-DS? I couldn't find anything about it on the wiki ... Not that I'm aware of, but it's not really all that difficult. [ instructions... ]

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The results are good, IMHO. With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan changes at all. I don't plan to run the tests on larger data sets, I do expect

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tomas Vondra
On 06/02/15 01:47, Josh Berkus wrote: On 06/01/2015 03:22 PM, Tomas Vondra wrote: On 06/01/15 23:47, Josh Berkus wrote: On 06/01/2015 02:18 PM, Tom Lane wrote: Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tomas Vondra
On 06/02/15 16:37, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The results are good, IMHO. I'm a bit disturbed by that, because AFAICS from the plans, these queries did not involve any semi or anti

Re: [HACKERS] nested loop semijoin estimates

2015-06-02 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: On 06/02/15 16:37, Tom Lane wrote: It's possible that the change was due to random variation in ANALYZE statistics, in which case it was just luck. I don't think so. I simply loaded the data, ran ANALYZE, and then simply started either

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: On 06/01/15 00:08, Tom Lane wrote: Attached is an incremental patch (on top of the previous one) to allow startup cost of parameterized paths to be considered when the relation is the RHS of a semi or anti join. It seems reasonably clean

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Josh Berkus
On 06/01/2015 02:18 PM, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: On 06/01/15 00:08, Tom Lane wrote: Attached is an incremental patch (on top of the previous one) to allow startup cost of parameterized paths to be considered when the relation is the RHS of a semi or

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Tomas Vondra
On 06/01/15 23:47, Josh Berkus wrote: On 06/01/2015 02:18 PM, Tom Lane wrote: Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this week if there are not objections. I would like

Re: [HACKERS] nested loop semijoin estimates

2015-06-01 Thread Josh Berkus
On 06/01/2015 03:22 PM, Tomas Vondra wrote: On 06/01/15 23:47, Josh Berkus wrote: On 06/01/2015 02:18 PM, Tom Lane wrote: Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this

Re: [HACKERS] nested loop semijoin estimates

2015-05-31 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: On 05/30/15 23:16, Tom Lane wrote: Attached is a draft patch for that. It fixes the problem for me: Seems to be working OK, but I still do get a Bitmap Heap Scan there (but more about that later). Attached is an incremental patch (on top of

Re: [HACKERS] nested loop semijoin estimates

2015-05-31 Thread Tomas Vondra
On 06/01/15 00:08, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: On 05/30/15 23:16, Tom Lane wrote: Attached is a draft patch for that. It fixes the problem for me: Seems to be working OK, but I still do get a Bitmap Heap Scan there (but more about that later).

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tom Lane
Tomas Vondra tomas.von...@2ndquadrant.com writes: I wonder whether the run_cost += inner_run_cost; is actually correct, because this pretty much means we assume scanning the whole inner relation (once). Wouldn't something like this be more appropriate? run_cost +=

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
On 05/30/15 03:52, Tomas Vondra wrote: On 05/30/15 01:20, Tomas Vondra wrote: Notice the cost - it's way lover than the previous plan (9.2 vs ~111k), yet this plan was not chosen. So either the change broke something (e.g. by violating some optimizer assumption), or maybe there's a bug

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tom Lane
I wrote: So what this seems to mean is that for SEMI/ANTI join cases, we have to postpone all of the inner scan cost determination to final_cost_nestloop, so that we can do this differently depending on whether has_indexed_join_quals() is true. That's a little bit annoying because it will

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
On 05/30/15 21:50, Tom Lane wrote: So what this seems to mean is that for SEMI/ANTI join cases, we have to postpone all of the inner scan cost determination to final_cost_nestloop, so that we can do this differently depending on whether has_indexed_join_quals() is true. That's a little bit

Re: [HACKERS] nested loop semijoin estimates

2015-05-30 Thread Tomas Vondra
Hi, On 05/30/15 23:16, Tom Lane wrote: I wrote: So what this seems to mean is that for SEMI/ANTI join cases, we have to postpone all of the inner scan cost determination to final_cost_nestloop, so that we can do this differently depending on whether has_indexed_join_quals() is true. That's a

[HACKERS] nested loop semijoin estimates

2015-05-29 Thread Tomas Vondra
Hi, while looking at this post from pgsql-performance about plan changes http://www.postgresql.org/message-id/flat/20150529095117.gb15...@hjp.at I noticed that initial_cost_nestloop() does this in (9.1, mentioned in the pgsql-performance post uses the same logic): if (jointype ==

Re: [HACKERS] nested loop semijoin estimates

2015-05-29 Thread Tomas Vondra
On 05/30/15 01:20, Tomas Vondra wrote: Notice the cost - it's way lover than the previous plan (9.2 vs ~111k), yet this plan was not chosen. So either the change broke something (e.g. by violating some optimizer assumption), or maybe there's a bug somewhere else ... After a bit more