Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-05 Thread Robert Haas
On Fri, Sep 2, 2011 at 12:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: column values).  But GROUP BY or DISTINCT would entirely invalidate the column frequency statistics, which makes me think that ignoring the pg_statistic entry might be the thing to do.  Comments? There's a possible problem

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-04 Thread Tom Lane
I wrote: On a longer-term basis, I'm looking into what we could do with extracting stats from subqueries, but that doesn't seem like material for a backpatch. I have a draft patch that I've been playing with (attached). I've committed a heavily rewritten version of that patch. Git HEAD

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: The larger problem is that if a subquery didn't get flattened, it's often because it's got LIMIT, or GROUP BY, or some similar clause that makes it highly suspect whether the statistics available for

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Peter Eisentraut
On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: EXPLAIN SELECT * FROM test1 WHERE sha1 in (SELECT sha1 FROM test2 LIMIT 200); Here, however, it has apparently not passed this knowledge through the LIMIT. The LIMIT prevents the subquery from being flattened entirely, ie we don't

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Robert Haas
On Wed, Aug 31, 2011 at 6:22 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: EXPLAIN SELECT * FROM test1  WHERE sha1 in (SELECT sha1 FROM test2 LIMIT 200); Here, however, it has apparently not passed this knowledge through the LIMIT. The

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Wed, Aug 31, 2011 at 6:22 AM, Peter Eisentraut pete...@gmx.net wrote: I liked the old one better. ;-) AFAICS, those plans are identical, except for a minor difference in the cost of scanning test2. The point is that the estimate of the result size

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-29 Thread Robert Haas
On Sat, Aug 27, 2011 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: EXPLAIN SELECT * FROM test1  WHERE sha1 in (SELECT sha1 FROM test2);                               QUERY PLAN --  

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-27 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: EXPLAIN SELECT * FROM test1 WHERE sha1 in (SELECT sha1 FROM test2); QUERY PLAN -- Hash Semi Join (cost=30.52..61.27 rows=1000 width=27) Hash Cond: