Re: [HACKERS] COUNT(*) and index-only scans

2011-12-14 Thread Greg Stark
On Mon, Nov 21, 2011 at 6:43 PM, Robert Haas robertmh...@gmail.com wrote: In buffer fill mode, we scan the index and add matching tuples and their CTIDs to the buffer.  When the buffer is full or the index AM reports that there are no more tuples in the scan, we switch to buffer drain mode.

Re: [HACKERS] COUNT(*) and index-only scans

2011-12-14 Thread Robert Haas
On Wed, Dec 14, 2011 at 6:58 AM, Greg Stark st...@mit.edu wrote: On Mon, Nov 21, 2011 at 6:43 PM, Robert Haas robertmh...@gmail.com wrote: In buffer fill mode, we scan the index and add matching tuples and their CTIDs to the buffer.  When the buffer is full or the index AM reports that there

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-21 Thread Robert Haas
On Sat, Nov 19, 2011 at 11:22 AM, Thom Brown t...@linux.com wrote: While I accept that maybe adapting the existing bitmap index scan functionality isn't necessarily desirable, would it be feasible to create a corresponding bitmap index-only scan method. I've been thinking about this a bit

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-19 Thread Tom Lane
Thom Brown t...@linux.com writes: So is there a chance of getting bitmap index-only scans? Don't hold your breath. It seems like a huge increment of complexity for probably very marginal gains. The point of a bitmap scan (as opposed to regular indexscan) is to reduce heap accesses by combining

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-19 Thread Thom Brown
On 19 November 2011 16:08, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: So is there a chance of getting bitmap index-only scans? Don't hold your breath.  It seems like a huge increment of complexity for probably very marginal gains.  The point of a bitmap scan (as

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-18 Thread Thom Brown
On 12 October 2011 17:26, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane t...@sss.pgh.pa.us wrote: The place where the decision is actually somewhat hard, IMO, is where you're pulling a small part of the table but significantly more than one row, and the

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-15 Thread Jeff Janes
On Mon, Oct 10, 2011 at 9:48 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jeff Janes  wrote: Kevin Grittner  wrote: create table t (id int not null primary key); insert into t select generate_series(1, 100); vacuum freeze analyze; explain analyze select count(*) from t where

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Jeff Davis
On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: The real issue is that the costing estimates need to be accurate, and that's where the rubber hits the road. Otherwise, even if we pick the right way to scan the table, we may do silly things up the line when we go to start constructing

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: The real issue is that the costing estimates need to be accurate, and that's where the rubber hits the road.  Otherwise, even if we pick the right way to scan the table,

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: The real issue is that the costing estimates need to be accurate, and that's where the rubber hits the road. Can you send stats

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's overkill, and possibly unpleasantly unstable as well. For the initial attack on this we should just have VACUUM and ANALYZE count the number of all-visible blocks and store that in pg_class along with the

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: What bothers me considerably more is the issue about how specific queries might see an all-visible fraction that's very substantially different from the table's overall ratio, especially in examples such as historical-data

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: Assuming you're in a steady-state situation the amount of all-visible blocks will fluctuate from a high just after vacuum to a low just before the next vacuum. There are other ways a block can be marked all-visible but for the most part I would expect the

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: What bothers me considerably more is the issue about how specific queries might see an all-visible fraction that's very substantially different from the table's overall ratio, -

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Aidan Van Dyk
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: - Suppose the table has a million rows and we're going to read 100 of them, or 0.01%.  Now it might appear that a covering index has a negligible advantage over a non-covering index, but in fact I think we still want to err

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: What I suggest as a first cut for that is: simply derate the visibility fraction as the fraction of the table expected to be scanned gets smaller. I think there's a statistically more rigorous way of accomplishing the same

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Oct 12, 2011 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: The problem is precisely that the pages a query is going to read are likely to *not* be a random sample, but to be correlated with recently-dirtied pages. Sure, but I was suggesting aiming

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Kevin Grittner
Aidan Van Dyk ai...@highrise.ca wrote: The elephant in the room is that the index-only-scan really doesn't save a *whole* lot if the heap pages are already in shared buffers. It's not hard to create a simple test case where it's about three times slower to go to cached heap pages than to

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 4:26 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: But it matters a *lot* when they heap pages are not in shared buffers Yeah, obviously it matters more if you actually need to add a random disk read. To be fair the indexes are also random I/O. So the case

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Aidan Van Dyk ai...@highrise.ca writes: The elephant in the room is that the index-only-scan really doesn't save a *whole* lot if the heap pages are already in shared buffers. But it matters a *lot* when they heap pages are not in shared buffers (both ways, saving IO, or causing lots of random

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: What bothers me considerably more is the issue about how specific queries might see an all-visible fraction that's

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I'm not concerned about an index scan vs. a sequential scan here. I'm concerned about it being impossible for the DBA to get an index-only scan when s/he wants it very badly. The current (stupid) formula handles this case just about perfectly - it

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm not concerned about an index scan vs. a sequential scan here.  I'm concerned about it being impossible for the DBA to get an index-only scan when s/he wants it very badly.  The

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Garick Hamlin
On Wed, Oct 12, 2011 at 03:16:54PM +0100, Greg Stark wrote: On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's overkill, and possibly unpleasantly unstable as well. For the initial attack on this we should just have VACUUM and ANALYZE count the number of

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread jesper
Jeff Janes wrote: Kevin Grittner wrote: create table t (id int not null primary key); insert into t select generate_series(1, 100); vacuum freeze analyze; explain analyze select count(*) from t where id between 50 and 500010; That gives you an index-only scan; but without the

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 3:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Right now, our costing model for index-only scans is pretty dumb. It assumes that using an index-only scan will avoid 10% of the heap fetches.  That could easily be low,

Re: table/index options | was: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 3:16 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2011/10/10 Robert Haas robertmh...@gmail.com: On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: That gives you an index-only scan; but without the WHERE clause it uses

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Bruce Momjian
Greg Stark wrote: On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: My intention was to allow it to consider any covering index. ?You're thinking about the cost estimate, which is really entirely different. Is there any reason to consider more than one? I would have

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:43 PM, Bruce Momjian br...@momjian.us wrote: Greg Stark wrote: On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: My intention was to allow it to consider any covering index. ?You're thinking about the cost estimate, which is really entirely

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 5:45 AM, Greg Stark st...@mit.edu wrote: On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: My intention was to allow it to consider any covering index.  You're thinking about the cost estimate, which is really entirely different. Is there any reason

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Josh Berkus
The trouble is that if we VACUUM and then ANALYZE, we'll often get back a value very close to 100%, but then the real value may diminish quite a bit before the next auto-analyze fires. I think if we can figure out what to do about that problem we'll be well on our way... It's not so much an

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 3:18 PM, Josh Berkus j...@agliodbs.com wrote: The trouble is that if we VACUUM and then ANALYZE, we'll often get back a value very close to 100%, but then the real value may diminish quite a bit before the next auto-analyze fires.  I think if we can figure out what to

[HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Bruce Momjian
I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone working on this? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Thom Brown
On 10 October 2011 18:23, Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone working on this? Yes it does, provided that there is an appropriate WHERE clause. But yes, I

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Greg Stark
On Mon, Oct 10, 2011 at 6:23 PM, Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone working on this? People usually conflate multiple problems when they talk about

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone working on this? Well, it's not that it doesn't optimize COUNT(*) -- it's that it doesn't yet cost the index scan as

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Robert Haas
On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone working on this? Well, it's not

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone working on this? Well, it's not that it doesn't optimize COUNT(*)

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: Right now, our costing model for index-only scans is pretty dumb. It assumes that using an index-only scan will avoid 10% of the heap fetches. That could easily be low, and on an insert-only table or one where only the recently-updated rows are

table/index options | was: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Cédric Villemain
2011/10/10 Robert Haas robertmh...@gmail.com: On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: That gives you an index-only scan; but without the WHERE clause it uses a seq scan.  I think it's mainly a matter of doing enough benchmarks to figure out how best

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: I think what Robert is complaining about is that we won't currently consider an index that matches neither any WHERE clauses nor ORDER BY, ie, count(*) over the whole table won't get considered for an index-only scan, regardless of cost estimates. I guess

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I think what Robert is complaining about is that we won't currently consider an index that matches neither any WHERE clauses nor ORDER BY, ie, count(*) over the whole table won't get considered for an

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Jeff Janes
On Mon, Oct 10, 2011 at 10:36 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Bruce Momjian br...@momjian.us wrote: I talked to Robert Haas and he said that index-only scans do not optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone working on this? Well, it's not

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Greg Stark
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: My intention was to allow it to consider any covering index.  You're thinking about the cost estimate, which is really entirely different. Is there any reason to consider more than one? I would have expected the narrowest one

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Jeff Janes wrote: Kevin Grittner wrote: create table t (id int not null primary key); insert into t select generate_series(1, 100); vacuum freeze analyze; explain analyze select count(*) from t where id between 50 and 500010; That gives you an index-only scan; but without the