Re: [HACKERS] COUNT(*) and index-only scans

2011-12-14 Thread Robert Haas
On Wed, Dec 14, 2011 at 6:58 AM, Greg Stark wrote: > On Mon, Nov 21, 2011 at 6:43 PM, Robert Haas wrote: >> In buffer fill mode, we scan the index and add matching tuples and >> their CTIDs to the buffer.  When the buffer is full or the index AM >> reports that there are no more tuples in the sca

Re: [HACKERS] COUNT(*) and index-only scans

2011-12-14 Thread Greg Stark
On Mon, Nov 21, 2011 at 6:43 PM, Robert Haas wrote: > In buffer fill mode, we scan the index and add matching tuples and > their CTIDs to the buffer.  When the buffer is full or the index AM > reports that there are no more tuples in the scan, we switch to buffer > drain mode. Instead you could d

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-21 Thread Robert Haas
On Sat, Nov 19, 2011 at 11:22 AM, Thom Brown wrote: > While I accept that maybe adapting the existing bitmap index scan > functionality isn't necessarily desirable, would it be feasible to > create a corresponding bitmap index-only scan method. I've been thinking about this a bit more; I wonder w

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-19 Thread Thom Brown
On 19 November 2011 16:08, Tom Lane wrote: > Thom Brown writes: >> So is there a chance of getting bitmap index-only scans? > > Don't hold your breath.  It seems like a huge increment of complexity > for probably very marginal gains.  The point of a bitmap scan (as > opposed to regular indexscan)

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-19 Thread Tom Lane
Thom Brown writes: > So is there a chance of getting bitmap index-only scans? Don't hold your breath. It seems like a huge increment of complexity for probably very marginal gains. The point of a bitmap scan (as opposed to regular indexscan) is to reduce heap accesses by combining visits to the

Re: [HACKERS] COUNT(*) and index-only scans

2011-11-18 Thread Thom Brown
On 12 October 2011 17:26, Robert Haas wrote: > On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane wrote: >> The place where the decision is actually somewhat hard, IMO, is where >> you're pulling a small part of the table but significantly more than one >> row, and the traditional best choice would be a

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-15 Thread Jeff Janes
On Mon, Oct 10, 2011 at 9:48 PM, Kevin Grittner wrote: >> Jeff Janes  wrote: >> Kevin Grittner  wrote: > >>> create table t (id int not null primary key); >>> insert into t select generate_series(1, 100); >>> vacuum freeze analyze; >>> explain analyze select count(*) from t >>> where id betwee

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Garick Hamlin
On Wed, Oct 12, 2011 at 03:16:54PM +0100, Greg Stark wrote: > On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane wrote: > > I think it's overkill, and possibly unpleasantly unstable as well. > > For the initial attack on this we should just have VACUUM and ANALYZE > > count the number of all-visible blocks

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane wrote: > Robert Haas writes: >> I'm not concerned about an index scan vs. a sequential scan here.  I'm >> concerned about it being impossible for the DBA to get an index-only >> scan when s/he wants it very badly.  The current (stupid) formula >> handles

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas writes: > I'm not concerned about an index scan vs. a sequential scan here. I'm > concerned about it being impossible for the DBA to get an index-only > scan when s/he wants it very badly. The current (stupid) formula > handles this case just about perfectly - it will prefer a smalle

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane wrote: >>> What bothers me considerably more is the issue about how specific >>> queries might see an all-visible fraction that's very substantially >>> different from the table's

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Aidan Van Dyk writes: > The elephant in the room is that the index-only-scan really doesn't > save a *whole* lot if the heap pages are already in shared buffers. > But it matters a *lot* when they heap pages are not in shared buffers > (both ways, saving IO, or causing lots of random IO) > Can we

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 4:26 PM, Kevin Grittner wrote: >> But it matters a *lot* when they heap pages are not in shared >> buffers > > Yeah, obviously it matters more if you actually need to add a random > disk read. To be fair the indexes are also random I/O. So the case that really matters is w

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Kevin Grittner
Aidan Van Dyk wrote: > The elephant in the room is that the index-only-scan really > doesn't save a *whole* lot if the heap pages are already in shared > buffers. It's not hard to create a simple test case where it's about three times slower to go to cached heap pages than to use the values fr

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Greg Stark writes: > On Wed, Oct 12, 2011 at 3:29 PM, Tom Lane wrote: >> The problem is precisely that the pages a query is going to read are >> likely to *not* be a random sample, but to be correlated with >> recently-dirtied pages. > Sure, but I was suggesting aiming for the nth percentile rat

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 3:29 PM, Tom Lane wrote: >>> What I suggest as a first cut for that is: simply derate the visibility >>> fraction as the fraction >>> of the table expected to be scanned gets smaller. > >> I think there's a statistically more rigorous way of accomplishing the >> same thing

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Aidan Van Dyk
On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane wrote: >> - Suppose the table has a million rows and we're going to read 100 of >> them, or 0.01%.  Now it might appear that a covering index has a >> negligible advantage over a non-covering index, but in fact I think we >> still want to err on the side

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas writes: > On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane wrote: >> What bothers me considerably more is the issue about how specific >> queries might see an all-visible fraction that's very substantially >> different from the table's overall ratio, > - Suppose VACUUM processes the table a

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Greg Stark writes: > Assuming you're in a steady-state situation the amount of all-visible > blocks will fluctuate from a high just after vacuum to a low just > before the next vacuum. There are other ways a block can be marked > all-visible but for the most part I would expect the fraction to go

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane wrote: > What bothers me considerably more is the issue about how specific > queries might see an all-visible fraction that's very substantially > different from the table's overall ratio, especially in examples such as > historical-data tables where most

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Greg Stark
On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane wrote: > I think it's overkill, and possibly unpleasantly unstable as well. > For the initial attack on this we should just have VACUUM and ANALYZE > count the number of all-visible blocks and store that in pg_class along > with the tuple-count statistics.

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Tom Lane
Robert Haas writes: > On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis wrote: >> On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: >>> The real issue is that the costing estimates need to be accurate, and >>> that's where the rubber hits the road. >> Can you send stats messages to keep track when

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-12 Thread Robert Haas
On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis wrote: > On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: >> The real issue is that the costing estimates need to be accurate, and >> that's where the rubber hits the road.  Otherwise, even if we pick the >> right way to scan the table, we may do si

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote: > The real issue is that the costing estimates need to be accurate, and > that's where the rubber hits the road. Otherwise, even if we pick the > right way to scan the table, we may do silly things up the line when > we go to start constructing

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 3:18 PM, Josh Berkus wrote: > >> The trouble is that if we VACUUM and then ANALYZE, we'll often get >> back a value very close to 100%, but then the real value may diminish >> quite a bit before the next auto-analyze fires.  I think if we can >> figure out what to do about

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Josh Berkus
> The trouble is that if we VACUUM and then ANALYZE, we'll often get > back a value very close to 100%, but then the real value may diminish > quite a bit before the next auto-analyze fires. I think if we can > figure out what to do about that problem we'll be well on our way... It's not so much

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 5:45 AM, Greg Stark wrote: > On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane wrote: >> My intention was to allow it to consider any covering index.  You're >> thinking about the cost estimate, which is really entirely different. >> > > Is there any reason to consider more than o

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:43 PM, Bruce Momjian wrote: > Greg Stark wrote: >> On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane wrote: >> > My intention was to allow it to consider any covering index. ?You're >> > thinking about the cost estimate, which is really entirely different. >> > >> >> Is there a

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Bruce Momjian
Greg Stark wrote: > On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane wrote: > > My intention was to allow it to consider any covering index. ?You're > > thinking about the cost estimate, which is really entirely different. > > > > Is there any reason to consider more than one? I would have expected > th

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 3:15 PM, Kevin Grittner wrote: > Robert Haas wrote: > >> Right now, our costing model for index-only scans is pretty dumb. >> It assumes that using an index-only scan will avoid 10% of the >> heap fetches.  That could easily be low, and on an insert-only >> table or one wh

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread jesper
>> Jeff Janes wrote: >> Kevin Grittner wrote: > >>> create table t (id int not null primary key); >>> insert into t select generate_series(1, 100); >>> vacuum freeze analyze; >>> explain analyze select count(*) from t >>> where id between 50 and 500010; >>> >>> That gives you an index-onl

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
> Jeff Janes wrote: > Kevin Grittner wrote: >> create table t (id int not null primary key); >> insert into t select generate_series(1, 100); >> vacuum freeze analyze; >> explain analyze select count(*) from t >> where id between 50 and 500010; >> >> That gives you an index-only scan; b

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Greg Stark
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane wrote: > My intention was to allow it to consider any covering index.  You're > thinking about the cost estimate, which is really entirely different. > Is there any reason to consider more than one? I would have expected the narrowest one to be the best c

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Jeff Janes
On Mon, Oct 10, 2011 at 10:36 AM, Kevin Grittner wrote: > Bruce Momjian wrote: >> I talked to Robert Haas and he said that index-only scans do not >> optimize COUNT(*).  Is this something we can do for PG 9.2?  Is >> anyone working on this? > > Well, it's not that it doesn't optimize COUNT(*) --

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> I think what Robert is complaining about is that we won't >> currently consider an index that matches neither any WHERE clauses >> nor ORDER BY, ie, count(*) over the whole table won't get >> considered for an index-only scan, regardless of cost estim

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Tom Lane wrote: > I think what Robert is complaining about is that we won't > currently consider an index that matches neither any WHERE clauses > nor ORDER BY, ie, count(*) over the whole table won't get > considered for an index-only scan, regardless of cost estimates. I guess the trick woul

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Robert Haas wrote: > Right now, our costing model for index-only scans is pretty dumb. > It assumes that using an index-only scan will avoid 10% of the > heap fetches. That could easily be low, and on an insert-only > table or one where only the recently-updated rows are routinely > accessed,

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Tom Lane
"Kevin Grittner" writes: > Bruce Momjian wrote: >> I talked to Robert Haas and he said that index-only scans do not >> optimize COUNT(*). Is this something we can do for PG 9.2? Is >> anyone working on this? > Well, it's not that it doesn't optimize COUNT(*) -- it's that it > doesn't yet cost

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Robert Haas
On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner wrote: > Bruce Momjian wrote: >> I talked to Robert Haas and he said that index-only scans do not >> optimize COUNT(*).  Is this something we can do for PG 9.2?  Is >> anyone working on this? > > Well, it's not that it doesn't optimize COUNT(*) -- i

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Kevin Grittner
Bruce Momjian wrote: > I talked to Robert Haas and he said that index-only scans do not > optimize COUNT(*). Is this something we can do for PG 9.2? Is > anyone working on this? Well, it's not that it doesn't optimize COUNT(*) -- it's that it doesn't yet cost the index scan as cheaper than a t

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Greg Stark
On Mon, Oct 10, 2011 at 6:23 PM, Bruce Momjian wrote: > I talked to Robert Haas and he said that index-only scans do not > optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone > working on this? People usually conflate multiple problems when they talk about "count(*). The usual

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Thom Brown
On 10 October 2011 18:23, Bruce Momjian wrote: > I talked to Robert Haas and he said that index-only scans do not > optimize COUNT(*).  Is this something we can do for PG 9.2?  Is anyone > working on this? Yes it does, provided that there is an appropriate WHERE clause. But yes, I think we defin