Re: [HACKERS] Improving count(*)

2005-12-01 Thread Bruce Momjian
Add idea to TODO: * Allow data to be pulled directly from indexes Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap. One way to allow this is to set a bit on i

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-23 Thread Josh Berkus
Guys, I'm not going to get the university research up before American Thanksgiving. Sorry. Look for it next week. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend

Re: [HACKERS] Improving count(*)

2005-11-22 Thread mark
On Tue, Nov 22, 2005 at 06:11:01PM -0500, Bruce Momjian wrote: > [EMAIL PROTECTED] wrote: > > A solution enhancing the above mentioned indexes, to maintain a count > > for whole index blocks, would allow whole index blocks that satisfy > > the WHERE clause to be counted, assuming the whole index bl

Re: [HACKERS] Improving count(*)

2005-11-22 Thread Jim C. Nasby
On Tue, Nov 22, 2005 at 06:11:01PM -0500, Bruce Momjian wrote: > [EMAIL PROTECTED] wrote: > Jan has been talking about have a bitmap to track pages that need > vacuuming, and I am wondering if the same system could be used to track > the heap-dirty bits. Putting one bit on every 8k disk page means

Re: [HACKERS] Improving count(*)

2005-11-22 Thread Bruce Momjian
Jim C. Nasby wrote: > On Fri, Nov 18, 2005 at 02:56:52PM -0500, Gregory Maxwell wrote: > > However, some great ideas have been proposed here which would not only > > help in that case but would otherwise be quite useful. > > > > *Inclusion of a 'MVCC inflight' bit in indexes which would allow > >

Re: [HACKERS] Improving count(*)

2005-11-22 Thread Bruce Momjian
[EMAIL PROTECTED] wrote: > On Fri, Nov 18, 2005 at 03:46:42PM +, Richard Huxton wrote: > > Simon Riggs wrote: > > >One of the major complaints is always "Select count(*) is slow". > > Although there seem to have been plenty of ideas on this they all seem > > to just provide a solution for the

Re: [HACKERS] Improving count(*)

2005-11-22 Thread Bruce Momjian
Gregory Maxwell wrote: > On 11/21/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > > What about Greg Stark's idea of combining Simon's idea of storing > > per-heap-block xmin/xmax with using that information in an index scan? > > ISTM that's the best of everything that's been presented: it allows for

Re: [HACKERS] Improving count(*)

2005-11-21 Thread Gregory Maxwell
On 11/21/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > What about Greg Stark's idea of combining Simon's idea of storing > per-heap-block xmin/xmax with using that information in an index scan? > ISTM that's the best of everything that's been presented: it allows for > faster index scans without ad

Re: [HACKERS] Improving count(*)

2005-11-21 Thread Jim C. Nasby
On Fri, Nov 18, 2005 at 02:56:52PM -0500, Gregory Maxwell wrote: > However, some great ideas have been proposed here which would not only > help in that case but would otherwise be quite useful. > > *Inclusion of a 'MVCC inflight' bit in indexes which would allow > skipping MVCC checks in clumps o

Re: [HACKERS] Improving count(*)

2005-11-21 Thread Jim C. Nasby
On Fri, Nov 18, 2005 at 12:08:03AM +, Simon Riggs wrote: > The trouble is, people moan and constantly. Perhaps we should stick to > our guns and say, why do you care? From here, I think we should say, > "show me an application package that needs this so badly we'll change > PostgreSQL just for

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-21 Thread Nicolas Barbier
On 11/20/05, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > On Sat, 19 Nov 2005, Nicolas Barbier wrote: > > > You might want to take a look at the pages that I set up to track the > > progress on my master's thesis: > > > > http://www.nicolas.barbier.easynet.be/itsme/thesis/> > > > > especially t

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-20 Thread Heikki Linnakangas
On Sat, 19 Nov 2005, Nicolas Barbier wrote: You might want to take a look at the pages that I set up to track the progress on my master's thesis: http://www.nicolas.barbier.easynet.be/itsme/thesis/> especially the literature page: http://www.nicolas.barbier.easynet.be/itsme/thesis/literature/

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-20 Thread Heikki Linnakangas
On Sat, 19 Nov 2005, Josh Berkus wrote: Could you post it to the list? I'd be interested to take a look, though I'm afraid don't have the time to work on it. Yeah, I should put it up on pgFoundry. I'm not sure exactly where, though -- I don't want to launch a new project if it's not going to

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-19 Thread Josh Berkus
Heikki, > Could you post it to the list? I'd be interested to take a look, though > I'm afraid don't have the time to work on it. Yeah, I should put it up on pgFoundry. I'm not sure exactly where, though -- I don't want to launch a new project if it's not going to take off. Maybe Bizgres. -

Re: Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-19 Thread Nicolas Barbier
(CCed to the matview-devel mailing list) On 11/19/05, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > I've been reading some papers on materialized views lately. Here's some > interesting ones: (snip) You might want to take a look at the pages that I set up to track the progress on my master's

Materialized views (Was Re: [HACKERS] Improving count(*))

2005-11-19 Thread Heikki Linnakangas
On Fri, 18 Nov 2005, Josh Berkus wrote: Alvaro, I guess there must be a query-rewriting mechanism for implementing materialized views.  With that in place we may be able to implement this other thing ...  Is anybody working on materialized views? I have a bundle of academic code designed to

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Josh Berkus
Alvaro, > I guess there must be a query-rewriting mechanism for implementing > materialized views.  With that in place we may be able to implement this > other thing ...  Is anybody working on materialized views? I have a bundle of academic code designed to do exactly this, if any hacker wants t

Re: [HACKERS] Improving count(*)

2005-11-18 Thread mark
On Fri, Nov 18, 2005 at 03:46:42PM +, Richard Huxton wrote: > Simon Riggs wrote: > >One of the major complaints is always "Select count(*) is slow". > Although there seem to have been plenty of ideas on this they all seem > to just provide a solution for the "whole table" case. It might be tha

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Alvaro Herrera
Tom Lane wrote: > Richard Huxton writes: > > Might it be possible to apply rule-style rewriting to a clause of an > > ordinary select query? That is, is it prohibitively expensive to get PG > > to recognise > >SELECT count(*) FROM big_table > > and replace it with > >SELECT sum(summary_c

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Gregory Maxwell
On 11/18/05, Merlin Moncure <[EMAIL PROTECTED]> wrote: > > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > > Server) the leaf level of the narrowest index on the table is scanned, > > following a linked list of leaf pages. Leaf pages can be pretty dense > > under Sybase, bec

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Alvaro Herrera
Merlin Moncure wrote: > > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > > Server) the leaf level of the narrowest index on the table is scanned, > > following a linked list of leaf pages. Leaf pages can be pretty dense > > under Sybase, because they do use prefix compressi

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Merlin Moncure
> In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > Server) the leaf level of the narrowest index on the table is scanned, > following a linked list of leaf pages. Leaf pages can be pretty dense > under Sybase, because they do use prefix compression. A count(*) > on a table w

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Tom Lane
Richard Huxton writes: > Might it be possible to apply rule-style rewriting to a clause of an > ordinary select query? That is, is it prohibitively expensive to get PG > to recognise >SELECT count(*) FROM big_table > and replace it with >SELECT sum(summary_count) FROM my_materialised_vie

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Richard Huxton
Simon Riggs wrote: One of the major complaints is always "Select count(*) is slow". Although there seem to have been plenty of ideas on this they all seem to just provide a solution for the "whole table" case. It might be that the solution provides other benefits, but for this one case it doe

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Steve Wampler
Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > >>From here, another proposal. We have a GUC called count_uses_estimate >>that is set to off by default. If set to true, then a count(*) will use >>the planner logic to estimate number of rows in the table and return >>that as the answer,

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Tino Wildenhain
Zeugswetter Andreas DCP SD schrieb: Since that costs, I guess I would make it optional and combine it with materialized views that are automatically used at runtime, and can at the same time answer other aggregates or aggregates for groups. create materialized view xx_agg enable query r

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Zeugswetter Andreas DCP SD
> > Since that costs, I guess I would make it optional and combine it with > > materialized views that are automatically used at runtime, and can at > > the same time answer other aggregates or aggregates for groups. > > create materialized view xx_agg enable query rewrite as select > > count(*

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Tino Wildenhain
Zeugswetter Andreas DCP SD schrieb: The instant someone touches a block it would no longer be marked as frozen (vacuum or analyze or other is not required) and count(*) would visit the tuples in the block making the correct decision at that time. Hmm, so the idea would be that if a block

Re: [HACKERS] Improving count(*)

2005-11-18 Thread Zeugswetter Andreas DCP SD
> > The instant someone touches a block it would no longer be marked as > > frozen (vacuum or analyze or other is not required) and count(*) would > > visit the tuples in the block making the correct decision at that time. > > Hmm, so the idea would be that if a block no longer contained any tu

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Varun Kacholia
> > Seems like a better solution. I can finish the patch pretty soon. I need > > to contact the original author, who has disappeared, but I'll send it over > > to you. > Sounds good. I wondered where he'd gone to. Still here :-) Just got swamped with too much work that the tablesample patch got p

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Simon Riggs
On Fri, 2005-11-18 at 11:51 +1100, Gavin Sherry wrote: > Seems like a better solution. I can finish the patch pretty soon. I need > to contact the original author, who has disappeared, but I'll send it over > to you. Sounds good. I wondered where he'd gone to. Sampling would be useful for 8,2 Be

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Greg Stark
I think the important thing to keep track of is a single bit: Which the following apply? a) all of the tuples in this block are visible b) at least one tuple in this block is in-doubt and may need to be vacuumed That isn't enough to calculate count(*) on its own but it means you could s

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Jeff Davis
[EMAIL PROTECTED] wrote: > Probably obvious, and already mentioned, count(*) isn't the only query > that would benefit from visibility information in the index. It's > rather unfortunate that MVCC requires table lookups, when all values > queried or matched are found in the index key itself. The id

Re: [HACKERS] Improving count(*)

2005-11-17 Thread mark
Probably obvious, and already mentioned, count(*) isn't the only query that would benefit from visibility information in the index. It's rather unfortunate that MVCC requires table lookups, when all values queried or matched are found in the index key itself. The idea of an all index table is appea

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Gavin Sherry
On Fri, 18 Nov 2005, Simon Riggs wrote: > >From here, another proposal. We have a GUC called count_uses_estimate > that is set to off by default. If set to true, then a count(*) will use > the planner logic to estimate number of rows in the table and return > that as the answer, rather than actual

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Dann Corbit
> -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-hackers- > [EMAIL PROTECTED] On Behalf Of Tom Lane > Sent: Thursday, November 17, 2005 4:17 PM > To: Simon Riggs > Cc: Kevin Grittner; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Improving coun

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > From here, another proposal. We have a GUC called count_uses_estimate > that is set to off by default. If set to true, then a count(*) will use > the planner logic to estimate number of rows in the table and return > that as the answer, rather than actually

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Simon Riggs
On Thu, 2005-11-17 at 16:30 -0600, Kevin Grittner wrote: > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > Server) the leaf level of the narrowest index on the table is scanned, > following a linked list of leaf pages. Leaf pages can be pretty dense > under Sybase, because t

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Tom Lane
Martijn van Oosterhout writes: > Now, lets say you add a field to the tuple which you the position of > the index entry. You can only reasonably do this for one index, say the > primary key. Now you have a two-way link the updating becomes much > quicker, at the cost of even more overhead. I thin

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Kevin Grittner
In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL Server) the leaf level of the narrowest index on the table is scanned, following a linked list of leaf pages. Leaf pages can be pretty dense under Sybase, because they do use prefix compression. A count(*) on a table with 100 mi

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Martijn van Oosterhout
On Thu, Nov 17, 2005 at 09:34:08PM +, Simon Riggs wrote: > Adding visibility to an index would add substantial bulk to any index. > If we could do this at the same time as adding leading key, full field > compression (*not* prefix compression), then it might be worth doing. I think the single

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Simon Riggs
On Thu, 2005-11-17 at 16:34 -0500, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Bearing in mind other RDBMS' approach is to count the number of rows in > > an index, their cost is probably about the same as scanning table > > blocks/10 very roughly - so the cost is far from zero fo

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Bearing in mind other RDBMS' approach is to count the number of rows in > an index, their cost is probably about the same as scanning table > blocks/10 very roughly - so the cost is far from zero for them. Really? The impression I get is that people who a

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Simon Riggs
On Thu, 2005-11-17 at 14:46 -0500, Jonah H. Harris wrote: > Nice suggestion, I think it's workable but (like all other methods) > has some technical/pseudo-political challenges. > > I'm still voting for my old, "Much Ado About COUNT(*)" topic; adding > visibiility to the indexes and counting th

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Martijn van Oosterhout
On Thu, Nov 17, 2005 at 02:55:09PM -0500, Rod Taylor wrote: > On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote: > > It's an interesting idea, but you still run into the issue of > > visibility. If two people start a transaction, one of them inserts a > > row and then both run a selec

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Rod Taylor
On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote: > On Thu, Nov 17, 2005 at 07:28:10PM +, Simon Riggs wrote: > > One of the major complaints is always "Select count(*) is slow". > > > > I have a somewhat broadbrush idea as to how we might do this (for larger > > tables). > > I

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Jonah H. Harris
Simon,   Nice suggestion, I think it's workable but (like all other methods) has some technical/pseudo-political challenges.   I'm still voting for my old, "Much Ado About COUNT(*)" topic; adding visibiility to the indexes and counting them like the other RDBMS vendors.  True, it would add storage

Re: [HACKERS] Improving count(*)

2005-11-17 Thread Martijn van Oosterhout
On Thu, Nov 17, 2005 at 07:28:10PM +, Simon Riggs wrote: > One of the major complaints is always "Select count(*) is slow". > > I have a somewhat broadbrush idea as to how we might do this (for larger > tables). It's an interesting idea, but you still run into the issue of visibility. If two

[HACKERS] Improving count(*)

2005-11-17 Thread Simon Riggs
One of the major complaints is always "Select count(*) is slow". I have a somewhat broadbrush idea as to how we might do this (for larger tables). Previously, we've said that maintaining a running count of the tuples in a table is hard, or at least costly. However, maintaining a partial count ma