Re: [HACKERS] the big picture for index-only scans

2011-09-27 Thread Robert Haas
On Sun, Aug 21, 2011 at 3:13 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 PS. Robert, the LOCKING section in the header comment of visibilitymap.c is
 out-of-date: it claims that the VM bit is cleared after releasing the lock
 on the heap page.

Fixed, along with your other observation a couple of emails upthread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-24 Thread Gokulakannan Somasundaram


 There are extensive comments on this in visibilitymap.c and, in
 heapam.c, heap_xlog_visible().

 I went through the design again and again. I am convinced that this should
not have any functional bugs and should not cause much performance issues.
Nice thoughts on bypassing the WAL Logging.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Heikki Linnakangas

On 21.08.2011 07:10, Gokulakannan Somasundaram wrote:

d) In addition, currently there is no WAL Logging, while the bit is

cleared,

which would not be the case in future and hence the exclusive lock held

on

the visibility map is going to be held for a longer time.



This is false and has been false since the visibility map was first

implemented.

I can't understand this. If you are not doing this, then it would cause
consistency issues. Are you saying, we have a crash safe visibility map, but
you don't follow log the change before changing the contents/ WAL
principle. If so, please explain in detail. If you are doing it in the
normal way, then you should be logging the changes before making the changes
to the buffer and during that timeframe, you should be holding the lock on
the buffer. Heikki specifically pointed out, that you have brought in the
WAL Logging of visibility map, within the critical section.


I think you two are talking slightly past each other. There is no extra 
WAL record written when a bit is cleared in the visibility map, there is 
just a flag in the WAL record of the heap insert/update/delete. That is 
what Robert was trying to explain, that part hasn't changed since 8.4. 
What *did* change, however, in master, when the visibility map was made 
crash-safe, is the duration the lock on the visibility map page is held. 
Before that, the visibility map page was locked only briefly *after* the 
changes to the heap page were already applied and WAL record written. 
Now, the VM page lock is acquired and released at the same time as the 
lock on the heap page. It's held while the heap page changes are made 
and WAL record is written. I believe that's what Gokulakannan was trying 
to point out, and is worried that you might get contention on the VM 
page lock now because it's held for a much longer duration.


Gokulakannan, if you could come up with a test case that demonstrates 
that contention (or the lack thereof), that would be good. Otherwise 
we're just speculating.


If it's an issue, perhaps we could release the VM page lock early. We're 
not updating the LSN on it, so we don't need to wait for the WAL record 
to be written, I think. It's a bit out of the ordinary, though, so I 
wouldn't like to do that without an actual test case that shows it's an 
issue.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Heikki Linnakangas

On 21.08.2011 07:41, Gokulakannan Somasundaram wrote:

On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram


gokul...@gmail.com  wrote:

by your argument, if WALInserLock is held for 't' seconds, you should
definitely be holding visibility map lock for more than time frame 't'.


Nope, that's not how it works.  Perhaps you should read the code.
See, e.g., heap_update().

--


OK. I took a look at the patch you have supplied in
http://postgresql.1045698.n5.nabble.com/crash-safe-visibility-map-take-five-td4377235.html


Here's the patch as it was committed: 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=503c7305a1e379f95649eef1a694d0c1dbdc674a



There is a code like this.

  {
  all_visible_cleared = true;
  PageClearAllVisible(BufferGetPage(buffer));
+visibilitymap_clear(relation,
+ItemPointerGetBlockNumber((heaptup-t_self)),
+vmbuffer);
  }

Here, you are not making an entry into the WAL. then there is an assumption
that the two bits will be in sync without any WAL entry. There is a chance
that the visibility map might be affected by partial page writes, where
clearing of a particular bit might not have been changed. And i am thinking
a lot of such issues. Can you just explain the background logic behind
ignoring the principle of WAL logging? What are the implemented principles,
that protect the Visibility map pages??


The all_visible_cleared flag is included in the WAL record of the insert 
(or update or delete). Partial page writes are not a problem, because we 
always fetch the VM page and clear the bit, regardless of the LSN on the 
VM page.


PS. Robert, the LOCKING section in the header comment of visibilitymap.c 
is out-of-date: it claims that the VM bit is cleared after releasing the 
lock on the heap page.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Gokulakannan Somasundaram
 The all_visible_cleared flag is included in the WAL record of the insert
(or update or delete). Partial page writes are not a problem, because we
 always fetch the VM page and clear the bit, regardless of the LSN on the
VM page.


Two things
a) First, my understanding of checkpoint is that it flushes all the pages,
that got changed below a particular LSN. If we are not having a LSN in the
visibility map, how we will be sure, that a visibility map page is getting
flushed/not? With the following approach, i can see only one issue. If the
heap page gets written and checkpointed and the visibility map doesn't get
synced during the checkpoint, then there is an issue. Can you please explain
me, how we get the assurance?

b) Even if we have a contention issue, Visibility map is a solution for a
considerable number of database scenarios. But it should not become a
default package. A table, with no chance of index-only scans should not
incur the extra overhead of crash safe visibility maps. Those tables should
be sparred from this extra overhead, as they don't have index only scans.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Gokulakannan Somasundaram

 a) First, my understanding of checkpoint is that it flushes all the pages,
 that got changed below a particular LSN. If we are not having a LSN in the
 visibility map, how we will be sure, that a visibility map page is getting
 flushed/not?


Please ignore this statement.  I confused between the checkpoints
implemented in Postgres and some other database.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Robert Haas
On Sun, Aug 21, 2011 at 12:10 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 Consider the TPC-C benchmark. Currently on a four core box, Oracle does
 29 transactions per minute. Say we are doing something around half of
 this. So a page should get quickly filled. If a vacuum runs and the
 transactions are not touched by the MakePayment transaction, then it will be
 marked as AllVisible. When the MakePayment transaction updates, it should
 clear that bit. If we test it out, with too little warehouses, we may not
 see the effect. Of Course the PGPROC infrastructure for generating
 transaction ids might be the biggest culprit, but if you ignore that this
 should be visible.

Actual testing does not appear to show a bottleneck in either of these areas.

 Maybe.  Of course, we're only talking about cases where the index-only
 scan optimization is in use (which is not all of them).

 But are you saying that, the optimization of looking at visibility map will
 happen only for Index-only scans and will not use the visibility map
 infrastructure for the normal index scans? That's definitely a good idea and
 improvement.

Well, yes.  And not only that, but the index-only scans patch has been
posted on this very mailing list, along with detailed submission
notes.  So you could go read, or try it, before jumping to the
conclusion that it doesn't work.

 d) In addition, currently there is no WAL Logging, while the bit is
 cleared,
 which would not be the case in future and hence the exclusive lock held
 on
 the visibility map is going to be held for a longer time.

 This is false and has been false since the visibility map was first
 implemented.

 I can't understand this. If you are not doing this, then it would cause
 consistency issues. Are you saying, we have a crash safe visibility map, but
 you don't follow log the change before changing the contents/ WAL
 principle. If so, please explain in detail. If you are doing it in the
 normal way, then you should be logging the changes before making the changes
 to the buffer and during that timeframe, you should be holding the lock on
 the buffer. Heikki specifically pointed out, that you have brought in the
 WAL Logging of visibility map, within the critical section.

 Heikki's comments, i am quoting:
  I believe Robert's changes to make the visibility map crash-safe covers
 that. Clearing the bit in the visibility map now happens within the same
 critical section as clearing the flag on the heap page and writing the WAL
 record.

 I would be able to respond to your other statements, once we are clear here.

There are extensive comments on this in visibilitymap.c and, in
heapam.c, heap_xlog_visible().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
 The above could already happen in 8.4, where the visibility map was
 introduced. The contention on the VM buffer would be just as bad whether you
 hold the heap page lock at the same time or not. I have not heard any
 complaints of contention on VM buffers.

 --
  Heikki Linnakangas


 a) First of all, it(Visibility Map) should have definitely affected the
scalability of postgres in scenarios where in updates occur during a time
batch window. May be the increase in speed of vacuums negate that effect.
b) Second, currently the index scans  don't touch the visibility map and in
future they are going to acquire share lock on that. This should increase
the contention.
c) Your statement : The contention on the VM buffer would be just as bad
whether you hold the heap page lock at the same time or not.
I am talking about the contention time frame of the heap page. It will be
increased Consider my statement in conjunction with the scenario 2.
d) In addition, currently there is no WAL Logging, while the bit is cleared,
which would not be the case in future and hence the exclusive lock held on
the visibility map is going to be held for a longer time.

You might definitely see some performance improvement, if you are testing
this in anything less than 4 cores. Bump up the core count and processor
count and check whether this affects the load-throughput curve.

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
 Hmm, you have a point.  If 100 backends simultaneously write to 100
 different pages, and all of those pages are all-visible, then it's
 possible that they could end up fighting over the buffer content lock
 on the visibility map page.  But why would you expect that to matter?
 In a heavily updated table, the proportion of visibility map bits that
 are set figures to be quite low, since they're only set during VACUUM.
  To have 100 backends simultaneously pick different pages to write
 each of which is all-visible seems really unlucky.   Even if it does
 happen from time to time, I suspect the effects would be largely
 masked by WALInsertLock contention.  The visibility map content lock
 is only taken very briefly, whereas the operations protected by
 WALInsertLock are much more complex.


by your argument, if WALInserLock is held for 't' seconds, you should
definitely be holding visibility map lock for more than time frame 't' . So
the index scans have to wait for acquiring the lock on visibility map to
check visibility. What we are trading off here is Synchronization Vs I/O.
Should we lose the scalability for performance??

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:48 PM, Gokulakannan Somasundaram 
gokul...@gmail.com wrote:


 The above could already happen in 8.4, where the visibility map was
 introduced. The contention on the VM buffer would be just as bad whether you
 hold the heap page lock at the same time or not. I have not heard any
 complaints of contention on VM buffers.

 --
  Heikki Linnakangas


 a) First of all, it(Visibility Map) should have definitely affected the
 scalability of postgres in scenarios where in updates occur during a time
 batch window. May be the increase in speed of vacuums negate that effect.
 b) Second, currently the index scans  don't touch the visibility map and in
 future they are going to acquire share lock on that. This should increase
 the contention.
 c) Your statement : The contention on the VM buffer would be just as bad
 whether you hold the heap page lock at the same time or not.
 I am talking about the contention time frame of the heap page. It will be
 increased Consider my statement in conjunction with the scenario 2.
 d) In addition, currently there is no WAL Logging, while the bit is
 cleared, which would not be the case in future and hence the exclusive lock
 held on the visibility map is going to be held for a longer time.

 You might definitely see some performance improvement, if you are testing
 this in anything less than 4 cores. Bump up the core count and processor
 count and check whether this affects the load-throughput curve.


By your argument, we can say that no-one will create a index with a function
like random(), time(), date(), broken operators etc. Its hard to imagine a
index in which a a user will only want to insert and never select on it.
Even C++ provides std::map infrastructure for objects, where the user owns
the responsibility of writing the operator correctly. Other databases do
the same. Even going one step ahead, we are already going back to such
indexes, if there is unique constraint/ referential integrity constraints.
But with all such caveats, it was decided we should not allow covering
indexes, as they require going back to the index for updates/deletes.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:48 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 a) First of all, it(Visibility Map) should have definitely affected the
 scalability of postgres in scenarios where in updates occur during a time
 batch window. May be the increase in speed of vacuums negate that effect.

I think that you have switched gears here and are in this paragraph
talking about the 8.4-era visibility map changes rather than my recent
work.  There is zero evidence that those changes caused any
performance problem.  I've spent a large chunk of the last four months
looking for scalability problems and I haven't come across any
evidence that this is an issue.  If you think it is, let's have a test
case.

 b) Second, currently the index scans  don't touch the visibility map and in
 future they are going to acquire share lock on that. This should increase
 the contention.

Maybe.  Of course, we're only talking about cases where the index-only
scan optimization is in use (which is not all of them).  And even
then, if bringing in the heap pages would have caused buffer evictions
and the index-only scan avoids that, then you're trading contention
for exclusive locks on the BufFreelistLock and buf mapping locks for
shared-lock contention on the visibility map page.  Furthermore, the
latter will (except in very large relations) only need to be
shared-locked and pinned once per scan, whereas you might easily have
a case where each index probe forces replacement of a heap page.

What I am slightly worried about is that our machinery for taking and
releasing buffer pins is going to become a scalability bottleneck at
some point, and certainly very hot pages like root index pages and
visibility map pages could become hot-spots for such contention.  But
the experiments I've done so far suggest that there are other more
serious bottlenecks that have to be addressed first.  If it does rise
to the list, we'll find a way to fix it, but I think skipping the
index-only scan optimization is going to be a cure worse than the
disease.

 c) Your statement : The contention on the VM buffer would be just as bad
 whether you hold the heap page lock at the same time or not.
 I am talking about the contention time frame of the heap page. It will be
 increased Consider my statement in conjunction with the scenario 2.

Sure, but here again, what is your evidence that this actually
matters?  It's not increasing by very much.

 d) In addition, currently there is no WAL Logging, while the bit is cleared,
 which would not be the case in future and hence the exclusive lock held on
 the visibility map is going to be held for a longer time.

This is false and has been false since the visibility map was first implemented.

 You might definitely see some performance improvement, if you are testing
 this in anything less than 4 cores. Bump up the core count and processor
 count and check whether this affects the load-throughput curve.

I'm fairly certain I already did that experiment, on a 32-core
machine, but since the patch you're worried about went in two months
ago, I'm a little fuzzy on the details.  Maybe you should test it out
yourself

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 by your argument, if WALInserLock is held for 't' seconds, you should
 definitely be holding visibility map lock for more than time frame 't'.

Nope, that's not how it works.  Perhaps you should read the code.
See, e.g., heap_update().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 5:06 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 By your argument, we can say that no-one will create a index with a function
 like random(), time(), date(), broken operators etc. Its hard to imagine a
 index in which a a user will only want to insert and never select on it.

The whole point of this optimization is to make index access cheaper,
not more expensive.  You seem convinced it's done the opposite, but
you haven't offered much evidence (or any test results) to support
that proposition.

 Even C++ provides std::map infrastructure for objects, where the user owns
 the responsibility of writing the operator correctly. Other databases do
 the same. Even going one step ahead, we are already going back to such
 indexes, if there is unique constraint/ referential integrity constraints.
 But with all such caveats, it was decided we should not allow covering
 indexes, as they require going back to the index for updates/deletes.

What we decided NOT to do is put xmin/xmax/cid into the index tuple,
for precisely the reasons you mention.  That would be catastrophic
both for write operations to the table, and for the size of the index.
 This approach is appealing precisely because a single visibility map
page can cover a very large chunk of the heap.  Is there a potential
problem buried somewhere in there?  Maybe.  But if there is, I'm
pretty sure it's something far less obvious than what you seem to be
worried about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
   I think that you have switched gears here and are in this paragraph

 talking about the 8.4-era visibility map changes rather than my recent
 work.  There is zero evidence that those changes caused any
 performance problem.  I've spent a large chunk of the last four months
 looking for scalability problems and I haven't come across any
 evidence that this is an issue.  If you think it is, let's have a test
 case.


Consider the TPC-C benchmark. Currently on a four core box, Oracle does
29 transactions per minute. Say we are doing something around half of
this. So a page should get quickly filled. If a vacuum runs and the
transactions are not touched by the MakePayment transaction, then it will be
marked as AllVisible. When the MakePayment transaction updates, it should
clear that bit. If we test it out, with too little warehouses, we may not
see the effect. Of Course the PGPROC infrastructure for generating
transaction ids might be the biggest culprit, but if you ignore that this
should be visible.

Maybe.  Of course, we're only talking about cases where the index-only
 scan optimization is in use (which is not all of them).

But are you saying that, the optimization of looking at visibility map will
happen only for Index-only scans and will not use the visibility map
infrastructure for the normal index scans? That's definitely a good idea and
improvement.

 d) In addition, currently there is no WAL Logging, while the bit is
cleared,
 which would not be the case in future and hence the exclusive lock held
on
 the visibility map is going to be held for a longer time.

 This is false and has been false since the visibility map was first
implemented.

I can't understand this. If you are not doing this, then it would cause
consistency issues. Are you saying, we have a crash safe visibility map, but
you don't follow log the change before changing the contents/ WAL
principle. If so, please explain in detail. If you are doing it in the
normal way, then you should be logging the changes before making the changes
to the buffer and during that timeframe, you should be holding the lock on
the buffer. Heikki specifically pointed out, that you have brought in the
WAL Logging of visibility map, within the critical section.

Heikki's comments, i am quoting:
 I believe Robert's changes to make the visibility map crash-safe covers
that. Clearing the bit in the visibility map now happens within the same
critical section as clearing the flag on the heap page and writing the WAL
record.

I would be able to respond to your other statements, once we are clear here.

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram

 gokul...@gmail.com wrote:
  by your argument, if WALInserLock is held for 't' seconds, you should
  definitely be holding visibility map lock for more than time frame 't'.

 Nope, that's not how it works.  Perhaps you should read the code.
 See, e.g., heap_update().

 --

OK. I took a look at the patch you have supplied in
http://postgresql.1045698.n5.nabble.com/crash-safe-visibility-map-take-five-td4377235.html
There is a code like this.

 {
 all_visible_cleared = true;
 PageClearAllVisible(BufferGetPage(buffer));
+visibilitymap_clear(relation,
+ItemPointerGetBlockNumber((heaptup-t_self)),
+vmbuffer);
 }

Here, you are not making an entry into the WAL. then there is an assumption
that the two bits will be in sync without any WAL entry. There is a chance
that the visibility map might be affected by partial page writes, where
clearing of a particular bit might not have been changed. And i am thinking
a lot of such issues. Can you just explain the background logic behind
ignoring the principle of WAL logging? What are the implemented principles,
that protect the Visibility map pages??

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
  By your argument, we can say that no-one will create a index with a
 function
  like random(), time(), date(), broken operators etc. Its hard to imagine
 a
  index in which a a user will only want to insert and never select on it.

 The whole point of this optimization is to make index access cheaper,
 not more expensive.  You seem convinced it's done the opposite, but
 you haven't offered much evidence (or any test results) to support
 that proposition.


I hope you are referring to thick indexes/covering indexes/indexes with
snapshot. Why do you think its done the opposite? In fact all the other
databases like Oracle, SQL-Server, Sybase implement the indexes with
snapshot (that's the only one they support). It makes the index tuple larger
by 8 bytes, but avoids the heap-fetch. I think, i ran a couple of
benchmarks, when i submitted the patch and showed the improvement. The
trade-off in that case was simple. Size of the index Vs avoiding a disk I/O.
User still has the choice of creating indexes without snapshot( it was
provided as an optional index).



 What we decided NOT to do is put xmin/xmax/cid into the index tuple,
 for precisely the reasons you mention.  That would be catastrophic
 both for write operations to the table, and for the size of the index.


Why it would be catastrophic for write operations on table?? -please explain
me.
The trade-off in that case was simple. Size of the index Vs avoiding a disk
I/O. There was no catastrophic damage on the size of the index, as far as i
can see.

I made this point, because Heikki pointed out that since no-one is
complaining about some performance problem, and so we can assume that it
doesn't exist. But the thick index proposal was shot down on the context,
some one might create a index on a function like random(), time(). date() or
with broken operators, effectively meaning that you can insert into the
index and cannot select back. We are already doing unique checks and
referential integrity checks on that kind of indexes(which would all be
wrong), but still we should not be working in that area, to help user not
make that mistake of creating such indexes. So we should apply the same
principle for decision making here also.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram


 Note that we already have the visibility map, and the accesses needed to
 update it are already there. Granted, we'll have to change the logic
 slightly to make it crash safe, but I don't expect that to add any
 meaningful overhead - the changes are going to be where the bits are set,
 ie. vacuum, not when the bits are cleared. Granted, we might also want to
 set the bits more aggressively once they're used by index-only-scans. *But
 done correctly, just taking advantage of the VM that's already there
 shouldn't add overhead to other operations.*

 I agree that we need to do tests to demonstrate that there's a gain from
 the patch, once we have a patch to test. I would be very surprised if there
 isn't, but that just means the testing is going to be easy.

 --
  Heikki Linnakangas

  EnterpriseDB   http://www.enterprisedb.com


  I could see some arguments supporting this feature, citing covering
indexes as example. But i just want to highlight they are not same.
Visibility map approach is totally not related to the covering indexes
approach, except the intention of avoiding the heap scan. Because of the
small size, we will be having more contentions(People who have worked with
Oracle can take the example of a bitmap index on a OLTP database). I was
making the suggestion previously to make these crash safe visibility maps
optional for a table, so that the overhead, which comes with it, can be
avoided for those tables, which have queries that don't support index only
scans. The fact that the proposal is for crash safe visibility map, to
become a default package of any Postgresql table will definitely have wide
ranging implications on OLTP performance.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Robert Haas
On Fri, Aug 19, 2011 at 9:19 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 The fact that the
 proposal is for crash safe visibility map, to become a default package of
 any Postgresql table will definitely have wide ranging implications on OLTP
 performance.

Well, that would certainly be alarming if true, but I don't think it
is.  As far as I can see, the overhead of making the visibility map
crash-safe is just (1) a very small percentage increase in the work
being done by VACUUM and (2) a slight possibility of extra work done
by a foreground process if the visibility map bit changes at almost
exactly the same time the process was about to insert, update, or
delete a tuple.

If someone comes up with a test where this overhead is enough to
measure, then we might need to rethink our whole approach.  Maybe we
would make it an optional feature, or maybe we would just rip it out
and start over with some sort of redesign, or maybe we would look for
other optimizations to counterbalance the additional overhead.  I
don't know.  But as far as I can see you're hypothesizing without
evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Bruce Momjian
Robert Haas wrote:
  I am happy to have pg_upgrade skip upgrading visibility map files --- it
  already has code to conditionally process them because they only exist
  in = 8.4:
 
  ? ? ? ?/* fsm/vm files added in PG 8.4 */
  ? ? ? ?if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
  ? ? ? ?{
  ? ? ? ? ? ?/*
  ? ? ? ? ? ? * Copy/link any fsm and vm files, if they exist
  ? ? ? ? ? ? */
 
  Just give the word and it will be done.
 
 I hereby give the word.  :-)
 
 Specifically, we need to skip copying vm files (only) if coming from a
 version prior to this commit:
 
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e16954f3d27fa8e16c379ff6623ae18d6250a39c

Done with the attached, applied patch.  There was no cat-version bump
from that commit (because the format didn't change, just the
crash-safeness) so I picked the first cat-version change after this
commit.  This is only a pg_upgrade 9.2+ issue.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 6def748..a19b3df
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***
*** 64,69 
--- 64,75 
  #define TABLE_SPACE_SUBDIRS_CAT_VER 20100
  /* postmaster/postgres -b (binary_upgrade) flag added during PG 9.1 development */
  #define BINARY_UPGRADE_SERVER_FLAG_CAT_VER 201104251
+ /*
+  * 	Visibility map changed with this 9.2 commit,
+  *	8f9fe6edce358f7904e0db119416b4d1080a83aa; pick later catalog version.
+  */
+ #define VISIBILITY_MAP_CRASHSAFE_CAT_VER 201107031
+ 
  
  /*
   * Each relation is represented by a relinfo structure.
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index d4a420f..df752c5
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
*** transfer_single_new_db(pageCnvCtx *pageC
*** 120,128 
  	int			numFiles = 0;
  	int			mapnum;
  	int			fileno;
! 
  	old_dir[0] = '\0';
  
  	for (mapnum = 0; mapnum  size; mapnum++)
  	{
  		char		old_file[MAXPGPATH];
--- 120,134 
  	int			numFiles = 0;
  	int			mapnum;
  	int			fileno;
! 	bool		vm_crashsafe_change = false;
! 	
  	old_dir[0] = '\0';
  
+ 	/* Do not copy non-crashsafe vm files for binaries that assume crashsafety */
+ 	if (old_cluster.controldata.cat_ver  VISIBILITY_MAP_CRASHSAFE_CAT_VER 
+ 		new_cluster.controldata.cat_ver = VISIBILITY_MAP_CRASHSAFE_CAT_VER)
+ 		vm_crashsafe_change = true;
+ 	
  	for (mapnum = 0; mapnum  size; mapnum++)
  	{
  		char		old_file[MAXPGPATH];
*** transfer_single_new_db(pageCnvCtx *pageC
*** 168,175 
  
  			for (fileno = 0; fileno  numFiles; fileno++)
  			{
  if (strncmp(namelist[fileno]-d_name, scandir_file_pattern,
! 			strlen(scandir_file_pattern)) == 0)
  {
  	snprintf(old_file, sizeof(old_file), %s/%s, maps[mapnum].old_dir,
  			 namelist[fileno]-d_name);
--- 174,189 
  
  			for (fileno = 0; fileno  numFiles; fileno++)
  			{
+ char *vm_offset = strstr(namelist[fileno]-d_name, _vm);
+ bool is_vm_file = false;
+ 
+ /* Is a visibility map file? (name ends with _vm) */
+ if (vm_offset  strlen(vm_offset) == strlen(_vm))
+ 	is_vm_file = true;
+ 
  if (strncmp(namelist[fileno]-d_name, scandir_file_pattern,
! 			strlen(scandir_file_pattern)) == 0 
! 	(!is_vm_file || !vm_crashsafe_change))
  {
  	snprintf(old_file, sizeof(old_file), %s/%s, maps[mapnum].old_dir,
  			 namelist[fileno]-d_name);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Robert Haas
On Fri, Aug 19, 2011 at 11:22 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  I am happy to have pg_upgrade skip upgrading visibility map files --- it
  already has code to conditionally process them because they only exist
  in = 8.4:
 
  ? ? ? ?/* fsm/vm files added in PG 8.4 */
  ? ? ? ?if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
  ? ? ? ?{
  ? ? ? ? ? ?/*
  ? ? ? ? ? ? * Copy/link any fsm and vm files, if they exist
  ? ? ? ? ? ? */
 
  Just give the word and it will be done.

 I hereby give the word.  :-)

 Specifically, we need to skip copying vm files (only) if coming from a
 version prior to this commit:

 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e16954f3d27fa8e16c379ff6623ae18d6250a39c

 Done with the attached, applied patch.  There was no cat-version bump
 from that commit (because the format didn't change, just the
 crash-safeness) so I picked the first cat-version change after this
 commit.  This is only a pg_upgrade 9.2+ issue.

Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram


 Well, that would certainly be alarming if true, but I don't think it
 is.  As far as I can see, the overhead of making the visibility map
 crash-safe is just (1) a very small percentage increase in the work
 being done by VACUUM and (2) a slight possibility of extra work done
 by a foreground process if the visibility map bit changes at almost
 exactly the same time the process was about to insert, update, or
 delete a tuple.

 Let's forget the overhead posed by vacuum. Can you please point me to the
design which talks in detail of the second overhead?

Thanks.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram

 Well, that would certainly be alarming if true, but I don't think it
 is.  As far as I can see, the overhead of making the visibility map
 crash-safe is just (1) a very small percentage increase in the work
 being done by VACUUM and (2) a slight possibility of extra work done
 by a foreground process if the visibility map bit changes at almost
 exactly the same time the process was about to insert, update, or
 delete a tuple.

 Let's forget the overhead posed by vacuum. Can you please point me to the
 design which talks in detail of the second overhead?

 Thanks.


If you are following the same design that Heikki put forward, then there is
a problem with it in maintaining the bits in page and the bits in visibility
map in sync, which we have already discussed.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Heikki Linnakangas

On 19.08.2011 21:06, Gokulakannan Somasundaram wrote:

If you are following the same design that Heikki put forward, then there is
a problem with it in maintaining the bits in page and the bits in visibility
map in sync, which we have already discussed.


Are you referring to this: 
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php ? I 
believe Robert's changes to make the visibility map crash-safe covers 
that. Clearing the bit in the visibility map now happens within the same 
critical section as clearing the flag on the heap page and writing th 
WAL record.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 19.08.2011 21:06, Gokulakannan Somasundaram wrote:

 If you are following the same design that Heikki put forward, then there
 is
 a problem with it in maintaining the bits in page and the bits in
 visibility
 map in sync, which we have already discussed.


 Are you referring to this: http://archives.postgresql.**
 org/pgsql-hackers/2010-02/**msg02097.phphttp://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php?
  I believe Robert's changes to make the visibility map crash-safe covers
 that. Clearing the bit in the visibility map now happens within the same
 critical section as clearing the flag on the heap page and writing th WAL
 record.

 In that case, say a 100 sessions are trying to update records which fall
under the 8000*4 heap pages( i assume 2 bits per visibility map - 8 * 1024 *
4 exact) covered by one page of visibility map, won't it make the 99
sessions wait for that visibility map while holding the exclusive lock on
the 99 heap pages?
Are we going to say, that these kind of situations occur very rarely? Or
that the loss of scalability in these situations, is worth the performance
during the read-heavy workloads?

In any case, making a database going through all these extra overheads,
while they don't even have any index-only scans!!!  That definitely should
be given a thought.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 2:51 AM, Gokulakannan Somasundaram 
gokul...@gmail.com wrote:

 On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:

 On 19.08.2011 21:06, Gokulakannan Somasundaram wrote:

 If you are following the same design that Heikki put forward, then there
 is
 a problem with it in maintaining the bits in page and the bits in
 visibility
 map in sync, which we have already discussed.


 Are you referring to this: http://archives.postgresql.**
 org/pgsql-hackers/2010-02/**msg02097.phphttp://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php?
  I believe Robert's changes to make the visibility map crash-safe covers
 that. Clearing the bit in the visibility map now happens within the same
 critical section as clearing the flag on the heap page and writing th WAL
 record.

 In that case, say a 100 sessions are trying to update records which fall
 under the 8000*4 heap pages( i assume 2 bits per visibility map - 8 * 1024 *
 4 exact) covered by one page of visibility map, won't it make the 99
 sessions wait for that visibility map while holding the exclusive lock on
 the 99 heap pages?
 Are we going to say, that these kind of situations occur very rarely? Or
 that the loss of scalability in these situations, is worth the performance
 during the read-heavy workloads?

 In any case, making a database going through all these extra overheads,
 while they don't even have any index-only scans!!!  That definitely should
 be given a thought.

 Gokul.


Please consider the update, i mentioned above  occurs and changes the
visibility bit of the page.


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Robert Haas
On Fri, Aug 19, 2011 at 2:51 PM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 On 19.08.2011 21:06, Gokulakannan Somasundaram wrote:

 If you are following the same design that Heikki put forward, then there
 is
 a problem with it in maintaining the bits in page and the bits in
 visibility
 map in sync, which we have already discussed.

 Are you referring to this:
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php ? I
 believe Robert's changes to make the visibility map crash-safe covers that.
 Clearing the bit in the visibility map now happens within the same critical
 section as clearing the flag on the heap page and writing th WAL record.

 In that case, say a 100 sessions are trying to update records which fall
 under the 8000*4 heap pages( i assume 2 bits per visibility map - 8 * 1024 *
 4 exact) covered by one page of visibility map,

There are about 8000 visibility map bytes per page, so about 64000
bits, each covering one page.  So a visibility map page covers about
512MB of heap.

 won't it make the 99
 sessions wait for that visibility map while holding the exclusive lock on
 the 99 heap pages?

Hmm, you have a point.  If 100 backends simultaneously write to 100
different pages, and all of those pages are all-visible, then it's
possible that they could end up fighting over the buffer content lock
on the visibility map page.  But why would you expect that to matter?
In a heavily updated table, the proportion of visibility map bits that
are set figures to be quite low, since they're only set during VACUUM.
 To have 100 backends simultaneously pick different pages to write
each of which is all-visible seems really unlucky.   Even if it does
happen from time to time, I suspect the effects would be largely
masked by WALInsertLock contention.  The visibility map content lock
is only taken very briefly, whereas the operations protected by
WALInsertLock are much more complex.

This does, however, remind me of two other points:

1. Heikki's idea of trying to set visibility map bits more
aggressively is probably a good one, but it would be possible to
overdo it, because setting visibility map bits is not free. It has an
immediate cost - in that we have to write xlog - and a deferred cost -
in that it will impose overhead when those pages are re-dirtied.  At
the moment, I think we're probably too far in the opposite direction -
i.e. we leave the visibility map bits unset for too long, leading to a
massive amount of deferred work that gets done all at once when VACUUM
finally runs.  But we shouldn't overcorrect.

2. While we're tinkering with the visibility map, we should think
about whether it makes sense to carve out some more bits for such
purposes as we may in the future require.  Even if we allowed each
heap page a byte in the visibility map instead of a single bit, the
visibility map would still be roughly 1000 times smaller than the
heap; and if there are any situations where the page-level locks
become choke points, this would mitigate that effect.  There might
also be some advantage in that bytes can be atomically set, while bits
can't, although I can't immediately think how we'd leverage that.
Alternatively, we could widen the field to something less than a full
byte, like 2 or 4 bits, if that seems better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Robert Haas
On Fri, Aug 19, 2011 at 4:02 PM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, you have a point.  If 100 backends simultaneously write to 100
 different pages, and all of those pages are all-visible, then it's
 possible that they could end up fighting over the buffer content lock
 on the visibility map page.  But why would you expect that to matter?
 In a heavily updated table, the proportion of visibility map bits that
 are set figures to be quite low, since they're only set during VACUUM.
  To have 100 backends simultaneously pick different pages to write
 each of which is all-visible seems really unlucky.   Even if it does
 happen from time to time, I suspect the effects would be largely
 masked by WALInsertLock contention.  The visibility map content lock
 is only taken very briefly, whereas the operations protected by
 WALInsertLock are much more complex.

Oh, snap.  I see another possible problem here.

At the time visibilitymap_clear() is called, we're already (and
necessarily) holding a content lock on the buffer.  And then we go get
a content lock on the visibility map page, whose buffer number might
be higher or lower than that of the heap page, possibly leading us to
violate the rule the buffer content locks must be taken increasing
buffer number order.

Maybe that's OK, because I can't see that we'd ever acquire any other
buffer content lock while already holding a lock on the visibility map
buffer.  But given this logic, if we did do such a thing, it could
result in an undetected deadlock.

Hmm

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Heikki Linnakangas

On 19.08.2011 23:02, Robert Haas wrote:

On Fri, Aug 19, 2011 at 2:51 PM, Gokulakannan Somasundaram
gokul...@gmail.com  wrote:

won't it make the 99
sessions wait for that visibility map while holding the exclusive lock on
the 99 heap pages?


Hmm, you have a point.  If 100 backends simultaneously write to 100
different pages, and all of those pages are all-visible, then it's
possible that they could end up fighting over the buffer content lock
on the visibility map page.  But why would you expect that to matter?
In a heavily updated table, the proportion of visibility map bits that
are set figures to be quite low, since they're only set during VACUUM.
  To have 100 backends simultaneously pick different pages to write
each of which is all-visible seems really unlucky.   Even if it does
happen from time to time, I suspect the effects would be largely
masked by WALInsertLock contention.  The visibility map content lock
is only taken very briefly, whereas the operations protected by
WALInsertLock are much more complex.


The above could already happen in 8.4, where the visibility map was 
introduced. The contention on the VM buffer would be just as bad whether 
you hold the heap page lock at the same time or not. I have not heard 
any complaints of contention on VM buffers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Heikki Linnakangas

On 19.08.2011 23:17, Robert Haas wrote:

On Fri, Aug 19, 2011 at 4:02 PM, Robert Haasrobertmh...@gmail.com  wrote:

Hmm, you have a point.  If 100 backends simultaneously write to 100
different pages, and all of those pages are all-visible, then it's
possible that they could end up fighting over the buffer content lock
on the visibility map page.  But why would you expect that to matter?
In a heavily updated table, the proportion of visibility map bits that
are set figures to be quite low, since they're only set during VACUUM.
  To have 100 backends simultaneously pick different pages to write
each of which is all-visible seems really unlucky.   Even if it does
happen from time to time, I suspect the effects would be largely
masked by WALInsertLock contention.  The visibility map content lock
is only taken very briefly, whereas the operations protected by
WALInsertLock are much more complex.


Oh, snap.  I see another possible problem here.

At the time visibilitymap_clear() is called, we're already (and
necessarily) holding a content lock on the buffer.  And then we go get
a content lock on the visibility map page, whose buffer number might
be higher or lower than that of the heap page, possibly leading us to
violate the rule the buffer content locks must be taken increasing
buffer number order.


Huh? The rule is that you have to acquire locks on heap pages in 
increasing page number order. That doesn't apply to the order between 
the heap and the visibility map. The rule we've established for that is 
that you have to acquire the lock on the heap page first, before locking 
the corresponding vm page. It would be good to add a comment about that 
to the header comment of RelationGetBufferForTuple(), there doesn't seem 
to be anything about the visibility map buffer arguments there.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-18 Thread Robert Haas
On Tue, May 10, 2011 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  Any thoughts welcome. ?Incidentally, if anyone else feels like working
  on this, feel free to let me know and I'm happy to step away, from all
  of it or from whatever part someone else wants to tackle. ?I'm mostly
  working on this because it's something that I think we really need to
  get done, more than having a burning desire to be the one who does it.
 
  Indexonly scans are welcome!
  I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

 I am happy to have pg_upgrade skip upgrading visibility map files --- it
 already has code to conditionally process them because they only exist
 in = 8.4:

        /* fsm/vm files added in PG 8.4 */
        if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
        {
            /*
             * Copy/link any fsm and vm files, if they exist
             */

 Just give the word and it will be done.

I hereby give the word.  :-)

Specifically, we need to skip copying vm files (only) if coming from a
version prior to this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e16954f3d27fa8e16c379ff6623ae18d6250a39c

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Cédric Villemain
2011/5/14 Robert Haas robertmh...@gmail.com:
 On Fri, May 13, 2011 at 6:34 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 If reviewers agree it is safe and benchmarks make evidences that no
 basic performance  issue are raised, then let's see if next items have
 blockers or can be done.

 Sounds right to me.

and recent stuff on VM will allow us to move forward it seems !

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Cédric Villemain
2011/5/11 Bruce Momjian br...@momjian.us:
 Cédric Villemain wrote:
 2011/5/10 Kevin Grittner kevin.gritt...@wicourts.gov:
  Simon Riggs si...@2ndquadrant.com wrote:
  The typical speed up for non-covered indexes will come when we
  access a very large table (not in cache) via an index scan that is
  smaller than a bitmapindex scan. Will we be able to gauge
  selectivities sufficiently accurately to be able to pinpoint that
  during optimization? How will we know that the table is not in
  cache? Or is this an optimisation in the executor for a bitmapheap
  scan?
 
  I would continue to object to using current cache contents for plan
  choice because of plan instability and the fact that an odd initial
  cache load could skew plans in a bad direction indefinitely. ?I do
  agree (and have already posted) that I think the hardest part of
  this might be developing a good cost model. ?I doubt that's an
  insoluble problem, especially since it is something we can refine
  over time as we gain experience with the edge cases.

 you will have the same possible instability in planning with the
 index(-only?) scan because we may need to access heap anyway and this
 needs is based on estimation, or I miss something ? I understood the
 idea was just to bypass the heap access *if* we can for *this*
 heap-page.

 In reality, I am not really scared by plan instability because of a
 possible PG/OS cache estimation. The percentages remain stable in my
 observations ... I don't know yet how it will go for vis map.

 And, we already have plan instability currently, which is *good* : at
 some point a seq scan is better than an bitmap heap scan. Because the
 relation size change and because ANALYZE re-estimate the distribution
 of the data. I will be very happy to issue ANALYZE CACHE as I have to
 ANALYZE temp table for large query if it allows the planner to provide
 me the best plan in some scenario...but this is another topic, sorry
 for the digression..

 Good point --- we would be making plan decisions based on the visibility
 map coverage.  The big question is whether visibility map changes are
 more dynamic than the values we already plan against, like rows in the
 table, table size, and value distributions.  I don't know the answer.


Robert, I though of Covered-Index as just a usage of the vis map:
don't take the heap block if not needed. This look easier to do and
better in the long term (because read-only DB may quickly turn into a
no-heap access DB for example). Thus this is not real covered-index.
Did you want to implement real covered-index and did you have ideas on
how to do that ? Or just an optimization of the current
planner/executor on index usage ?
___

I don't know VM internals:

 * do we have a counter of ALL_VISIBLE flag set on a relation ? (this
should be very good for planner)
 * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't
shoot pleeaase)
 * is it ok to parse VM for planning (I believe it is not) ?

Ideas ?

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 10:44 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 and recent stuff on VM will allow us to move forward it seems !

Yep, looks promising.  The heap_hot_search_buffer refactoring patch is
related as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 11:12 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Good point --- we would be making plan decisions based on the visibility
 map coverage.  The big question is whether visibility map changes are
 more dynamic than the values we already plan against, like rows in the
 table, table size, and value distributions.  I don't know the answer.

 Robert, I though of Covered-Index as just a usage of the vis map:
 don't take the heap block if not needed. This look easier to do and
 better in the long term (because read-only DB may quickly turn into a
 no-heap access DB for example). Thus this is not real covered-index.
 Did you want to implement real covered-index and did you have ideas on
 how to do that ? Or just an optimization of the current
 planner/executor on index usage ?

If by a real covered index you mean one that includes visibility
info in the index - I have no plans to work on anything like that.  If
we were to do that, the index would become much larger and less
efficient, whereas the approach of just optimizing the way our
existing indexes are used doesn't have that disadvantage.  It also
sounds like a lot of work. Now, if someone else wants to demonstrate
that it has advantages that are worth the costs and go do it, more
power to said person, but I'm unexcited about it.

 I don't know VM internals:

  * do we have a counter of ALL_VISIBLE flag set on a relation ? (this
 should be very good for planner)
  * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't
 shoot pleeaase)

Evidently I'm developing a more frightening reputation than I would hope.  :-(

Anyway, yes, I do believe we need a table-level statistic for the
percentage of the visibility map bits that are believed to be set.
Having said that I think we need it, let me also say that I'm a bit
skeptical about how well it will work.  There are two problems:

1. Consider a query like SELECT a, b FROM foo WHERE a = 1.  To
accurately estimate the cost of executing this query via an index-only
scan (on an index over foo (a, b)), we need to know (i) the percentage
of rows in the table for which a = 1 and (ii) the percentage *of those
rows* which are on an all-visible page.  We can assume that if 80% of
the rows in the table are on all-visible pages, then 80% of the rows
returned by this query will be on all-visible pages also, but that
might be wildly wrong.  This is similar to the problem of costing
SELECT * FROM foo WHERE a = 1 AND b = 1 - we know the fraction of
rows where a = 1 and the fraction where b = 1, but there's no
certainty that multiplying those values will produce an accurate
estimate for the conjunction of those conditions.  The problem here is
not as bad as the general multi-column statistics problem because a
mistake will only bollix the cost, not the row count estimate, but
it's still not very nice.

2. Since VACUUM and ANALYZE often run together, we will be estimating
the percentage of rows on all-visible pages just at the time when that
percentage is highest.  This is not exactly wonderful, either...

I have a fair amount of hope that even with these problems we can come
up with some adjustment to the planner that is better than just
ignoring the problem, but I am not sure how difficult it will be.

  * is it ok to parse VM for planning (I believe it is not) ?

It doesn't seem like a good idea to me, but I just work here.  I'm not
sure what that would buy us.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Cédric Villemain
2011/6/19 Robert Haas robertmh...@gmail.com:
 On Sun, Jun 19, 2011 at 11:12 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 Good point --- we would be making plan decisions based on the visibility
 map coverage.  The big question is whether visibility map changes are
 more dynamic than the values we already plan against, like rows in the
 table, table size, and value distributions.  I don't know the answer.

 Robert, I though of Covered-Index as just a usage of the vis map:
 don't take the heap block if not needed. This look easier to do and
 better in the long term (because read-only DB may quickly turn into a
 no-heap access DB for example). Thus this is not real covered-index.
 Did you want to implement real covered-index and did you have ideas on
 how to do that ? Or just an optimization of the current
 planner/executor on index usage ?

 If by a real covered index you mean one that includes visibility
 info in the index - I have no plans to work on anything like that.  If
 we were to do that, the index would become much larger and less
 efficient, whereas the approach of just optimizing the way our
 existing indexes are used doesn't have that disadvantage.  It also
 sounds like a lot of work. Now, if someone else wants to demonstrate
 that it has advantages that are worth the costs and go do it, more
 power to said person, but I'm unexcited about it.

Yes I was thinking of that, and agree with you.


 I don't know VM internals:

  * do we have a counter of ALL_VISIBLE flag set on a relation ? (this
 should be very good for planner)
  * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't
 shoot pleeaase)

 Evidently I'm developing a more frightening reputation than I would hope.  :-(

Nah, I was joking :) don't worry !
Probably because I have already proposing 1 new GUC and at least one
new column to pg_class recently. (and we're not used to change that
frequently)


 Anyway, yes, I do believe we need a table-level statistic for the
 percentage of the visibility map bits that are believed to be set.
 Having said that I think we need it, let me also say that I'm a bit
 skeptical about how well it will work.  There are two problems:

 1. Consider a query like SELECT a, b FROM foo WHERE a = 1.  To
 accurately estimate the cost of executing this query via an index-only
 scan (on an index over foo (a, b)), we need to know (i) the percentage
 of rows in the table for which a = 1 and (ii) the percentage *of those
 rows* which are on an all-visible page.  We can assume that if 80% of
 the rows in the table are on all-visible pages, then 80% of the rows
 returned by this query will be on all-visible pages also, but that
 might be wildly wrong.  This is similar to the problem of costing
 SELECT * FROM foo WHERE a = 1 AND b = 1 - we know the fraction of
 rows where a = 1 and the fraction where b = 1, but there's no
 certainty that multiplying those values will produce an accurate
 estimate for the conjunction of those conditions.  The problem here is
 not as bad as the general multi-column statistics problem because a
 mistake will only bollix the cost, not the row count estimate, but
 it's still not very nice.

 2. Since VACUUM and ANALYZE often run together, we will be estimating
 the percentage of rows on all-visible pages just at the time when that
 percentage is highest.  This is not exactly wonderful, either...

 I have a fair amount of hope that even with these problems we can come
 up with some adjustment to the planner that is better than just
 ignoring the problem, but I am not sure how difficult it will be.

  * is it ok to parse VM for planning (I believe it is not) ?

 It doesn't seem like a good idea to me, but I just work here.  I'm not
 sure what that would buy us.

All true, and I won't be unhappy to have the feature as a bonus, not
expected by the planner(for the cost part) but handled by the
executor.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Florian Pflug
On Jun19, 2011, at 20:40 , Robert Haas wrote:
 2. Since VACUUM and ANALYZE often run together, we will be estimating
 the percentage of rows on all-visible pages just at the time when that
 percentage is highest.  This is not exactly wonderful, either...

Hm, doesn't autovacuum run ANALYZE quite a bit more frequently than
VACUUM by default?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 5:10 PM, Florian Pflug f...@phlo.org wrote:
 On Jun19, 2011, at 20:40 , Robert Haas wrote:
 2. Since VACUUM and ANALYZE often run together, we will be estimating
 the percentage of rows on all-visible pages just at the time when that
 percentage is highest.  This is not exactly wonderful, either...

 Hm, doesn't autovacuum run ANALYZE quite a bit more frequently than
 VACUUM by default?

The autoanalyze threshold is, by default, 10%; and the autovacuum
threshold, 20%.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Florian Pflug
On Jun19, 2011, at 23:16 , Robert Haas wrote:
 On Sun, Jun 19, 2011 at 5:10 PM, Florian Pflug f...@phlo.org wrote:
 On Jun19, 2011, at 20:40 , Robert Haas wrote:
 2. Since VACUUM and ANALYZE often run together, we will be estimating
 the percentage of rows on all-visible pages just at the time when that
 percentage is highest.  This is not exactly wonderful, either...
 
 Hm, doesn't autovacuum run ANALYZE quite a bit more frequently than
 VACUUM by default?
 
 The autoanalyze threshold is, by default, 10%; and the autovacuum
 threshold, 20%.

Hm, so you could ignore (or rather dampen) the results of 
VACUUM+ANALYZE and rely on the ANALYZE-only runs to keep
the estimate correct. Still doesn't sound that bad...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-06-19 Thread Robert Haas
On Sun, Jun 19, 2011 at 7:59 PM, Florian Pflug f...@phlo.org wrote:
 On Jun19, 2011, at 23:16 , Robert Haas wrote:
 On Sun, Jun 19, 2011 at 5:10 PM, Florian Pflug f...@phlo.org wrote:
 On Jun19, 2011, at 20:40 , Robert Haas wrote:
 2. Since VACUUM and ANALYZE often run together, we will be estimating
 the percentage of rows on all-visible pages just at the time when that
 percentage is highest.  This is not exactly wonderful, either...

 Hm, doesn't autovacuum run ANALYZE quite a bit more frequently than
 VACUUM by default?

 The autoanalyze threshold is, by default, 10%; and the autovacuum
 threshold, 20%.

 Hm, so you could ignore (or rather dampen) the results of
 VACUUM+ANALYZE and rely on the ANALYZE-only runs to keep
 the estimate correct. Still doesn't sound that bad...

Yeah, there are a lots of possible approaches.  You could try to keep
a count of how many visibility map bits had been cleared since the
last run...  and either adjust the estimate directly or use it to
trigger an ANALYZE (or some limited ANALYZE that only looks at
visibility map bits).  You could gather statistics on how often the
queries that are actually running are finding the relevant visibility
map bits set, and use that to plan future queries.  You could do what
you're suggesting... and there are probably other options as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-13 Thread Cédric Villemain
2011/5/11 Robert Haas robertmh...@gmail.com:
 On Wed, May 11, 2011 at 3:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Completely agree, but why are you saying that to me?

 When Tom asks me why I suggest something, nobody tells him its a free
 software project etc

 What is the difference here?

 We're now 40 emails in this thread, and there seems to be far more
 heat than light here.  Here's an attempt at a summary:

 - Simon wants proof that the performance benefit of this feature is
 worth any downsides it may have, which is standard procedure, and
 isn't certain the feature will have a significant performance benefit.
 - Numerous other people think Simon's doubts about the feature are
 poorly justified (and some of them also think he's being a pain in the
 neck).
 - Various peripherally related topics, such as optimizing count(*),
 which is not part of the vision for the first go-round that I sketched
 out in my OP, and plan stability, which is another issue entirely,
 have been discussed.
 - Meanwhile, only one person has done any review of the actual code
 that's been written, which is posted on the crash-safe visibility map
 thread, which may be why multiple people seem confused about what it
 does.
 - And no one has done any benchmarking of that code.

Will you be able to do some ? or can you propose a simple process to
do efficient benchmark of the patch ?

If reviewers agree it is safe and benchmarks make evidences that no
basic performance  issue are raised, then let's see if next items have
blockers or can be done.


 I think it would be really helpful if some more people would review
 the crash-safe visibility map patch, and if at least one person could
 benchmark it.  It would be useful to know (a) whether that noticeably
 slows down the system during inserts, updates, and deletes, especially
 at very high concurrency; and (b) how much of an impact the additional
 WAL-logging has on VACUUM.  On the other hand, arguing about whether
 index-only scans are going to result in a significant performance
 benefit is not useful.  I am going to be both surprised and
 disappointed if they don't, but there's only one way to find out, and
 a theoretical argument isn't it.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 6:34 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Will you be able to do some ? or can you propose a simple process to
 do efficient benchmark of the patch ?

I will probably do some benchmarking at some point, unless someone
else goes nuts and makes it moot before I get to that point.  I think
the main thing is to just apply the patch and beat up the server, and
see if it's any slower than it was before.

 If reviewers agree it is safe and benchmarks make evidences that no
 basic performance  issue are raised, then let's see if next items have
 blockers or can be done.

Sounds right to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 12:54 AM, Greg Stark gsst...@mit.edu wrote:

 On a separate note though, Simon, I don't know what you mean by we
 normally start with a problem. It's an free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

Completely agree, but why are you saying that to me?

When Tom asks me why I suggest something, nobody tells him its a free
software project etc

What is the difference here?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
 Greg Stark wrote:
 On a separate note though, Simon, I don't know what you mean by we
 normally start with a problem. It's an free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

 I am confused by Simon's questions too.

 Simon seems to regularly argue for adding features late in the
 development cycle and backpatch things no one else thinks should be
 backpatched, but he wants more research that index-only scans are going
 to improve things before it is implemented?   The first is aggressive
 development, the second is very conservative development --- they don't
 match, so I now wonder what the motivation is since it isn't consistent.

Not really sure why reasonable technical skepticism should become
personal commentary.

You don't question Tom's motives if he is skeptical of an idea of
mine. Why would you question my motivation? What is *your* motive for
acting like that?

I'm not driven by one setting of conservatism, but I am interested
in adding fully usable features that bring credit to the project. If I
see a feature that can have minor things added to it to improve them,
then I raise that during beta. If I see things being worked out that
sounds dubious, I mention that in early development.

I don't think this work will materially improve the speed of count(*)
in majority of cases. This introduces extra overhead into the code
path and that can be a net loss. The only time it will help is when
you have a large table that is not cached and also not recently
updated. Is count(*) run very often against such tables? Do we really
care enough to optimise that use case with lots of special purpose
code? The very fact that Kevin and yourself bring up different reasons
for why we need this feature makes me nervous.

The analysis has not been done yet, and all I have done is request that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Simon Riggs
On Wed, May 11, 2011 at 2:34 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.  This is important because we rely on this compactness in
 hope that the WAL logging of this information will not be burdensome.

We would need to issue one WAL record per bit, not per page.

I'm concerned about the path length added by VM visits and the
potential contention that concentration of information will bring.

Those aren't things to be dismissed without calculation and analysis.
There might not be an issue there, but its worth checking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Heikki Linnakangas

On 10.05.2011 20:15, Simon Riggs wrote:

On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
kevin.gritt...@wicourts.gov  wrote:

Simon Riggssi...@2ndquadrant.com  wrote:


This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.


In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.


I can picture that. Regrettably, I can also picture the accesses to
the visibility map, the maintenance operations on the VM that are
needed for this and the contention that both of those will cause.


Note that we already have the visibility map, and the accesses needed to 
update it are already there. Granted, we'll have to change the logic 
slightly to make it crash safe, but I don't expect that to add any 
meaningful overhead - the changes are going to be where the bits are 
set, ie. vacuum, not when the bits are cleared. Granted, we might also 
want to set the bits more aggressively once they're used by 
index-only-scans. But done correctly, just taking advantage of the VM 
that's already there shouldn't add overhead to other operations.


I agree that we need to do tests to demonstrate that there's a gain from 
the patch, once we have a patch to test. I would be very surprised if 
there isn't, but that just means the testing is going to be easy.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Nicolas Barbier
2011/5/11, Bruce Momjian br...@momjian.us:

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.

Actually, that would be one 8kB block covers 512MB of heap: 1 block
of visibility map (8kB) = 64k visibility bits = covers 64k blocks =
covers 512MB of heap. The cost of keeping the visibility map in cache
is therefore totally negligible, only the cost of WAL logging changes
to it is of interest.

 This is important because we rely on this compactness in hope that
 the WAL logging of this information will not be burdensome.

The size of on entry in the map (1 bit) is not very related to the WAL
overhead required per change of such a bit (i.e., the log record for a
1 bit change will certainly be way more than 1 bit).

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Cédric Villemain
2011/5/11 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 10.05.2011 20:15, Simon Riggs wrote:

 On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov  wrote:

 Simon Riggssi...@2ndquadrant.com  wrote:

 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

 I can picture that. Regrettably, I can also picture the accesses to
 the visibility map, the maintenance operations on the VM that are
 needed for this and the contention that both of those will cause.

 Note that we already have the visibility map, and the accesses needed to
 update it are already there. Granted, we'll have to change the logic
 slightly to make it crash safe, but I don't expect that to add any
 meaningful overhead - the changes are going to be where the bits are set,
 ie. vacuum, not when the bits are cleared. Granted, we might also want to
 set the bits more aggressively once they're used by index-only-scans. But
 done correctly, just taking advantage of the VM that's already there
 shouldn't add overhead to other operations.

We won't be able to do index-only scan.
We can do index scan with probability to not access heap,
maybe(hopefully) completely in some cases.

IF vis map is ok to remove the need to access heap (perf and safe),
then, for the cost part:
Currently, cost_index materialize the cost to access each heap page by
a random_page_cost. I believe we should be able to change that to
remove the estimated number of heap page we don't need to access (can
be 100% or 0.1%).

And as suggested Simon, there is also maybe a path to improve the
bitmapheap scan. bitmapheap scan have already some workaround to be
sure indexscan looks cheaper in some case, just keep that and apply
same logic than for cost_index.

This is keeping the same rule PostgreSQL has : let the planner decide
the best solution instead of allowing special index declaration (it
hasn't been propose yet I think, but, well, just in case it pops into
the mind of someone)


 I agree that we need to do tests to demonstrate that there's a gain from the
 patch, once we have a patch to test. I would be very surprised if there
 isn't, but that just means the testing is going to be easy.

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Cédric Villemain
2011/5/10 Kevin Grittner kevin.gritt...@wicourts.gov:
 Simon Riggs si...@2ndquadrant.com wrote:
 The typical speed up for non-covered indexes will come when we
 access a very large table (not in cache) via an index scan that is
 smaller than a bitmapindex scan. Will we be able to gauge
 selectivities sufficiently accurately to be able to pinpoint that
 during optimization? How will we know that the table is not in
 cache? Or is this an optimisation in the executor for a bitmapheap
 scan?

 I would continue to object to using current cache contents for plan
 choice because of plan instability and the fact that an odd initial
 cache load could skew plans in a bad direction indefinitely.  I do
 agree (and have already posted) that I think the hardest part of
 this might be developing a good cost model.  I doubt that's an
 insoluble problem, especially since it is something we can refine
 over time as we gain experience with the edge cases.

you will have the same possible instability in planning with the
index(-only?) scan because we may need to access heap anyway and this
needs is based on estimation, or I miss something ? I understood the
idea was just to bypass the heap access *if* we can for *this*
heap-page.

In reality, I am not really scared by plan instability because of a
possible PG/OS cache estimation. The percentages remain stable in my
observations ... I don't know yet how it will go for vis map.

And, we already have plan instability currently, which is *good* : at
some point a seq scan is better than an bitmap heap scan. Because the
relation size change and because ANALYZE re-estimate the distribution
of the data. I will be very happy to issue ANALYZE CACHE as I have to
ANALYZE temp table for large query if it allows the planner to provide
me the best plan in some scenario...but this is another topic, sorry
for the digression..

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 9:34 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

 FYI, because the visibility map is only one _bit_ per page, it is 8000 *
 8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
 heap pages.  This is important because we rely on this compactness in
 hope that the WAL logging of this information will not be burdensome.

I accuse you of bad math.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Tue, May 10, 2011 at 10:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That will be true only if you intentionally ignore the points Greg
 raised.  If the table isn't entirely ALL_VISIBLE, then the choice of
 index will determine the ordering of the actual table probes that occur.
 There could be more or fewer page reads, in a more or less optimal
 order, depending on the index used.

However, note that this wasn't one of the cases I said I was going to
try to optimize in the first go-around anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Robert Haas
On Wed, May 11, 2011 at 3:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Completely agree, but why are you saying that to me?

 When Tom asks me why I suggest something, nobody tells him its a free
 software project etc

 What is the difference here?

We're now 40 emails in this thread, and there seems to be far more
heat than light here.  Here's an attempt at a summary:

- Simon wants proof that the performance benefit of this feature is
worth any downsides it may have, which is standard procedure, and
isn't certain the feature will have a significant performance benefit.
- Numerous other people think Simon's doubts about the feature are
poorly justified (and some of them also think he's being a pain in the
neck).
- Various peripherally related topics, such as optimizing count(*),
which is not part of the vision for the first go-round that I sketched
out in my OP, and plan stability, which is another issue entirely,
have been discussed.
- Meanwhile, only one person has done any review of the actual code
that's been written, which is posted on the crash-safe visibility map
thread, which may be why multiple people seem confused about what it
does.
- And no one has done any benchmarking of that code.

I think it would be really helpful if some more people would review
the crash-safe visibility map patch, and if at least one person could
benchmark it.  It would be useful to know (a) whether that noticeably
slows down the system during inserts, updates, and deletes, especially
at very high concurrency; and (b) how much of an impact the additional
WAL-logging has on VACUUM.  On the other hand, arguing about whether
index-only scans are going to result in a significant performance
benefit is not useful.  I am going to be both surprised and
disappointed if they don't, but there's only one way to find out, and
a theoretical argument isn't it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Nicolas Barbier wrote:
 2011/5/11, Bruce Momjian br...@momjian.us:
 
  FYI, because the visibility map is only one _bit_ per page, it is 8000 *
  8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
  heap pages.
 
 Actually, that would be one 8kB block covers 512MB of heap: 1 block
 of visibility map (8kB) = 64k visibility bits = covers 64k blocks =
 covers 512MB of heap. The cost of keeping the visibility map in cache
 is therefore totally negligible, only the cost of WAL logging changes
 to it is of interest.

Ah, yes, thanks, even better.

  This is important because we rely on this compactness in hope that
  the WAL logging of this information will not be burdensome.
 
 The size of on entry in the map (1 bit) is not very related to the WAL
 overhead required per change of such a bit (i.e., the log record for a
 1 bit change will certainly be way more than 1 bit).

True.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Simon Riggs wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Greg Stark wrote:
  On a separate note though, Simon, I don't know what you mean by we
  normally start with a problem. It's an free software project and
  people are free to work on whatever interests them whether that's
  because it solves a problem they have, helps a client who's paying
  them, or just because it's of academic interest to them. We don't
  always take their patches if they aren't of general interest but
  people propose all kinds of crazy experimental ideas all the time.
 
  I am confused by Simon's questions too.
 
  Simon seems to regularly argue for adding features late in the
  development cycle and backpatch things no one else thinks should be
  backpatched, but he wants more research that index-only scans are going
  to improve things before it is implemented? ? The first is aggressive
  development, the second is very conservative development --- they don't
  match, so I now wonder what the motivation is since it isn't consistent.
 
 Not really sure why reasonable technical skepticism should become
 personal commentary.
 
 You don't question Tom's motives if he is skeptical of an idea of
 mine. Why would you question my motivation? What is *your* motive for
 acting like that?

Tom is consistent in his level of aggressive/conservative development
suggestions.  What I am seeing are many cases where you are consistently
pushing for something even though you get almost-overwhelming rejection,
and you keep going.  And if it was consistent in one direction, I could
understand because maybe you feel we are too conservative, but if it
isn't consistent, I have no idea how to learn or adjust to your
approach.  We clearly have some people on one side of the
conservative/agressive specturm, and some on the other side.

Now, I am willing to admit I might be totally wrong, but it has risen to
a level that I felt I should say something in case it is helpful.
 
 I'm not driven by one setting of conservatism, but I am interested
 in adding fully usable features that bring credit to the project. If I
 see a feature that can have minor things added to it to improve them,
 then I raise that during beta. If I see things being worked out that
 sounds dubious, I mention that in early development.

Yes, that seems fine to me, as stated.

 I don't think this work will materially improve the speed of count(*)
 in majority of cases. This introduces extra overhead into the code

I think this is the only hope we have of improving count(*) in an active
MVCC system.  It might not work, but it has been our only hope of
improvement of count(*) for a while.

 path and that can be a net loss. The only time it will help is when
 you have a large table that is not cached and also not recently
 updated. Is count(*) run very often against such tables? Do we really
 care enough to optimise that use case with lots of special purpose
 code? The very fact that Kevin and yourself bring up different reasons
 for why we need this feature makes me nervous.

Yes, no question.  For count(*), you don't care about the indexed
values, only the count, while for Kevin's case you are reading values
from the index.  I assume (or hope) that one or both will be a win for
this feature.

 The analysis has not been done yet, and all I have done is request that.

I think we are going to have to write the code and see the performance
hit and where it is a win.  Ideally we could figure this out
before-hand, but I don't think that is possible in this case.  If you
look at the research in reducing the load of updating the hint bits,
again, it is so complex that only working code and testing is showing if
there is possible improvement there.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
C?dric Villemain wrote:
 2011/5/10 Kevin Grittner kevin.gritt...@wicourts.gov:
  Simon Riggs si...@2ndquadrant.com wrote:
  The typical speed up for non-covered indexes will come when we
  access a very large table (not in cache) via an index scan that is
  smaller than a bitmapindex scan. Will we be able to gauge
  selectivities sufficiently accurately to be able to pinpoint that
  during optimization? How will we know that the table is not in
  cache? Or is this an optimisation in the executor for a bitmapheap
  scan?
 
  I would continue to object to using current cache contents for plan
  choice because of plan instability and the fact that an odd initial
  cache load could skew plans in a bad direction indefinitely. ?I do
  agree (and have already posted) that I think the hardest part of
  this might be developing a good cost model. ?I doubt that's an
  insoluble problem, especially since it is something we can refine
  over time as we gain experience with the edge cases.
 
 you will have the same possible instability in planning with the
 index(-only?) scan because we may need to access heap anyway and this
 needs is based on estimation, or I miss something ? I understood the
 idea was just to bypass the heap access *if* we can for *this*
 heap-page.
 
 In reality, I am not really scared by plan instability because of a
 possible PG/OS cache estimation. The percentages remain stable in my
 observations ... I don't know yet how it will go for vis map.
 
 And, we already have plan instability currently, which is *good* : at
 some point a seq scan is better than an bitmap heap scan. Because the
 relation size change and because ANALYZE re-estimate the distribution
 of the data. I will be very happy to issue ANALYZE CACHE as I have to
 ANALYZE temp table for large query if it allows the planner to provide
 me the best plan in some scenario...but this is another topic, sorry
 for the digression..

Good point --- we would be making plan decisions based on the visibility
map coverage.  The big question is whether visibility map changes are
more dynamic than the values we already plan against, like rows in the
table, table size, and value distributions.  I don't know the answer.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 The very fact that Kevin and yourself bring up different reasons
 for why we need this feature makes me nervous.
 
 Yes, no question.  For count(*), you don't care about the indexed
 values, only the count, while for Kevin's case you are reading
 values from the index.
 
[sigh]  I'm reluctant to draw out this digression further, but there
is a possibly-useful point to be made here: these are not two
different things.  A covering index can be considered whenever the
set of columns referenced in the query is contained inside the set
of columns in the index.  The fact that the set of columns needed by
count(*) is the empty set merely means that it is covered by any
index, since the empty set is contained in every set.
 
Now, this special case may make for an easy initial target in
implementation, or allow early benchmarking.  If so, all the better
to go there first.  I'm not sure why anyone would stop there,
though; if it pays off for that simple case it is likely to pay off
for the more general case, too.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 10.05.2011 20:15, Simon Riggs wrote:
 I can picture that. Regrettably, I can also picture the accesses to
 the visibility map, the maintenance operations on the VM that are
 needed for this and the contention that both of those will cause.

 I agree that we need to do tests to demonstrate that there's a gain from 
 the patch, once we have a patch to test. I would be very surprised if 
 there isn't, but that just means the testing is going to be easy.

I think Simon's point is that showing a gain on specific test cases
isn't a sufficient argument.  What we need to know about this sort of
change is what is the distributed overhead that is going to be paid by
*everybody*, whether their queries benefit from the optimization or not.
And what fraction of real-world queries really do benefit, and to what
extent.  Isolated test cases (undoubtedly chosen to show off the
optimization) are not adequate to form a picture of the overall cost and
benefit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Bruce Momjian
Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 10.05.2011 20:15, Simon Riggs wrote:
  I can picture that. Regrettably, I can also picture the accesses to
  the visibility map, the maintenance operations on the VM that are
  needed for this and the contention that both of those will cause.
 
  I agree that we need to do tests to demonstrate that there's a gain from 
  the patch, once we have a patch to test. I would be very surprised if 
  there isn't, but that just means the testing is going to be easy.
 
 I think Simon's point is that showing a gain on specific test cases
 isn't a sufficient argument.  What we need to know about this sort of
 change is what is the distributed overhead that is going to be paid by
 *everybody*, whether their queries benefit from the optimization or not.
 And what fraction of real-world queries really do benefit, and to what
 extent.  Isolated test cases (undoubtedly chosen to show off the
 optimization) are not adequate to form a picture of the overall cost and
 benefit.

Yes, I assume we are going to need the same kind of tests we did for
other invasive patches like serializable isolation level and hot
standby.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-11 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I think Simon's point is that showing a gain on specific test
 cases isn't a sufficient argument.
 
Ah, if that's what he's been trying to get at, I'm curious who
disagrees with that.  I wouldn't have thought anyone on this list
would.
 
 What we need to know about this sort of change is what is the
 distributed overhead that is going to be paid by *everybody*,
 whether their queries benefit from the optimization or not.
 
Certainly we need to test whether Heikki is right in the previously
non-quoted part of his post on this thread:
 
 Note that we already have the visibility map, and the accesses
 needed to update it are already there. Granted, we'll have to
 change the logic slightly to make it crash safe, but I don't
 expect that to add any meaningful overhead - the changes are
 going to be where the bits are set, ie. vacuum, not when the bits
 are cleared. Granted, we might also want to set the bits more
 aggressively once they're used by index-only-scans. But done
 correctly, just taking advantage of the VM that's already there
 shouldn't add overhead to other operations.
 
 Isolated test cases (undoubtedly chosen to show off the
 optimization) are not adequate to form a picture of the overall
 cost and benefit.
 
Well, first, that hardly seems fair.  I have many times seen people
make an effort to synthesize *worst* case benchmarks.  Certainly any
regular on this list would know it is pointless to show only a best
case benchmark.
 
Second, we really need to make development of a performance testing
farm a priority at PGCon next week.  The need for it just keeps
coming up over and over.
 
Third, Dan Ports has been working a great deal with DBT-2 running
PostgreSQL for the SSI patch, both as a stress tool to flush out
bugs and to get benchmarks numbers conforming to the published
requirements of that benchmark.  I know from off-list emails that it
took a fair amount of work to get it running correctly with
PostgreSQL in his environment.  We should probably try to draw on
that experience.  (Of course that shouldn't be the *only* test in a
performance testing farm, but it's a good one to include.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote:
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.

 hm, what are the implications for tuple hint bits, short and long
 term?  I'm particularly interested if you think any hint bit i/o
 mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase.  It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set).  Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
tables than on one where no pages are all-visible.  So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up.  But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.

 2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
 visibility map crash-safe in 9.2, people are going to want to use
 pg_upgrade to migrate from older versions, bringing their
 possibly-not-quite-correct visibility map forks along with them.  How
 should we handle that?  We could (2A) arrange to have pg_upgrade nuke
 all visibility forks when upgrading from a release where the
 visibility map is not crash-safe to one where it is;

 +1 on 2A.

OK.  Anybody else?

 3. Statistics.  I believe that in order to accurately estimate the
 cost of an index-only scan, we're going to need to know the fraction
 of tuples that are on pages whose visibility map bits are set.

 It would be helpful to know the performance benefit of index only
 scans before knowing how much benefit to attribute here.  Maybe a
 system wide kludge would for starters anyway, like assuming 60% of
 pages can be vis checked from the VM, or a single GUC, Then again,
 maybe not.

Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff.  Then we could
actually benchmark it a bit, which would be good.  I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote:
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.

 hm, what are the implications for tuple hint bits, short and long
 term?  I'm particularly interested if you think any hint bit i/o
 mitigation strategies are worth pursuing.

 Well, I don't really want to let this thread on my project get
 hijacked to talk about your project (not that I haven't been guilty of
 that myself!)

no, that wasn't my intent at all, except in the sense of wondering if
a crash-safe visibility map provides a route of displacing a lot of
hint bit i/o and by extension, making alternative approaches of doing
that, including mine, a lot less useful.  that's a good thing.

meaning: since the vis map approach is going to be a fairly large win
over the classic approach to checking visibility in so many scenarios,
maybe the real long term goal should be just being as aggressive as
possible in terms of making sure it's set properly, and just give up
and be a bit more brute forcey when it's not set.  it's a fair
question.  that's a pretty broad statement, but that's what I'm
thinking about.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
2011/5/10 Robert Haas robertmh...@gmail.com:
 So, what do we need in order to find our way to index-only scans?

 3. Statistics.  I believe that in order to accurately estimate the
 cost of an index-only scan, we're going to need to know the fraction
 of tuples that are on pages whose visibility map bits are set.  I
 believe it should be fairly straightforward to have ANALYZE collect
 this information; and I'm inclined to do that as a separate patch.  It
 seems like it would also be nice to know what fraction of tuples are
 on pages that don't have the visibility map set but where, in fact,
 all tuples on the page are visible to all transactions, so it would be
 legal to set the bit.  A large discrepancy between these two
 percentages might be a good reason to auto-vacuum the table (perhaps
 using a really lazy vacuum[2]).  I'm not sure if this can be added
 cheaply, though, and in any case, any change to the set of criteria
 that will trigger an auto-vacuum is probably a can of worms.
 Thoughts?

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if it will
improve the performance of the index only scan to have tables already
vacuuumed, the stats should expose that and the function
cost_index(_only?)() taking care of that.


 4. Planner and executor changes.  In contrast to Heikki's original
 implementation, I'm inclined to not to try to split the Index Scan
 node into index scan and heap fetch.  Since there are many choices for
 where to put the heap fetch node (any level of the join tree between
 the index scan and the root), this seems likely to result in a
 combinatorial explosion of paths[3], and I'm not real sure that the
 payback will be adequate.  Furthermore, the idea of allowing user code
 to see tuples that will only later be determined not to have been
 visible to that MVCC snapshot in the first place seems pretty scary
 from a security perspective, though certainly there are possible
 benefits[4].  Instead, I'm inclined to just have the planner evaluate
 whether the necessary columns can be extracted purely from the index.

The temptation is high to estimate the cost of an index_scan(only) +
ordered(by ctid) table pages fetch if heap required. (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

 If not, we proceed as now.  If so, we can use the index only
 approach of using the visibility map to decide which heap fetches can
 be skipped.  It's not clear to me whether we need to compare the cost
 of the standard approach with the cost of the index only approach:
 in theory, if there aren't any visibility map bits anyway, the index
 only approach could be slower.  But I'm not sure whether that problem
 is significant or common enough to be worth expending a lot of code
 on.  Either way, the number of actual paths doesn't need to increase,
 because in this design, even if we apply a costing model, one approach
 will dominate the other.  Heikki also suggested considering index
 scans in cases where we don't now[4, again] but I'm inclined to leave
 this, too, for a later optimization, again because balancing the
 increase in paths against the possible performance benefits of using
 indexes in more situations seems finicky.  In short, for a first cut
 at this, I just want to look at this as a way to get cheaper index
 scans, and leave everything else to future work.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, atm, I don't
see the point to split it in 2 functions).


 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

Indexonly scans are welcome!
I believe I can help on 3 and 4, but (really) not sure for 1 and 2.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 [1] http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php
 [2] http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php
 [3] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php
 [4] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make 

Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index only scan to have tables already
 vacuuumed, the stats should expose that and the function
 cost_index(_only?)() taking care of that.

I basically agree.  The connection is that - as we use the all-visible
for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow.  Still, as you say,
clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Indexonly scans are welcome!
 I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
on what/how you'd like to contribute there?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
2011/5/10 Robert Haas robertmh...@gmail.com:
 On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index only scan to have tables already
 vacuuumed, the stats should expose that and the function
 cost_index(_only?)() taking care of that.

 I basically agree.  The connection is that - as we use the all-visible
 for more things, the performance penalty for failing to vacuum (say)
 an insert-only table will continue to grow.  Still, as you say,
 clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

 I was thinking about this as well, at least if I understand you
 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point to split it in 2 functions).

 Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Indexonly scans are welcome!
 I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

I can provide initial patchs for cost and analyze, at least.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:

 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

This will be a complex addition to the codebase and one that could
introduce bugs into MVCC. It seems reasonable to look at what the
benefit of this would be, and what the use case/ benefit profile is
before we spend a long time adding this optimization.

I asked for this previously on earlier threads also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

 I was thinking about this as well, at least if I understand you

yes.

 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

Issue remaining here is that we don't have 'safe' Indexonly_scan, just
indexscan with probability on the 'only'.



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.
 
In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.
 
As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2011/5/10 Robert Haas robertmh...@gmail.com:
 On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index only scan to have tables already
 vacuuumed, the stats should expose that and the function
 cost_index(_only?)() taking care of that.

 I basically agree.  The connection is that - as we use the all-visible
 for more things, the performance penalty for failing to vacuum (say)
 an insert-only table will continue to grow.  Still, as you say,
 clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

 I was thinking about this as well, at least if I understand you
 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point to split it in 2 functions).

 Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Indexonly scans are welcome!
 I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

 I can provide initial patchs for cost and analyze, at least.

OK, cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.
 
 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.
 
 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the random page accesses, in
situations where this is likely to be a big win.  Pointing to the
performance difference in databases that don't do that is therefore
not too convincing.

I'm inclined to agree that index-only scans might be worth the amount
of work that's involved ... but I share Simon's desire to see some proof
before anything gets committed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 12:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is therefore
 not too convincing.

 I'm inclined to agree that index-only scans might be worth the amount
 of work that's involved ... but I share Simon's desire to see some proof
 before anything gets committed.

Well, we're not in the habit of committing performance patches without
performance numbers, so I doubt we'll reverse that trend now, and
certainly I had no intention of doing so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

I can picture that. Regrettably, I can also picture the accesses to
the visibility map, the maintenance operations on the VM that are
needed for this and the contention that both of those will cause.

ISTM quite likely that we'll slow down writes to some extent in order
to improve this use case.

So I'm interested in knowing how broad the use case is and what the
overheads are, rather than have an aw crap! moment in the future
where we finish the code and only then realise its benefit footprint
is not useful. Best to start out with a clear benefit analysis other
than other DBMS do it.

For example, will this be an index-specific tuning option
(manual/automatic), per table or an always-on feature?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen
 an assessment that explains WHY we would want to do index-only
 scans.
  
 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for
 each index entry.  That seems like enough evidence of its
 possible value in PostgreSQL to proceed to the point where
 benchmarks become possible.  I'm assuming that, like all other
 features added as performance optimizations, it won't be
 committed until there are benchmarks showing the net benefit.
  
 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.
 
 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is
 therefore not too convincing.
 
Sure.  Of course, if you're only accessing twenty thousand rows from
a table containing fifty million rows, bitmap index scans could come
out pretty close to random access times; but on the whole I agree
that the scale of benefit in PostgreSQL won't tend to match what
people see in other products.  Note that my words were enough
evidence of its possible value in PostgreSQL to proceed to the point
where benchmarks become possible.
 
In particular, we might want to somehow try to make clear to people
that the very wide indexes they are accustomed to creating to allow
this optimization in other products might be inefficient compared to
creating several one-column indexes which would enable bitmap
logical operations.
 
 I'm inclined to agree that index-only scans might be worth the
 amount of work that's involved
 
So we agree there.
 
 ... but I share Simon's desire to see some proof before anything
 gets committed.
 
And we agree there.  In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.
 
My overall gut feel is that there will be some circumstances where
the covering index optmization is much faster, and some where
people expect it to be, but it isn't.  The trickiest part of this
might be developing a costing model which allows us to make the
right choice most of the time.  And even if we get it perfect, we
can expect questions about why the covering index wasn't used, and
requests for hints so they can force it.  :-(
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Tue, May 10, 2011 at 5:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is therefore
 not too convincing.

The other major effect is row size. Many databases have very wide
rows, perhaps on the order of 1kB. So the table with a million rows
might be 8GB but the index on a few key columns might only be a few
megabytes. Even if you have to read the entire index in random order
it'll likely all be cached and scan faster than the table itself.

One problem with hanging on benchmarks is that database schema design
can actually change based on what performs well. People get in the
habit of creating indexes in Oracle that are only logical when you
realize they allow the database to do an index-only scan  because they
contain extra columns that aren't actually used in where clauses but
are typically in the select list.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 6:25 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.

I'm not talking about eventual commit, I'm talking about the whole
process of development.

We should be focusing on improving a measurable performance issue, not
on implementing one exact design that someone thought might help.
How will we review the patch except by measuring it against the
declared performance goal? Otherwise all the various options along the
way will just be matters of opinion, instead of measurement.

From what has been said so far, the use case for this is related to
the practice of using covered indexes, which makes me nervous
because that is an expert level tuning task on other DBMS, limiting
the range of people who get benefit. The typical speed up for
non-covered indexes will come when we access a very large table (not
in cache) via an index scan that is smaller than a bitmapindex scan.
Will we be able to gauge selectivities sufficiently accurately to be
able to pinpoint that during optimization? How will we know that the
table is not in cache? Or is this an optimisation in the executor for
a bitmapheap scan?

I'm not being negative, I'm trying to avoid arguments, blind alleys
and much wasted development if we focus on the wrong things or go to
design too early..

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.
 
 I'm not talking about eventual commit, I'm talking about the whole
 process of development.
 
I'm confused -- you want to see proof that the concept works well in
PostgreSQL before development effort on it begins?  Or there is some
alternative you would like to see pursued instead?  Something else?
 
 From what has been said so far, the use case for this is related
 to the practice of using covered indexes, which makes me nervous
 because that is an expert level tuning task on other DBMS
 
What?  On the versions of MS SQL Server and Sybase ASE I've used it
costs covered index plans against all the other plans automatically,
and picks this type of plan if the cost looks lower.  Sure, DBAs
sometimes add indexes, or add columns to indexes, in hopes that such
a plan will be chosen -- but what's new and different there?
 
 The typical speed up for non-covered indexes will come when we
 access a very large table (not in cache) via an index scan that is
 smaller than a bitmapindex scan. Will we be able to gauge
 selectivities sufficiently accurately to be able to pinpoint that
 during optimization? How will we know that the table is not in
 cache? Or is this an optimisation in the executor for a bitmapheap
 scan?
 
I would continue to object to using current cache contents for plan
choice because of plan instability and the fact that an odd initial
cache load could skew plans in a bad direction indefinitely.  I do
agree (and have already posted) that I think the hardest part of
this might be developing a good cost model.  I doubt that's an
insoluble problem, especially since it is something we can refine
over time as we gain experience with the edge cases.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 8:35 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.

 I'm not talking about eventual commit, I'm talking about the whole
 process of development.

 I'm confused -- you want to see proof that the concept works well in
 PostgreSQL before development effort on it begins?  Or there is some
 alternative you would like to see pursued instead?  Something else?

Well, I didn't ask for that and agree it would be foolish to demand
proof ahead of development.

I know this technique is effective in other DBMS, I just want to be
certain it will be effective for us before too much work is done. We
have the additional requirement for a crash safe vacuum map that needs
to be consulted, with possible contention effects. Sybase etc can
simply avoid the logical I/O, which is always a win, in or out of
cache. So the problem is quite different for us.

What I suggested was a assessment and benefit case because we normally
start with a problem and then work out how to solve it.

Normally, others come forward with the why? when? questions and it
feels like there's a bit of groupthink going on here. This looks to me
like its being approached like it was a feature, but it looks to me
like a possible optimisation, so suggest we treat it that way.

Out of concern, I don't want you to waste time on work that *may* not
be that useful in practice, and I don't want to miss improvements or
alternatives either.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 Normally, others come forward with the why? when? questions and it
 feels like there's a bit of groupthink going on here. This looks
 to me like its being approached like it was a feature, but it
 looks to me like a possible optimisation, so suggest we treat it
 that way.
 
This issue has come up a great many times over the years, and there
has been much discussion around it.  The Wiki page is here:
 
http://wiki.postgresql.org/wiki/Index-only_scans
 
This currently references 11 threads on the topic.  I'd bet that by
spending a couple hours at it I could quadruple that number of
threads.  (I'd really rather not, though.)
 
The problem is that there are regular and fairly frequent complaints
on the list about queries which run slower than people expect
because the heap must be checked for visibility information when
matching index entries are found.  It has become enough of a
conspicuous issue that a lot of people are interested in seeing if
something can be done about it.  After much discussion, people are
trying to advance a plan to find an answer.  I'm sure nobody
involved would ignore any suggestion on how it might be done better;
but at this point, I don't think it's fair to suggest that this is
not being pursued in response to a real problem, or that no serious
thought has been given to direction before people started moving.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Wed, May 11, 2011 at 12:14 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 The problem is that there are regular and fairly frequent complaints
 on the list about queries which run slower than people expect


To be fair about 3/4 of them were actually complaining about the lack
of some global materialized cache of the aggregate value. Covering
index-only scans are only going to be a linear speedup no matter how
large the factor it's not going to turn select count(*) into a O(1)
operation.

I support the idea of thinking of this as an optimization. But I don't
think there's much question. If we can avoid doing the i/o on the heap
that's an obvious and huge win. Sure the costs of maintaining the vm
need to be measured against the gains but it we don't know what those
costs are yet and whoever works on it will be well aware of that
balance.

On a separate note though, Simon, I don't know what you mean by we
normally start with a problem. It's an free software project and
people are free to work on whatever interests them whether that's
because it solves a problem they have, helps a client who's paying
them, or just because it's of academic interest to them. We don't
always take their patches if they aren't of general interest but
people propose all kinds of crazy experimental ideas all the time.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Robert Haas wrote:
  Any thoughts welcome. ?Incidentally, if anyone else feels like working
  on this, feel free to let me know and I'm happy to step away, from all
  of it or from whatever part someone else wants to tackle. ?I'm mostly
  working on this because it's something that I think we really need to
  get done, more than having a burning desire to be the one who does it.
 
  Indexonly scans are welcome!
  I believe I can help on 3 and 4, but (really) not sure for 1 and 2.
 
 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

I am happy to have pg_upgrade skip upgrading visibility map files --- it
already has code to conditionally process them because they only exist
in = 8.4:

/* fsm/vm files added in PG 8.4 */
if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
{
/*
 * Copy/link any fsm and vm files, if they exist
 */

Just give the word and it will be done.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On a separate note though, Simon, I don't know what you mean by we
 normally start with a problem. It's an free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

I am confused by Simon's questions too.  

Simon seems to regularly argue for adding features late in the
development cycle and backpatch things no one else thinks should be
backpatched, but he wants more research that index-only scans are going
to improve things before it is implemented?   The first is aggressive
development, the second is very conservative development --- they don't
match, so I now wonder what the motivation is since it isn't consistent.

Isn't speeding up COUNT(*) a sufficient case because it will not have to
touch the heap in many cases?  No one is going to apply this patch until
we fully understand the performance implications, just like every other
patch.  No one has suggested otherwise.

It is helpful to have people critically review all our work, but
disagreeing just for the sake of causing discussion isn't helpful, and I
have seen a lot of these discussions lately.  I am sensing a pattern.  :-(

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
 Isn't speeding up COUNT(*) a sufficient case because it will not have to
 touch the heap in many cases?

Putting aside the politics questions, count(*) is an interesting case
-- it exposes some of the unanswered questions about index-only scans.

The reason select count(*) might win would be because we could pick
any index and do an index scan, relying on the visibility map to
optimize away the heap reads. This is only going to be a win if a
large fraction of the heap reads get optimized away.

It's going to be pretty tricky to determine in the optimizer a) which
index will be cheapest and b) what fraction of index tuples will point
to pages where the heap reference can be optimized away. The penalty
for guessing wrong if we use an index-only scan and it turns out to
have many pages that aren't all-visible would be pretty high.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Isn't speeding up COUNT(*) a sufficient case because it will not have to
  touch the heap in many cases?
 
 Putting aside the politics questions, count(*) is an interesting case
 -- it exposes some of the unanswered questions about index-only scans.
 
 The reason select count(*) might win would be because we could pick
 any index and do an index scan, relying on the visibility map to
 optimize away the heap reads. This is only going to be a win if a
 large fraction of the heap reads get optimized away.
 
 It's going to be pretty tricky to determine in the optimizer a) which
 index will be cheapest and b) what fraction of index tuples will point
 to pages where the heap reference can be optimized away. The penalty
 for guessing wrong if we use an index-only scan and it turns out to
 have many pages that aren't all-visible would be pretty high.

Yes, that is the tricky optimizer/analyze part.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?
 
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

FYI, because the visibility map is only one _bit_ per page, it is 8000 *
8 or 64k times smaller than the heap, e.g. one 8k page covers 64MB of
heap pages.  This is important because we rely on this compactness in
hope that the WAL logging of this information will not be burdensome.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Isn't speeding up COUNT(*) a sufficient case because it will not have to
  touch the heap in many cases?
 
 Putting aside the politics questions, count(*) is an interesting case
 -- it exposes some of the unanswered questions about index-only scans.
 
 The reason select count(*) might win would be because we could pick
 any index and do an index scan, relying on the visibility map to
 optimize away the heap reads. This is only going to be a win if a
 large fraction of the heap reads get optimized away.
 
 It's going to be pretty tricky to determine in the optimizer a) which
 index will be cheapest and b) what fraction of index tuples will point

I assume the smallest non-partial index would be the cheapest index.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Greg Stark wrote:
 Putting aside the politics questions, count(*) is an interesting case
 -- it exposes some of the unanswered questions about index-only scans.
 
 The reason select count(*) might win would be because we could pick
 any index and do an index scan, relying on the visibility map to
 optimize away the heap reads. This is only going to be a win if a
 large fraction of the heap reads get optimized away.
 
 It's going to be pretty tricky to determine in the optimizer a) which
 index will be cheapest and b) what fraction of index tuples will point

 I assume the smallest non-partial index would be the cheapest index.

That will be true only if you intentionally ignore the points Greg
raised.  If the table isn't entirely ALL_VISIBLE, then the choice of
index will determine the ordering of the actual table probes that occur.
There could be more or fewer page reads, in a more or less optimal
order, depending on the index used.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Greg Stark wrote:
  Putting aside the politics questions, count(*) is an interesting case
  -- it exposes some of the unanswered questions about index-only scans.
  
  The reason select count(*) might win would be because we could pick
  any index and do an index scan, relying on the visibility map to
  optimize away the heap reads. This is only going to be a win if a
  large fraction of the heap reads get optimized away.
  
  It's going to be pretty tricky to determine in the optimizer a) which
  index will be cheapest and b) what fraction of index tuples will point
 
  I assume the smallest non-partial index would be the cheapest index.
 
 That will be true only if you intentionally ignore the points Greg
 raised.  If the table isn't entirely ALL_VISIBLE, then the choice of
 index will determine the ordering of the actual table probes that occur.
 There could be more or fewer page reads, in a more or less optimal
 order, depending on the index used.

OK, would the clustering analyze stats (pg_stats.correlation) tell us
that?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Jesper Krogh

On 2011-05-11 01:54, Greg Stark wrote:

To be fair about 3/4 of them were actually complaining about the lack
of some global materialized cache of the aggregate value. Covering
index-only scans are only going to be a linear speedup no matter how
large the factor it's not going to turn select count(*) into a O(1)
operation.

Actually, if we could get to count(*) into the situation of a
very thin row today, so the cost of visibillity-testing didn't depend
hugely on the width of the row any more, then we be half-
way-there in terms of performance optimization.

If rows typically just were tuple-headers + a bit more, then it
would be way harder to go down this road and claim good
benefits. But currently the system needs to drag in allmost
one page pr. visibillity test from disk on random large tables.

I tried to graph the differences of thin vs. wide rows here:
http://shrek.*krogh*.cc/~*jesper*/*visibillitytesting*.pdf 
http://shrek.krogh.cc/%7Ejesper/visibillitytesting.pdf

http://postgresql.1045698.n5.nabble.com/Visibillity-testing-some-numbers-on-current-performance-td4284836.html

The getting the visibillitymap down to an O(n) is on large tables
shifting to be memory based vs. disk-based as now.

Jesper (It not a goal, but it would most-likely postpone some
peoples needs for buying a FusionIO card or similar)
--
Jesper


[HACKERS] the big picture for index-only scans

2011-05-09 Thread Robert Haas
So, what do we need in order to find our way to index-only scans?

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction.  Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache.  However, before we can rely on the
visibility map for this purpose, we need to fix the problems that can
leave bits set inappropriately in the face of an inconveniently-timed
crash.  I've been working on a patch for this on and off for a few
months now; my latest version is in need of review[1].

2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
visibility map crash-safe in 9.2, people are going to want to use
pg_upgrade to migrate from older versions, bringing their
possibly-not-quite-correct visibility map forks along with them.  How
should we handle that?  We could (2A) arrange to have pg_upgrade nuke
all visibility forks when upgrading from a release where the
visibility map is not crash-safe to one where it is; (2B) keep a piece
of state somewhere indicating, for each relation, whether or not the
visibility map can be trusted, set it to false only if pg_upgrade
brings the relation over from and older version, and set it to true
after a successful vacuum that skips no intervening pages; or (2C)
advise the user to do a VACUUM FULL on each of their tables
pre-upgrade, and if they don't, treat wrong answers as their own
fault.  (I doubt anyone will advocate for this option, but for the
sake of completeness...).  (2A) seems like the simplest solution,
especially because it also avoids the overhead of checking the is the
visibility map bit reliable? flag every time we want to plan a query.

3. Statistics.  I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set.  I
believe it should be fairly straightforward to have ANALYZE collect
this information; and I'm inclined to do that as a separate patch.  It
seems like it would also be nice to know what fraction of tuples are
on pages that don't have the visibility map set but where, in fact,
all tuples on the page are visible to all transactions, so it would be
legal to set the bit.  A large discrepancy between these two
percentages might be a good reason to auto-vacuum the table (perhaps
using a really lazy vacuum[2]).  I'm not sure if this can be added
cheaply, though, and in any case, any change to the set of criteria
that will trigger an auto-vacuum is probably a can of worms.
Thoughts?

4. Planner and executor changes.  In contrast to Heikki's original
implementation, I'm inclined to not to try to split the Index Scan
node into index scan and heap fetch.  Since there are many choices for
where to put the heap fetch node (any level of the join tree between
the index scan and the root), this seems likely to result in a
combinatorial explosion of paths[3], and I'm not real sure that the
payback will be adequate.  Furthermore, the idea of allowing user code
to see tuples that will only later be determined not to have been
visible to that MVCC snapshot in the first place seems pretty scary
from a security perspective, though certainly there are possible
benefits[4].  Instead, I'm inclined to just have the planner evaluate
whether the necessary columns can be extracted purely from the index.
If not, we proceed as now.  If so, we can use the index only
approach of using the visibility map to decide which heap fetches can
be skipped.  It's not clear to me whether we need to compare the cost
of the standard approach with the cost of the index only approach:
in theory, if there aren't any visibility map bits anyway, the index
only approach could be slower.  But I'm not sure whether that problem
is significant or common enough to be worth expending a lot of code
on.  Either way, the number of actual paths doesn't need to increase,
because in this design, even if we apply a costing model, one approach
will dominate the other.  Heikki also suggested considering index
scans in cases where we don't now[4, again] but I'm inclined to leave
this, too, for a later optimization, again because balancing the
increase in paths against the possible performance benefits of using
indexes in more situations seems finicky.  In short, for a first cut
at this, I just want to look at this as a way to get cheaper index
scans, and leave everything else to future work.

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel 

Re: [HACKERS] the big picture for index-only scans

2011-05-09 Thread Merlin Moncure
On Mon, May 9, 2011 at 10:25 PM, Robert Haas robertmh...@gmail.com wrote:
 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.

hm, what are the implications for tuple hint bits, short and long
term?  I'm particularly interested if you think any hint bit i/o
mitigation strategies are worth pursuing.

 2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
 visibility map crash-safe in 9.2, people are going to want to use
 pg_upgrade to migrate from older versions, bringing their
 possibly-not-quite-correct visibility map forks along with them.  How
 should we handle that?  We could (2A) arrange to have pg_upgrade nuke
 all visibility forks when upgrading from a release where the
 visibility map is not crash-safe to one where it is;

+1 on 2A.

 3. Statistics.  I believe that in order to accurately estimate the
 cost of an index-only scan, we're going to need to know the fraction
 of tuples that are on pages whose visibility map bits are set.

It would be helpful to know the performance benefit of index only
scans before knowing how much benefit to attribute here.  Maybe a
system wide kludge would for starters anyway, like assuming 60% of
pages can be vis checked from the VM, or a single GUC, Then again,
maybe not.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers