Re: [HACKERS] Covering Indexes

2012-08-25 Thread Florian Weimer
* Jeff Janes:

 I don't see the virtue of this in this case.  Since the index is not
 unique, why not just put the index on (a,b,c,d) and be done with it?

AFAICT, SQLite 4 encodes keys in a way that is not easily reversed
(although the encoding is injective, so it's reversible in principle).
Therefore, covered columns need to be listed separately, so they are
not subject to the key encoding.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-31 Thread Jeff Davis
On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: 
 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

-1 from me, at least in its current form.

At it's heart, this is about separating the constraint from the index
that enforces it -- you'd like the columns to be available for querying
(for index only scans or otherwise), but not to take part in the
constraint.

And when you look at it from that perspective, this proposal is and
extremely limited form. You can't, for example, decide that an existing
index can be used for a new unique constraint. That's a lot more
plausible than the use cases mentioned in this thread as far as I can
see, but this proposal can't do that.

I tried proposing a more general use case when developing exclusion
constraints:

http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

(allow user to specify multiple constraints enforced by one existing
index). But, at least at the time, my proposal didn't pass the
usefulness test:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

even though my proposal was strictly more powerful than this one is.

Also, this proposal extends the weird differences between CREATE UNIQUE
INDEX and a the declaration of a UNIQUE constraint. For example, if you
want DEFERRABLE you need to declare the constraint, but if you want to
use an expression (rather than a simple column reference) you need to
create the index. This problem does not exist with exclusion
constraints.

In my opinion, new innovations in unique constraints would be better
served as part of exclusion constraints, and we should keep unique
constraints simple. If we make an improvement to UNIQUE, then we will
want to do similar things for exclusion constraints anyway, so it just
seems duplicative.

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-28 Thread Jeff Davis
On Fri, 2012-07-27 at 15:27 -0500, Merlin Moncure wrote:
 The covering index/uniqueness use case adds
 legitimacy to the INDEX clause of exclusion constraints IMNSHO.

Yes, I think it would be worth revisiting the idea.

 One point of concern though is that (following a bit of testing)
 alter table foo add exclude using btree (id with =);
 ...is always strictly slower for inserts than
 alter table foo add primary key(id);

Yes, in my worst-case tests there is about a 2X difference for building
the constraint and about a 30-50% slowdown during INSERT (I thought I
remembered the difference being lower, but I didn't dig up my old test).
That's for an in-memory case, I would expect disk I/O to make the
difference less apparent.

 This is probably because it doesn't use the low level btree based
 uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
 that if it can?

We could probably detect that the operator being used is the btree
equality operator, set the unique property of the btree, and avoid the
normal exclusion constraint check. I'm sure there are some details to
work out, but if we start collecting more use cases where people want
the flexibility of exclusion constraints and the speed of UNIQUE, we
should look into it.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-28 Thread Jeff Janes
On Fri, Jul 27, 2012 at 1:27 PM, Merlin Moncure mmonc...@gmail.com wrote:

 One point of concern though is that (following a bit of testing)
 alter table foo add exclude using btree (id with =);
 ...is always strictly slower for inserts than
 alter table foo add primary key(id);

 This is probably because it doesn't use the low level btree based
 uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
 that if it can?

If it did that, then than would make it faster in precisely those
cases were I wouldn't use it in the first place--where there is a less
esoteric alternative that does exactly the same thing.  While that is
not something without value, it would seem better (although
potentially more difficult of course) to just make it faster in
general, instead.

I didn't look into the creation, but rather into inserts.  During
inserts, it looks like it is doing a look up into the btree twice,
presumably once to maintain it, and once to check for uniqueness.  If
there was some way to cache the look-up between those, I think it
would go a long way towards eliminating the performance difference.
Could that be done without losing the generality?

And, does it matter?  I would think covering indexes would be deployed
to best effect when your data is not cached in RAM, in which case the
IO cost common to both paths probably overwhelms any extra CPU cost.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-27 Thread Jeff Davis
On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: 
 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

-1 from me, at least in its current form.

At it's heart, this is about separating the constraint from the index
that enforces it -- you'd like the columns to be available for querying
(for index only scans or otherwise), but not to take part in the
constraint.

And when you look at it from that perspective, this proposal is and
extremely limited form. You can't, for example, decide that an existing
index can be used for a new unique constraint. That's a lot more
plausible than the use cases mentioned in this thread as far as I can
see, but this proposal can't do that.

I tried proposing a more general use case when developing exclusion
constraints:

http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

(allow user to specify multiple constraints enforced by one existing
index). But, at least at the time, my proposal didn't pass the
usefulness test:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

even though my proposal was strictly more powerful than this one is.

Also, this proposal extends the weird differences between CREATE UNIQUE
INDEX and a the declaration of a UNIQUE constraint. For example, if you
want DEFERRABLE you need to declare the constraint, but if you want to
use an expression (rather than a simple column reference) you need to
create the index. This problem does not exist with exclusion
constraints.

In my opinion, new innovations in unique constraints would be better
served as part of exclusion constraints, and we should keep unique
constraints simple. If we make an improvement to UNIQUE, then we will
want to do similar things for exclusion constraints anyway, so it just
seems duplicative.

Regards,
Jeff Davis







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-27 Thread Merlin Moncure
On Fri, Jul 27, 2012 at 12:24 PM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote:
 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

 -1 from me, at least in its current form.

 At it's heart, this is about separating the constraint from the index
 that enforces it -- you'd like the columns to be available for querying
 (for index only scans or otherwise), but not to take part in the
 constraint.

 And when you look at it from that perspective, this proposal is and
 extremely limited form. You can't, for example, decide that an existing
 index can be used for a new unique constraint. That's a lot more
 plausible than the use cases mentioned in this thread as far as I can
 see, but this proposal can't do that.

 I tried proposing a more general use case when developing exclusion
 constraints:

 http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

 (allow user to specify multiple constraints enforced by one existing
 index). But, at least at the time, my proposal didn't pass the
 usefulness test:

 http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

 even though my proposal was strictly more powerful than this one is.

 Also, this proposal extends the weird differences between CREATE UNIQUE
 INDEX and a the declaration of a UNIQUE constraint. For example, if you
 want DEFERRABLE you need to declare the constraint, but if you want to
 use an expression (rather than a simple column reference) you need to
 create the index. This problem does not exist with exclusion
 constraints.

 In my opinion, new innovations in unique constraints would be better
 served as part of exclusion constraints, and we should keep unique
 constraints simple. If we make an improvement to UNIQUE, then we will
 want to do similar things for exclusion constraints anyway, so it just
 seems duplicative.

Well, you're right.  The exclusion constraint syntax is amazingly
general (if somewhat arcane) and it would be neat to be extended like
that.  It already decouples you from physical assumptions on the
index.  For example, it creates a two field index for a double field
btree equality exclusion and does a runtime, not equality based
uniqueness check.  The covering index/uniqueness use case adds
legitimacy to the INDEX clause of exclusion constraints IMNSHO.

One point of concern though is that (following a bit of testing)
alter table foo add exclude using btree (id with =);
...is always strictly slower for inserts than
alter table foo add primary key(id);

This is probably because it doesn't use the low level btree based
uniqueness check (the index is not declared UNIQUE) -- shouldn't it do
that if it can?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-26 Thread Bruce Momjian
On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
  Either way the data in c and d are IN THE INDEX otherwise in neither
  case could the data values be returned while strictly querying the index.
 
  So the question that needs to be asked is what kind of performance increase
  can be had during DML (insert/update) statements and whether those gains are
  worth pursuing.  Since these other engines appear to allow both cases you
  should be able to get at least a partial idea of the performance gains
  between index (a,b,c,d) and index (a,b) covering (c,d).
 
 There is a use case, already discussed, whereby that is useful for
create unique index on foo (a,b) covering (c,d)
 
 but there really isn't any functional difference between
create index on foo (a,b) covering (c,d)
 
 and
create index on foo (a,b,c,d)
 
 There is a potential performance impact. But as Tom says, that might
 even be negative if it is actually measurable.

So, do we want a TODO item about adding columns to a unique index that
will not be used for uniqueness checks?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-26 Thread Merlin Moncure
On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
  Either way the data in c and d are IN THE INDEX otherwise in neither
  case could the data values be returned while strictly querying the index.
 
  So the question that needs to be asked is what kind of performance increase
  can be had during DML (insert/update) statements and whether those gains 
  are
  worth pursuing.  Since these other engines appear to allow both cases you
  should be able to get at least a partial idea of the performance gains
  between index (a,b,c,d) and index (a,b) covering (c,d).

 There is a use case, already discussed, whereby that is useful for
create unique index on foo (a,b) covering (c,d)

 but there really isn't any functional difference between
create index on foo (a,b) covering (c,d)

 and
create index on foo (a,b,c,d)

 There is a potential performance impact. But as Tom says, that might
 even be negative if it is actually measurable.

 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

I think so.  The case where you want a wide multiple column primary
key to be extended to cover that one extra commonly grabbed value is
not super common but entirely plausible.  With the existing
infrastructure to get the advantages of index covering you'd have to
duplicate the entire index for the extra field which really sucks:
aside from the huge waste of time and space, you force the planner to
play the 'which index do i use?' game.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-26 Thread Robert Haas
On Thu, Jul 26, 2012 at 12:25 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote:
  Either way the data in c and d are IN THE INDEX otherwise in neither
  case could the data values be returned while strictly querying the index.
 
  So the question that needs to be asked is what kind of performance 
  increase
  can be had during DML (insert/update) statements and whether those gains 
  are
  worth pursuing.  Since these other engines appear to allow both cases you
  should be able to get at least a partial idea of the performance gains
  between index (a,b,c,d) and index (a,b) covering (c,d).

 There is a use case, already discussed, whereby that is useful for
create unique index on foo (a,b) covering (c,d)

 but there really isn't any functional difference between
create index on foo (a,b) covering (c,d)

 and
create index on foo (a,b,c,d)

 There is a potential performance impact. But as Tom says, that might
 even be negative if it is actually measurable.

 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

 I think so.  The case where you want a wide multiple column primary
 key to be extended to cover that one extra commonly grabbed value is
 not super common but entirely plausible.  With the existing
 infrastructure to get the advantages of index covering you'd have to
 duplicate the entire index for the extra field which really sucks:
 aside from the huge waste of time and space, you force the planner to
 play the 'which index do i use?' game.

I think it is going to take several years before we really understand
how index-only scans play out in the real world, and what factors
limit their usefulness.  This one has come up a few times because it's
sort of an obvious thing to want to do and we don't have it, but I
think that there's room for some skepticism about how well it will
actually work, for reasons that have already been mentioned, and of
course also because indexing more columns potentially means defeating
HOT, which I suspect will defeat many otherwise-promising applications
of index-only scans.

That having been said, it would be unwise to speculate too much in
advance of the data, and we're not going to get any data until someone
tries implementing it, so +1 from me for putting something on the TODO
list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-26 Thread Merlin Moncure
On Thu, Jul 26, 2012 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I think so.  The case where you want a wide multiple column primary
 key to be extended to cover that one extra commonly grabbed value is
 not super common but entirely plausible.  With the existing
 infrastructure to get the advantages of index covering you'd have to
 duplicate the entire index for the extra field which really sucks:
 aside from the huge waste of time and space, you force the planner to
 play the 'which index do i use?' game.

 I think it is going to take several years before we really understand
 how index-only scans play out in the real world, and what factors
 limit their usefulness.  This one has come up a few times because it's
 sort of an obvious thing to want to do and we don't have it, but I
 think that there's room for some skepticism about how well it will
 actually work, for reasons that have already been mentioned, and of
 course also because indexing more columns potentially means defeating
 HOT, which I suspect will defeat many otherwise-promising applications
 of index-only scans.

Sure.  many will still get to use them though: I'm doing tons of
OLAP/BI lately: wide keys, minimal to no updating, minimal to no RI,
andvery large tables (often clustered and partitioned), and extreme
performance requirements.  Covering indexes to me is basically a drop
in feature and COVERING seems to make a lot of sense on paper.  (I
absolutely can't wait to get 9.2 on some of our bigger servers here).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Simon Riggs
On 28 June 2012 13:16, David E. Wheeler da...@justatheory.com wrote:

 Very interesting design document for SQLite 4:

   http://www.sqlite.org/src4/doc/trunk/www/design.wiki

 I'm particularly intrigued by covering indexes. For example:

 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 This allows the following query to do an index-only scan:

 SELECT c, d FROM table1 WHERE a=? AND b=?;

 Now that we have index-only scans in 9.2, I'm wondering if it would make 
 sense to add covering index support, too, where additional, unindexed columns 
 are stored alongside indexed columns.


Just to be clear, the ability to have covered indexes is already in
9.2, I updated the release notes to explain that a few months back.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread David E. Wheeler
On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:

 Now that we have index-only scans in 9.2, I'm wondering if it would make 
 sense to add covering index support, too, where additional, unindexed 
 columns are stored alongside indexed columns.
 
 Just to be clear, the ability to have covered indexes is already in
 9.2, I updated the release notes to explain that a few months back.

You mean this?

 Allow queries to retrieve data only from indexes, avoiding heap access 
 (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
 
 This is often called index-only scans or covering indexes. This is 
 possible for heap pages with exclusively all-visible tuples, as reported by 
 the visibility map. The visibility map was made crash-safe as a necessary 
 part of implementing this feature.

That’s not how SQLite is using the term “covering index.” What they mean is the 
ability to have additional, unindexed columns in an index, so that they can 
*also* be returned in the event of an index-only scan.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Simon Riggs
On 17 July 2012 16:21, David E. Wheeler da...@justatheory.com wrote:
 On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:

 Now that we have index-only scans in 9.2, I'm wondering if it would make 
 sense to add covering index support, too, where additional, unindexed 
 columns are stored alongside indexed columns.

 Just to be clear, the ability to have covered indexes is already in
 9.2, I updated the release notes to explain that a few months back.

 You mean this?

 Allow queries to retrieve data only from indexes, avoiding heap access 
 (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

 This is often called index-only scans or covering indexes. This is 
 possible for heap pages with exclusively all-visible tuples, as reported by 
 the visibility map. The visibility map was made crash-safe as a necessary 
 part of implementing this feature.

 That’s not how SQLite is using the term “covering index.” What they mean is 
 the ability to have additional, unindexed columns in an index, so that they 
 can *also* be returned in the event of an index-only scan.

  CREATE INDEX ON foo (a, b, c, d);

allows

  SELECT c, d FROM foo WHERE a = ? AND b = ?

to use an index only scan.

The phrase unindexed seems misleading since the data is clearly in
the index from the description on the URL you gave. And since the
index is non-unique, I don't see any gap between Postgres and
SQLliite4.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread David E. Wheeler
On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:

  CREATE INDEX ON foo (a, b, c, d);
 
 allows
 
  SELECT c, d FROM foo WHERE a = ? AND b = ?
 
 to use an index only scan.
 
 The phrase unindexed seems misleading since the data is clearly in
 the index from the description on the URL you gave. And since the
 index is non-unique, I don't see any gap between Postgres and
 SQLliite4.

Yeah, but that index is unnecessarily big if one will never use c or d in the 
search. The nice thing about covering indexes as described for SQLite 4 and 
implemented in MSSQL is that you can specify additional columns that just come 
along for the ride, but are not part of the indexed data:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

Yes, you can do that by also indexing c and d as of 9.2, but it might be nice 
to be able to include them in the index as additional row data without actually 
indexing them.

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Simon Riggs
On 17 July 2012 16:54, David E. Wheeler da...@justatheory.com wrote:
 On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:

  CREATE INDEX ON foo (a, b, c, d);

 allows

  SELECT c, d FROM foo WHERE a = ? AND b = ?

 to use an index only scan.

 The phrase unindexed seems misleading since the data is clearly in
 the index from the description on the URL you gave. And since the
 index is non-unique, I don't see any gap between Postgres and
 SQLliite4.

 Yeah, but that index is unnecessarily big if one will never use c or d in the 
 search. The nice thing about covering indexes as described for SQLite 4 and 
 implemented in MSSQL is that you can specify additional columns that just 
 come along for the ride, but are not part of the indexed data:

 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 Yes, you can do that by also indexing c and d as of 9.2, but it might be nice 
 to be able to include them in the index as additional row data without 
 actually indexing them.

Can you explain what you mean by without actually indexing them?
ISTM that it is a non-feature if the index is already non-unique, and
the difference is simply down to the amount of snake oil applied to
the descriptive text on the release notes.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Vik Reykja
On Tue, Jul 17, 2012 at 6:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 17 July 2012 16:54, David E. Wheeler da...@justatheory.com wrote:
  Yeah, but that index is unnecessarily big if one will never use c or d
 in the search. The nice thing about covering indexes as described for
 SQLite 4 and implemented in MSSQL is that you can specify additional
 columns that just come along for the ride, but are not part of the indexed
 data:
 
  CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
 
  Yes, you can do that by also indexing c and d as of 9.2, but it might be
 nice to be able to include them in the index as additional row data without
 actually indexing them.

 Can you explain what you mean by without actually indexing them?
 ISTM that it is a non-feature if the index is already non-unique, and
 the difference is simply down to the amount of snake oil applied to
 the descriptive text on the release notes.


It would be useful in non-unique indexes to store data without ordering
operators (like xml).


Re: [HACKERS] Covering Indexes

2012-07-17 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of David E. Wheeler
 Sent: Tuesday, July 17, 2012 11:55 AM
 To: Simon Riggs
 Cc: Pg Hackers
 Subject: Re: [HACKERS] Covering Indexes
 
 On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
 
   CREATE INDEX ON foo (a, b, c, d);
 
  allows
 
   SELECT c, d FROM foo WHERE a = ? AND b = ?
 
  to use an index only scan.
 
  The phrase unindexed seems misleading since the data is clearly in
  the index from the description on the URL you gave. And since the
  index is non-unique, I don't see any gap between Postgres and
  SQLliite4.
 
 Yeah, but that index is unnecessarily big if one will never use c or d in
the
 search. The nice thing about covering indexes as described for SQLite 4
and
 implemented in MSSQL is that you can specify additional columns that just
 come along for the ride, but are not part of the indexed data:
 
 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
 
 Yes, you can do that by also indexing c and d as of 9.2, but it might be
nice to
 be able to include them in the index as additional row data without
actually
 indexing them.
 
 Best,
 
 David

Concretely, I would presume that the contents of a covering index could then
look like the following (a,b,c,d):

(2,1,2,A)
(2,1,5,A) -- the 5 is out of natural order but exists in the covering
part
(2,1,3,A)

Whereas PostgreSQL would be forced to have the index ordered as such:

(2,1,2,A)
(2,1,3,A)
(2,1,5,A)

Either way the data in c and d are IN THE INDEX otherwise in neither
case could the data values be returned while strictly querying the index.

So the question that needs to be asked is what kind of performance increase
can be had during DML (insert/update) statements and whether those gains are
worth pursuing.  Since these other engines appear to allow both cases you
should be able to get at least a partial idea of the performance gains
between index (a,b,c,d) and index (a,b) covering (c,d).

Vik's concurrent point regarding non-indexable values makes some sense but
the use case there seems specialized as I suspect that in the general case
values that are non-indexable (if there truly are any) are generally those
that would be too large to warrant sticking into an index in the first
place.  But, XML values do ring true in my mind (particularly frequently
used fragments that are generally quite small).  But again whether that is a
reasonable use case for a covering index I do not know.  It feels like
trying to solve the remaining 10% when it took a long while to even muster
up enough support and resources to solve the 90%.

David J.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Simon Riggs
On 17 July 2012 17:41, David Johnston pol...@yahoo.com wrote:

 Concretely, I would presume that the contents of a covering index could then
 look like the following (a,b,c,d):

 (2,1,2,A)
 (2,1,5,A) -- the 5 is out of natural order but exists in the covering
 part
 (2,1,3,A)

 Whereas PostgreSQL would be forced to have the index ordered as such:

 (2,1,2,A)
 (2,1,3,A)
 (2,1,5,A)

 Either way the data in c and d are IN THE INDEX otherwise in neither
 case could the data values be returned while strictly querying the index.

 So the question that needs to be asked is what kind of performance increase
 can be had during DML (insert/update) statements and whether those gains are
 worth pursuing.  Since these other engines appear to allow both cases you
 should be able to get at least a partial idea of the performance gains
 between index (a,b,c,d) and index (a,b) covering (c,d).

There is a use case, already discussed, whereby that is useful for
   create unique index on foo (a,b) covering (c,d)

but there really isn't any functional difference between
   create index on foo (a,b) covering (c,d)

and
   create index on foo (a,b,c,d)

There is a potential performance impact. But as Tom says, that might
even be negative if it is actually measurable.


 Vik's concurrent point regarding non-indexable values makes some sense but
 the use case there seems specialized as I suspect that in the general case
 values that are non-indexable (if there truly are any) are generally those
 that would be too large to warrant sticking into an index in the first
 place.

I think it would be easy enough to add noop operators for sorts if you
wanted to do that.

 But, XML values do ring true in my mind (particularly frequently
 used fragments that are generally quite small).  But again whether that is a
 reasonable use case for a covering index I do not know.  It feels like
 trying to solve the remaining 10% when it took a long while to even muster
 up enough support and resources to solve the 90%.

The main thing is that we definitely already do have covering indexes
and we will be announcing we have that soon. The fact we have chosen
to implement that without adding new syntax strikes me as a selling
point as well, so all client tools still work.

So the feature we are talking about here needs to be called something
else, otherwise we will be confusing people. Unsorted trailing index
columns...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Andrew Dunstan


On 07/17/2012 12:41 PM, David Johnston wrote:


So the question that needs to be asked is what kind of performance increase
can be had during DML (insert/update) statements and whether those gains are
worth pursuing.  Since these other engines appear to allow both cases you
should be able to get at least a partial idea of the performance gains
between index (a,b,c,d) and index (a,b) covering (c,d).



Tom's recent answer to me on this point (as I understood it) was that he 
would expect performance to degrade, not improve, since the btree code 
is known not to perform well when there are many non-unique values.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-17 Thread Tom Lane
David E. Wheeler da...@justatheory.com 
ca+u5nmjz33zsvqpzk-auoindxkq6elip1hgq53byodlpwfd...@mail.gmail.com writes:
 On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
 The phrase unindexed seems misleading since the data is clearly in
 the index from the description on the URL you gave. And since the
 index is non-unique, I don't see any gap between Postgres and
 SQLliite4.

 Yeah, but that index is unnecessarily big if one will never use c or d
 in the search.

The data would still have to be stored in the leaf entries, at least.
Yeah, you could possibly omit the unindexed columns from upper tree
levels, but with typical btree fanout ratios in the hundreds, the
overall space savings would be negligible.  The idea of different index
tuple descriptors on different tree levels doesn't appeal to me, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-06 Thread Csaba Nagy
Hi all,

 On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler da...@justatheory.com 
 wrote:
 I don't see the virtue of this in this case.  Since the index is not
 unique, why not just put the index on (a,b,c,d) and be done with it?
 Is there some advantage to be had in inventing a way to store c and d
 in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes ?

For not unique indexes it has no advantages (it could be in fact indexed
on all columns anyway as an implementation detail).

For the unique case the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
but not otherwise affect the index performance.

Would this get close enough to index-covered table ? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which is basically using double space because it's primary
key is covering all columns of the table.

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-06 Thread Bruce Momjian
On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:
 Hi all,
 
  On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler da...@justatheory.com 
  wrote:
  I don't see the virtue of this in this case.  Since the index is not
  unique, why not just put the index on (a,b,c,d) and be done with it?
  Is there some advantage to be had in inventing a way to store c and d
  in the index without having them usable for indexing?
 
 Why not restrict it to UNIQUE indexes ?
 
 For not unique indexes it has no advantages (it could be in fact indexed
 on all columns anyway as an implementation detail).
 
 For the unique case the problem of many identical entries mentioned by
 Tom is not relevant, so the additional data will only bloat the index
 but not otherwise affect the index performance.
 
 Would this get close enough to index-covered table ? _That_ would be
 interesting - I have a really big table (table/index size: 370G/320G,
 ~8*10^9 rows) which is basically using double space because it's primary
 key is covering all columns of the table.

I was wondering if there was some way to specify an expression index
that did a unique index check on some columns but included more columns
not part of the unique index.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-07-06 Thread Cédric Villemain
Le vendredi 6 juillet 2012 15:41:01, Bruce Momjian a écrit :
 On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:
  Hi all,
  
   On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler
   da...@justatheory.com wrote: I don't see the virtue of this in this
   case.  Since the index is not unique, why not just put the index on
   (a,b,c,d) and be done with it? Is there some advantage to be had in
   inventing a way to store c and d in the index without having them
   usable for indexing?
  
  Why not restrict it to UNIQUE indexes ?
  
  For not unique indexes it has no advantages (it could be in fact indexed
  on all columns anyway as an implementation detail).
  
  For the unique case the problem of many identical entries mentioned by
  Tom is not relevant, so the additional data will only bloat the index
  but not otherwise affect the index performance.
  
  Would this get close enough to index-covered table ? _That_ would be
  interesting - I have a really big table (table/index size: 370G/320G,
  ~8*10^9 rows) which is basically using double space because it's primary
  key is covering all columns of the table.
 
 I was wondering if there was some way to specify an expression index
 that did a unique index check on some columns but included more columns
 not part of the unique index.

I haven't tryed it, but I suppose that Exclusion Constraint should allow that.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Covering Indexes

2012-07-06 Thread Tom Lane
Csaba Nagy ncsli...@googlemail.com writes:
 Why not restrict it to UNIQUE indexes ?

What benefit would such a restriction provide?  AFAICS it doesn't make
implementation any easier.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-30 Thread Thomas Munro
On 28 June 2012 14:02, Rob Wultsch wult...@gmail.com wrote:
 On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler da...@justatheory.com 
 wrote:
 I'm particularly intrigued by covering indexes. For example:

    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 IRC MS SQL also allow unindexed columns in the index.

For what it's worth, DB2 also has this feature, written roughly the
same way as MS SQL Server: CREATE INDEX cover1 ON table1(a, b) INCLUDE
(c, d).

http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r919.html

Oracle doesn't seem to have this feature (and the SQL standard doesn't
mention indexes at all).

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-29 Thread Eric McKeeth
On Thu, Jun 28, 2012 at 7:02 AM, Rob Wultsch wult...@gmail.com wrote:
 On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler da...@justatheory.com 
 wrote:
 Hackers,

 Very interesting design document for SQLite 4:

  http://www.sqlite.org/src4/doc/trunk/www/design.wiki

 I'm particularly intrigued by covering indexes. For example:

    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 This allows the following query to do an index-only scan:

    SELECT c, d FROM table1 WHERE a=? AND b=?;

 Now that we have index-only scans in 9.2, I'm wondering if it would make 
 sense to add covering index support, too, where additional, unindexed 
 columns are stored alongside indexed columns.

 And I wonder if it would work well with expressions, too?

 David

 IRC MS SQL also allow unindexed columns in the index.

 --
 Rob Wultsch
 wult...@gmail.com

MS SQL Server does support including non-key columns in indexes,
allowing index only scans for queries returning these columns. Their
syntax is different from that proposed in the linked SQLite document.
To the best of my experience, the advantages in SQL Server to such an
index (as opposed to just adding the columns to the index normally)
are as follows:

1- You can include extra columns in a unique index which do not
participate in the uniqueness determination.
2- The non-key columns can be of types which normally cannot be
included in a b-tree index due to lack of proper sorting operators.
3- The resulting index is smaller, because the non-key columns are
only contained in leaf nodes, not in internal nodes.
4- The insert/update overhead is lower.

Of course, an implementation in a different database system, such as
Postgres, may or may not have the same set of benefits. Number 4 in
particular seems to be dependent on the details of the b-tree
implementation. In my mind numbers 1 and 2 are the most compelling
arguments in favor of this feature. Whether the it's worth the effort
of coding the feature would depend on how well the above benefits (or
any benefits I missed) hold true, and how useful such an index
actually proved for index only scans in Postgres.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Andreas Joseph Krogh

On 06/28/2012 02:16 PM, David E. Wheeler wrote:

Hackers,

Very interesting design document for SQLite 4:

   http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by covering indexes. For example:

 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

 SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense 
to add covering index support, too, where additional, unindexed columns are 
stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David


This is analogous to SQL Server's include :

|CREATE NONCLUSTERED INDEX my_idx|
|ON my_table (status)|
|INCLUDE (someColumn, otherColumn)|

Which is useful, but bloats the index.

--
Andreas Joseph Kroghandr...@officenet.no  - mob: +47 909 56 963
Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc



Re: [HACKERS] Covering Indexes

2012-06-28 Thread Rob Wultsch
On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler da...@justatheory.com wrote:
 Hackers,

 Very interesting design document for SQLite 4:

  http://www.sqlite.org/src4/doc/trunk/www/design.wiki

 I'm particularly intrigued by covering indexes. For example:

    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 This allows the following query to do an index-only scan:

    SELECT c, d FROM table1 WHERE a=? AND b=?;

 Now that we have index-only scans in 9.2, I'm wondering if it would make 
 sense to add covering index support, too, where additional, unindexed columns 
 are stored alongside indexed columns.

 And I wonder if it would work well with expressions, too?

 David

IRC MS SQL also allow unindexed columns in the index.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Jeff Janes
On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler da...@justatheory.com wrote:
 Hackers,

 Very interesting design document for SQLite 4:

  http://www.sqlite.org/src4/doc/trunk/www/design.wiki

 I'm particularly intrigued by covering indexes. For example:

    CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case.  Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Andrew Dunstan



On 06/28/2012 11:37 AM, Jeff Janes wrote:

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheelerda...@justatheory.com  wrote:

Hackers,

Very interesting design document for SQLite 4:

  http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by covering indexes. For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case.  Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?



Presumably the comparison function will be faster the fewer attributes 
it needs to compare.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 06/28/2012 11:37 AM, Jeff Janes wrote:
 On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheelerda...@justatheory.com  
 wrote:
 I'm particularly intrigued by covering indexes. For example:
 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

 I don't see the virtue of this in this case.  Since the index is not
 unique, why not just put the index on (a,b,c,d) and be done with it?

 Presumably the comparison function will be faster the fewer attributes 
 it needs to compare.

Well, no, because queries will only be comparing the columns used in
the query.  Insertions would likely actually be faster with the extra
columns considered significant, since we've known for a long time that
btree doesn't especially like large numbers of identical index entries.

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique.  That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:

 When this came up a couple weeks ago, the argument that was made for it
 was that you could attach non-significant columns to an index that *is*
 unique.  That might or might not be a wide enough use-case to justify
 adding such a horrid kludge.

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched.  That could be a
significant difference.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Aidan Van Dyk
On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 The other question is whether such an index would prevent an update from
 being HOT when the non-indexed values are touched.  That could be a
 significant difference.

I don't see Index-Only-Scans being something that will be used in
high churn tables.

So as long as the value of these covering/included fields is tied to
index-only scans, maybe it isn't a problem?

Of course, we have have a hard time convincing people that the  index
only scans they want can't be index only because heap pages aren't
all visible...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 The other question is whether such an index would prevent an update from
 being HOT when the non-indexed values are touched.

Surely it would *have* to, whether the columns are significant or not
for uniqueness purposes.  Else an index-only scan gives the wrong value
after the update.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Covering Indexes

2012-06-28 Thread Jeff Janes
On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:

 When this came up a couple weeks ago, the argument that was made for it
 was that you could attach non-significant columns to an index that *is*
 unique.  That might or might not be a wide enough use-case to justify
 adding such a horrid kludge.

 The other question is whether such an index would prevent an update from
 being HOT when the non-indexed values are touched.

That seems like an easy question to answer.  How could it not disable
HOT and still work correctly?

 That could be a
 significant difference.

True, adding the covering column would not always be a win.  But
surely it more likely to be a win when it can be done without adding
yet another index that also needs to be maintained.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers