Re: [HACKERS] Possible patch for better index name choosing

2009-12-22 Thread Tom Lane
I wrote:
 Attached is a WIP patch for addressing the problems mentioned in this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php
 ...
 There is one thing that is not terribly nice about the behavior, which
 is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart
 names for expression indexes;
 ...
 The reason for this is that the patch depends on FigureColname which
 works on untransformed parse trees, and we don't have access to such
 a tree when copying an existing index.  There seem to be three possible
 responses to that:
 ...
 3. Implement a separate FigureIndexColname function that works as much
 like FigureColname as it can, but takes a transformed parse tree.

I fooled around with this solution and decided that it is a lot messier
than it's worth.

In the first place, we can't make a FigureColname-like function that
just takes a transformed tree: there is no way to interpret Vars without
some context.  You need at least a table OID, and more than that if
you'd like to handle cases like multiple-relation expressions or
non-RELATION RTEs.  For the case at hand of index expressions, a table
OID would be enough, but that doesn't leave much room for imagining the
function could be used for anything else in future.  Worse, for the
problematic case (CREATE TABLE LIKE) we actually do not have a table OID
because the target table doesn't exist yet.  We could finesse that by
passing the source table's OID instead, but that seems pretty klugy
itself.

In the second place, the number of corner cases where we'd generate
output different from FigureColname is much greater than I realized.
As an example, if foo is a type name then foo(x) and x::foo produce
the same parsed tree, but FigureColname will treat them differently.

Seeing that CREATE TABLE LIKE doesn't try to reproduce the source table's
index names anyway, I'm inclined to just go with the patch as-is and not
try to make it handle this one case nicely.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible patch for better index name choosing

2009-12-21 Thread Peter Eisentraut
On mån, 2009-12-21 at 00:03 -0500, Tom Lane wrote:
 Well, we could tamp down the risks considerably if we undid my point
 (1), namely to still consider only the first index column when
 generating a name.

I think putting all the column names into the index names instead of
only the first is a significant improvement that should be kept.  If we
can't do it properly in some cases, we should punt in some obvious way,
not pretend to do the correct thing but actually omit some bits.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible patch for better index name choosing

2009-12-21 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2009-12-21 at 00:03 -0500, Tom Lane wrote:
 Well, we could tamp down the risks considerably if we undid my point
 (1), namely to still consider only the first index column when
 generating a name.

 I think putting all the column names into the index names instead of
 only the first is a significant improvement that should be kept.

Yeah, I think so too.  It's well worth any risk of application
incompatibility --- we make much bigger changes in every major
release without blinking.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible patch for better index name choosing

2009-12-20 Thread Tom Lane
Attached is a WIP patch for addressing the problems mentioned in this
thread:
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

The main things that it does are (1) consider all index columns, not
just the first one as formerly; and (2) try to generate a usable name
for index expression columns, rather than just ignoring them which was
the effective behavior formerly.

There are several changes in the regression test outputs, mostly as a
result of choice (1).  I've not bothered to update the expected files
yet but just attached the output diffs to show what happens.

There is one thing that is not terribly nice about the behavior, which
is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart
names for expression indexes; it falls back to expr, as for example
in

regression=# create table foo (f1 text, exclude (lower(f1) with =));
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index 
foo_lower_exclusion for table foo
CREATE TABLE
regression=# create table foo2 (like foo including all);
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index 
foo2_expr_exclusion for table foo2
CREATE TABLE

The reason for this is that the patch depends on FigureColname which
works on untransformed parse trees, and we don't have access to such
a tree when copying an existing index.  There seem to be three possible
responses to that:

1. Decide this isn't worth worrying about and use the patch as-is.

2. Change FigureColname to work on already-transformed expressions.
I don't care for this idea much, for two reasons.  First, FigureColname
would become significantly slower (eg, it would have to do catalog
lookups to resolve names of Vars, instead of just pulling the name out
of a ColumnRef), and this is objectionable considering it's part of
the required parsing path for even very simple commands.  Second, there
are various corner cases where we'd get different results, which would
likely break applications that are expecting specific result column
names from given queries.

3. Implement a separate FigureIndexColname function that works as much
like FigureColname as it can, but takes a transformed parse tree.
This fixes the LIKE case and also removes the need for the iexprname
field that the attached patch adds to IndexElem.  I think it largely
overcomes the two objections to idea #2, since an extra few lookups
during index creation are hardly a performance problem, and exact
application compatibility shouldn't be an issue here either.  It's
a bit ugly to have to keep two such functions in sync though.

I'm not real sure whether to go with the patch as-is or use idea #3.
It seems to depend on how annoyed you are by the LIKE behavior.

A different consideration is whether it's really a good idea to be
messing with default index names at all.  As illustrated in the attached
regression diffs, this does impact the error messages returned to
applications for unique-index failures.  I don't think this is a serious
problem across a major version update, but maybe someone thinks
differently.

Comments?

regards, tom lane

Index: src/backend/bootstrap/bootparse.y
===
RCS file: /cvsroot/pgsql/src/backend/bootstrap/bootparse.y,v
retrieving revision 1.101
diff -c -r1.101 bootparse.y
*** src/backend/bootstrap/bootparse.y	7 Dec 2009 05:22:21 -	1.101
--- src/backend/bootstrap/bootparse.y	21 Dec 2009 02:47:04 -
***
*** 322,328 
  {
  	IndexElem *n = makeNode(IndexElem);
  	n-name = $1;
! 	n-expr = NULL;
  	n-opclass = list_make1(makeString($2));
  	n-ordering = SORTBY_DEFAULT;
  	n-nulls_ordering = SORTBY_NULLS_DEFAULT;
--- 322,329 
  {
  	IndexElem *n = makeNode(IndexElem);
  	n-name = $1;
! 	n-iexpr = NULL;
! 	n-iexprname = NULL;
  	n-opclass = list_make1(makeString($2));
  	n-ordering = SORTBY_DEFAULT;
  	n-nulls_ordering = SORTBY_NULLS_DEFAULT;
Index: src/backend/catalog/index.c
===
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.326
diff -c -r1.326 index.c
*** src/backend/catalog/index.c	9 Dec 2009 21:57:50 -	1.326
--- src/backend/catalog/index.c	21 Dec 2009 02:47:04 -
***
*** 217,224 
  			indexpr_item = lnext(indexpr_item);
  
  			/*
! 			 * Make the attribute's name pg_expresssion_nnn (maybe think of
! 			 * something better later)
  			 */
  			sprintf(NameStr(to-attname), pg_expression_%d, i + 1);
  
--- 217,226 
  			indexpr_item = lnext(indexpr_item);
  
  			/*
! 			 * Make the attribute's name pg_expression_nnn (maybe think of
! 			 * something better later?  we're kind of locked in now, though,
! 			 * because pg_dump exposes these names when dumping comments on
! 			 * index columns)
  			 */
  			sprintf(NameStr(to-attname), pg_expression_%d, i + 1);
  
Index: src/backend/commands/indexcmds.c

Re: [HACKERS] Possible patch for better index name choosing

2009-12-20 Thread Robert Haas
On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Attached is a WIP patch for addressing the problems mentioned in this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

 The main things that it does are (1) consider all index columns, not
 just the first one as formerly; and (2) try to generate a usable name
 for index expression columns, rather than just ignoring them which was
 the effective behavior formerly.

I'm not really sure there's any point to this.  Anyone who cares about
giving their index an intelligible name should manually assign one.
If they don't bother doing that, I don't really see why we should
worry about it either.  If anything, it seems like we should err on
the side of simplicity, since some users (or even applications) might
attempt to identify or predict automatically generated names.

 A different consideration is whether it's really a good idea to be
 messing with default index names at all.  As illustrated in the attached
 regression diffs, this does impact the error messages returned to
 applications for unique-index failures.  I don't think this is a serious
 problem across a major version update, but maybe someone thinks
 differently.

Maybe I'll reserve final judgement pending further discussion, but my
first reaction is to say it's not worth the risk.  Probably this
shouldn't be an issue for a well-designed application, but the world
is full of badly-written code.  We shouldn't throw up barriers (even
relatively trivial ones) to updating applications unless we get
something out of it, and I'm not convinced that's the case here.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible patch for better index name choosing

2009-12-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Attached is a WIP patch for addressing the problems mentioned in this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

 I'm not really sure there's any point to this.  Anyone who cares about
 giving their index an intelligible name should manually assign one.
 If they don't bother doing that, I don't really see why we should
 worry about it either.

Mainly because we historically *have* put some work into it, and it
would be inconsistent to not pay attention to the point now as we extend
the set of possible index-building constraints further.  In particular
we're going to see a lot of exclusion constraints named foo_exclusionN
if we don't expend any effort on it now.  I also claim that this is
necessary infrastructure if we are going to accept Peter's proposal of
allowing CREATE INDEX without an explicit index name.  That is really
dependent on the assumption that the system will expend more than no
effort on picking useful names.

 Maybe I'll reserve final judgement pending further discussion, but my
 first reaction is to say it's not worth the risk.  Probably this
 shouldn't be an issue for a well-designed application, but the world
 is full of badly-written code.  We shouldn't throw up barriers (even
 relatively trivial ones) to updating applications unless we get
 something out of it, and I'm not convinced that's the case here.

Well, we could tamp down the risks considerably if we undid my point
(1), namely to still consider only the first index column when
generating a name.  I am not really happy with that answer though.
I could turn your first point back on you: if an app is concerned about
the exact names assigned to indexes, why isn't it specifying them?

It's worth noting that pg_dump does preserve index names, so this isn't
going to be an issue in any case for existing apps that dump and reload
their databases.  AFAICS the only case where it would actually create a
compatibility issue is if an existing app creates multi-column UNIQUE
(non-PKEY) constraints on-the-fly, without a constraint name, and
depends on the generated name being the same as before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible patch for better index name choosing

2009-12-20 Thread Robert Haas
On Mon, Dec 21, 2009 at 12:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Attached is a WIP patch for addressing the problems mentioned in this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

 I'm not really sure there's any point to this.  Anyone who cares about
 giving their index an intelligible name should manually assign one.
 If they don't bother doing that, I don't really see why we should
 worry about it either.

 Mainly because we historically *have* put some work into it, and it
 would be inconsistent to not pay attention to the point now as we extend
 the set of possible index-building constraints further.  In particular
 we're going to see a lot of exclusion constraints named foo_exclusionN
 if we don't expend any effort on it now.

Maybe that's worth fixing and maybe it isn't.  My first reaction is
so what?  In all likelihood, you're going to have to look at the
index definition to see what the thing does anyway.

 I also claim that this is
 necessary infrastructure if we are going to accept Peter's proposal of
 allowing CREATE INDEX without an explicit index name.  That is really
 dependent on the assumption that the system will expend more than no
 effort on picking useful names.

That's a point to consider, though perhaps if they aren't specifying a
name it means they don't care that much.

 Maybe I'll reserve final judgement pending further discussion, but my
 first reaction is to say it's not worth the risk.  Probably this
 shouldn't be an issue for a well-designed application, but the world
 is full of badly-written code.  We shouldn't throw up barriers (even
 relatively trivial ones) to updating applications unless we get
 something out of it, and I'm not convinced that's the case here.

 Well, we could tamp down the risks considerably if we undid my point
 (1), namely to still consider only the first index column when
 generating a name.  I am not really happy with that answer though.
 I could turn your first point back on you: if an app is concerned about
 the exact names assigned to indexes, why isn't it specifying them?

 It's worth noting that pg_dump does preserve index names, so this isn't
 going to be an issue in any case for existing apps that dump and reload
 their databases.  AFAICS the only case where it would actually create a
 compatibility issue is if an existing app creates multi-column UNIQUE
 (non-PKEY) constraints on-the-fly, without a constraint name, and
 depends on the generated name being the same as before.

Right.  Imagine, for example, a poorly written initialization script
for an app.  Existing instances that are dumped and reloaded will be
OK, but new instances might not come out as expected.

I don't think that what you're proposing here is completely stupid;
I'm just wondering if it's not an ultimately somewhat pointless
activity.  I'm not convinced that it's possible or sensible to try to
stringify all the things that people put in their index definitions,
or that we're going to be able to do it well enough to really add any
value.Perhaps I should RTFP before sticking my neck out too far,
but... will you serialize EXCLUDE (a =), EXCLUDE (a ), and EXCLUDE
(a some other operator) differently?  And if so, do you expect the
user to be able to reconstruct what the constraint is doing by looking
at the serialized version?  It seems like something reasonably sane
can be done when the definition uses mostly column names and
functions, but operators seem like more of a problem.  I think mostly
people are going to see the constraint name that got violated and then
run \d on the table and look for it.  foo_exclusion3 may not be very
informative, but it's easy to remember for long enough to find it in
the \d output, whereas something long and hairy may not be.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible patch for better index name choosing

2009-12-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Perhaps I should RTFP before sticking my neck out too far,
 but... will you serialize EXCLUDE (a =), EXCLUDE (a ), and EXCLUDE
 (a some other operator) differently?

No, and I'm not proposing to expose ASC/DESC/NULLS FIRST/LAST or
nondefault opclasses (to say nothing of non-btree AMs) or index
predicates either.  The proposed patch is to my mind just a logical
extension of what we have always done --- namely, to pay attention
to index column names --- to some new cases that were never exposed
before.

We could certainly make it pay attention to all that stuff, but I have
the same feeling you do that it wouldn't produce readable results.
And it would make any compatibility issues a lot worse.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers