Re: [HACKERS] extensible enum types

2010-06-23 Thread Bruce Momjian
Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  Robert Haas wrote:
  I like the idea of being able to modify enums on the fly, but I'm
  skeptical of an implementation that won't always work.  Maybe it's
  still better than what we have now, but it seems grotty.
 
  I'd be perfectly happy to hear a reasonable alternative.
 
 Insert a sort order column into pg_enum, and rearrange the values in
 that whenever the user wants to add a new value in a particular place.
 You give up cheap comparisons in exchange for flexibility.  I think lots
 of people would accept that tradeoff, especially if they could make it
 per-datatype.
 
 One point here is that you'd have to restrict the rearrangements so that
 the relative sort order of existing values never changes, else you break
 (for example) indexes on columns of that type.

Sorry to be commenting late, but don't most people want to add to the
end or beginning of the enum list, rather than in the middle, and can't
we support that already?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-23 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Sorry to be commenting late, but don't most people want to add to the
 end or beginning of the enum list, rather than in the middle, and can't
 we support that already?

We could allow adding a value, but we couldn't guarantee where it would
appear in the type's sort ordering.  Depending on the current OID
counter it would usually show up either at the end or the beginning.
I think the general feeling is that this is too implementation-dependent
to be acceptable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-23 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Sorry to be commenting late, but don't most people want to add to the
  end or beginning of the enum list, rather than in the middle, and can't
  we support that already?
 
 We could allow adding a value, but we couldn't guarantee where it would
 appear in the type's sort ordering.  Depending on the current OID
 counter it would usually show up either at the end or the beginning.
 I think the general feeling is that this is too implementation-dependent
 to be acceptable.

Well, we don't need the enum value to map into the entire oid range. 
Can't we just add one to the top-most value and see if there is a
conflict?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-23 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Well, we don't need the enum value to map into the entire oid range. 
 Can't we just add one to the top-most value and see if there is a
 conflict?

If you don't use the OID counter to generate the new value, you're going
to have problems with race conditions.  There's also that small chance
that there is no free value before 2^32.

The bottom line here is not wanting a feature that usually works but
fails once in awhile on the basis of conditions the user can't control.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-23 Thread Andrew Dunstan



Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:
  
Well, we don't need the enum value to map into the entire oid range. 
Can't we just add one to the top-most value and see if there is a

conflict?



If you don't use the OID counter to generate the new value, you're going
to have problems with race conditions.  There's also that small chance
that there is no free value before 2^32.

The bottom line here is not wanting a feature that usually works but
fails once in awhile on the basis of conditions the user can't control.

  


Yeah, what I'm now hoping to be able to do, following good suggestions 
from Tom, is to provide a way to get virtually no degradation in bulk 
comparison performance in the common case where any additions have been 
made at the end of the list with no oid wraparound, and acceptable 
performance otherwise, but provide for an ability to add new values at 
arbitrary places in the ordering, with no limit.


If we can do that why would we want less?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  
Another thought: could we add a column to pg_type with a flag that's 
true if the oids are in sort order? Then the comparison routines could 
just look that up in the type cache and if it's true (as it often will 
be) just return the oid comparison.



Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison.  However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...


  


I have done some. The performance hit is fairly horrible. Adding cache 
lookups for the enum rows to the comarison routines made a REINDEX on a 
1m row table where the index is on an enum column (the enum has 500 
randomly ordered labels) jump from around 10s to around 70s. I think 
that probably rules out doing anything like this for the existing enum  
types. I think the most we can reasonably do there is to allow adding a 
label to the end of the enum list. I'm fairly resistant to doing 
something which will have a major performance impact, as I know there 
are users who are relying on enums for performce reasons. I'm also 
fairly resistant to doing things which will require table rewriting.


So the question then is: do we want to allow lots of flexibility for 
positioning new labels with significant degradation in comparison 
performace for a new enum variant, or have a new variant with some 
restrictions which probably won't impact most users but would have 
equivalent performance to the current enum family, or do nothing?



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 Well, having to do a cache lookup already makes it a couple orders of
 magnitude more expensive than an OID comparison.  However, it's hard to
 say how much that matters in terms of total application performance.
 We really could do with a bit of performance testing here ...

 I have done some. The performance hit is fairly horrible. Adding cache 
 lookups for the enum rows to the comarison routines made a REINDEX on a 
 1m row table where the index is on an enum column (the enum has 500 
 randomly ordered labels) jump from around 10s to around 70s.

Hmmm... that's bad, but I bet it's still less than the cost of comparing
NUMERICs.  Also, did you make any attempt to avoid repetitive cache
lookups by storing a pointer in fn_extra (cf array comparisons)?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Andrew Dunstan



Tom Lane wrote:
Adding cache 
lookups for the enum rows to the comarison routines made a REINDEX on a 
1m row table where the index is on an enum column (the enum has 500 
randomly ordered labels) jump from around 10s to around 70s.



Hmmm... that's bad, but I bet it's still less than the cost of comparing
NUMERICs.  Also, did you make any attempt to avoid repetitive cache
lookups by storing a pointer in fn_extra (cf array comparisons)?


  


No. Will work on that. Thanks.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Kevin Grittner
Peter Geoghegan peter.geoghega...@gmail.com wrote:
 
 In my experience, lookup tables generally have two columns, an
 integer PK and a description/state.
 
Eek.  If that's what you consider a lookup table, I wouldn't
advocate their use for anything.  Ever.  Period.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Simon Riggs
On Mon, 2010-06-21 at 12:04 -0500, Kevin Grittner wrote:
 Peter Geoghegan peter.geoghega...@gmail.com wrote:
  
  In my experience, lookup tables generally have two columns, an
  integer PK and a description/state.
  
 Eek.  If that's what you consider a lookup table, I wouldn't
 advocate their use for anything.  Ever.  Period.

Do you mean you don't use relational modelling, or do you mean you would
never implement your physical database that way because of the
performance impact of RI on PostgreSQL? Or?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-21 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Tom Lane wrote:


Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison.  However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...
  


  
I have done some. The performance hit is fairly horrible. Adding cache 
lookups for the enum rows to the comarison routines made a REINDEX on a 
1m row table where the index is on an enum column (the enum has 500 
regards, tom lane


randomly ordered labels) jump from around 10s to around 70s.



Hmmm... that's bad, but I bet it's still less than the cost of comparing
NUMERICs.  Also, did you make any attempt to avoid repetitive cache
lookups by storing a pointer in fn_extra (cf array comparisons)?


  


OK, making a bit of progress. Attached is a sort of proof of concept 
patch that does that. It stores a bsearchable list of {enum, sort_order} 
pairs in fn_extra, along with a flag that indicates if the oids are  in 
fact ordered. This flag, which would be maintained in and populated from 
pg_type, would allow avoidance of any significant performance penalty in 
such cases by relying on straight Oid comparison. We'd probably need to 
keep a count of labels in pg_type too so we could size the cache 
appropriately. This approach just about buys the best of both worlds. 
The execution time for the test mentioned above is down from around 70s 
to around 20s. I think for a worst case that's not too bad, especially 
when it is completely avoided unless we have perturbed the sort order.


If anyone wants to play along, my test set is available at 
http://developer.postgresql.org/~adunstan/enumtest.dmp It's about 8.5Mb.


cheers

andrew
Index: enum.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/enum.c,v
retrieving revision 1.11
diff -c -r1.11 enum.c
*** enum.c	26 Feb 2010 02:01:08 -	1.11
--- enum.c	22 Jun 2010 02:16:48 -
***
*** 14,19 
--- 14,20 
  #include postgres.h
  
  #include catalog/pg_enum.h
+ #include catalog/pg_type.h
  #include fmgr.h
  #include utils/array.h
  #include utils/builtins.h
***
*** 22,27 
--- 23,52 
  #include libpq/pqformat.h
  #include miscadmin.h
  
+ typedef struct 
+ {
+ 	Oid  enum_oid;
+ 	uint32   sort_order;
+ } enum_sort;
+ 
+ typedef struct 
+ {
+ 	bool  oids_are_sorted;
+ 	int   sort_list_length;
+ 	enum_sort sort_order_list[1024];
+ } enum_sort_cache;
+ 	
+ 
+ static int 
+ enum_sort_cmp(void * es1, void * es2)
+ {
+ 	enum_sort *p1, *p2;
+ 	p1 = (enum_sort *)es1;
+ 	p2 = (enum_sort *)es2;
+ 	return p1-enum_oid - p2-enum_oid;
+ }
+ 
+ 
  
  static ArrayType *enum_range_internal(Oid enumtypoid, Oid lower, Oid upper);
  static int	enum_elem_cmp(const void *left, const void *right);
***
*** 155,167 
  
  /* Comparison functions and related */
  
  Datum
  enum_lt(PG_FUNCTION_ARGS)
  {
  	Oid			a = PG_GETARG_OID(0);
  	Oid			b = PG_GETARG_OID(1);
  
! 	PG_RETURN_BOOL(a  b);
  }
  
  Datum
--- 180,283 
  
  /* Comparison functions and related */
  
+ static inline int 
+ enum_ccmp(Oid arg1, Oid arg2, FunctionCallInfo fcinfo)
+ {
+ 
+ 	enum_sort_cache * mycache;
+ 	enum_sort *es1, *es2;
+ 	int sort1, sort2;
+ 	bool added = false;
+ 	HeapTuple	tup;
+ 	Form_pg_enum en;
+ 	Oid typeoid;
+ 	Form_pg_type typ;
+ 
+ 	if (arg1 == arg2)
+ 		return 0;
+ 
+ 	mycache = (enum_sort_cache *) fcinfo-flinfo-fn_extra;
+ 	if (mycache == NULL )
+ 	{
+ fcinfo-flinfo-fn_extra = MemoryContextAlloc(fcinfo-flinfo-fn_mcxt,
+   sizeof(enum_sort_cache));
+ mycache = (enum_sort_cache *) fcinfo-flinfo-fn_extra;
+ 		mycache-sort_list_length = 1;
+ 		tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(arg1));
+ 		en = (Form_pg_enum) GETSTRUCT(tup);
+ 		mycache-sort_order_list[0].enum_oid = arg1;
+ 		mycache-sort_order_list[0].sort_order = arg1;
+ 		typeoid = en-enumtypid;
+ 		ReleaseSysCache(tup);
+ 		tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typeoid));
+ 		typ = (Form_pg_type)  GETSTRUCT(tup);
+ 		if (typ-typtype != 'e')
+ 			elog(ERROR,wrong type for oid %u,typeoid);
+ 		/* XXX TODO fill in oids_are_sorted property from type tuple here */
+ 		mycache-oids_are_sorted = false;
+ 		ReleaseSysCache(tup);
+ 	}
+ 
+ 	if (mycache-oids_are_sorted)
+ 		return arg1 - arg2;
+ 
+ 	es1 = bsearch(arg1,mycache-sort_order_list,mycache-sort_list_length,
+   sizeof(enum_sort),enum_sort_cmp);
+ 	es2 = bsearch(arg2,mycache-sort_order_list,mycache-sort_list_length,
+   sizeof(enum_sort),enum_sort_cmp);
+ 
+ 	if (es1 == NULL)
+ 	{
+ 		
+ 		tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(arg1));
+ 		en = (Form_pg_enum) GETSTRUCT(tup);
+ 		mycache-sort_order_list[mycache-sort_list_length].enum_oid = arg1;
+ 		sort1 = 

Re: [HACKERS] extensible enum types

2010-06-20 Thread Peter Geoghegan
 Ahem. That is what a natural key is for :)

Well, they have their own drawbacks that don't make them particularly
appealing to use with lookup tables to ape enums. How many lookup
tables have you seen in the wild with a natural key?

People sometimes represent things like US states as enums. This is
probably a mistake, because you cannot control or predict if there'll
be a new US state, unlikely though that me be. You *can* control, for
example, what types of payment your application can deal with, and
you'll probably have to hardcode differences in dealing with each
inside your application, which makes enums a good choice. In my
earlier example, in addition to 'cash', there is a value for
payment_type of 'credit_card' . There is a separate column in the
payments table that references a credit_cards table, because credit
cards are considered transitory. A check constraint enforces that
credit_cards_id is null or not null as appropriate.

I don't like the idea of having values in a table that aren't so much
data as an integral part of your application/database. I think it's
wrong-headed. That's why I am not in favour of your enums as a lookup
table wrapper suggestion.

-- 
Regards,
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-20 Thread Kevin Grittner
Peter Geoghegan  wrote:
 
 How many lookup tables have you seen in the wild with a natural
 key?
 
Me?  Personally?  A few hundred.

 People sometimes represent things like US states as enums. This is
 probably a mistake, because you cannot control or predict if
 there'll be a new US state, unlikely though that me be.
 
More importantly, you're likely to need to associate properties with
the state.  Sales tax info, maybe a sales manager, etc.  A state
table can be a handy place to store things like that.
 
 I don't like the idea of having values in a table that aren't so
 much data as an integral part of your application/database.
 
Yep, exactly why natural keys should be used when possible.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-20 Thread Peter Geoghegan
 People sometimes represent things like US states as enums. This is
 probably a mistake, because you cannot control or predict if
 there'll be a new US state, unlikely though that me be.

 More importantly, you're likely to need to associate properties with
 the state.  Sales tax info, maybe a sales manager, etc.  A state
 table can be a handy place to store things like that.

That's probably true, but if there was any question of needing to
associate such values with US states, it ought to be perfectly obvious
to everyone that enums are totally inappropriate. If that wasn't the
case, then their use is only highly questionable, at least IMHO. What
you're describing isn't really a lookup table as I understand the
term. It's just a table. Lookup tables typically have things in them
like the various possible states of another table's tuples. In my
experience, lookup tables generally have two columns, an integer PK
and a description/state.

 I don't like the idea of having values in a table that aren't so
 much data as an integral part of your application/database.

 Yep, exactly why natural keys should be used when possible.

The not having to remember lookup value PK point I made was very
much ancillary to my main point. Ideally, if you restore a schema-only
dump of your database, you shouldn't be missing anything that is
schema. Things like the possible states of a table's tuples are often
schema, not data, and should be treated as such.

-- 
Regards,
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Tom Lane wrote:


Insert a sort order column into pg_enum, and rearrange the values in
that whenever the user wants to add a new value in a particular place.
You give up cheap comparisons in exchange for flexibility.  I think lots
of people would accept that tradeoff, especially if they could make it
per-datatype.
  


  

But I'm not happy about giving up cheap comparison.



I don't think it would be all that bad.  We could teach typcache.c to
cache the ordering data for any type that's in active use.  It'd
certainly be a lot more expensive than OID comparison, but perhaps not
worse than NUMERIC comparisons.

  
  


Another thought: could we add a column to pg_type with a flag that's 
true if the oids are in sort order? Then the comparison routines could 
just look that up in the type cache and if it's true (as it often will 
be) just return the oid comparison.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Another thought: could we add a column to pg_type with a flag that's 
 true if the oids are in sort order? Then the comparison routines could 
 just look that up in the type cache and if it's true (as it often will 
 be) just return the oid comparison.

Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison.  However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Gurjeet Singh
On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.netwrote:



 David E. Wheeler wrote:

 On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:



 I'd be perfectly happy to hear a reasonable alternative. Assuming we use
 some integer representation, given two labels represented by n and n+1, we
 can't add a label between them without rewriting the tables that use the
 type, whether it's my representation scheme or some other. Maybe we could
 have a FORCE option which would rewrite if necessary.



 People would likely always use it.

 Why not use a decimal number?





 You are just bumping up the storage cost. Part of the attraction of enums
 is their efficiency.


Probably it'd be the same as the decimal suggestion above, but we can use
floating-point data type.

It will allow injection of a new label at any stage.

CREATE leads to

Label1 - 1.0
Label2 - 2.0
Label3 - 3.0

ALTER ... ADD Label4 AFTER Label2; leads to
Label1 - 1.0
Label2 - 2.0
Label4 - 2.5
Label3 - 3.0

ALTER ... ADD Label5 AFTER Label2; leads to
Label1 - 1.0
Label2 - 2.0
Label5 - 2.25
Label4 - 2.5
Label3 - 3.0

Since floating-point implementation is architecture dependent, the ALTER
command should check that the injected value does not equate to any value
around it (eg. comparisons of (2.5  == 2) and (2.25 == 2.5) should not yield
0); and if it does, then throw an error and ask the user to use the
rewrite-the-table version of the command.

And since it is still 32 bit, and comparisons done by machine, performance
should be acceptably close to current integer comparisons, and much faster
that the cache lookups etc. being proposed.

This is very similar to Andrew's original suggestion of splitting 32 bits
into 16+16, but managed by the machine hence no complicated comparison algos
needed on our part. Also, since this is all transparent to the SQL
interface, our dump-reload cycle or Slony replication, etc. should not be
affected either.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [HACKERS] extensible enum types

2010-06-19 Thread Andrew Dunstan



Gurjeet Singh wrote:



This is very similar to Andrew's original suggestion of splitting 32 
bits into 16+16, but managed by the machine hence no complicated 
comparison algos needed on our part. Also, since this is all 
transparent to the SQL interface, our dump-reload cycle or Slony 
replication, etc. should not be affected either.





It would break the on-disk representation, though. That's not something 
we want to do any more if it can possibly be avoided. We want to keep 
pg_upgrade working.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Merlin Moncure
On Sat, Jun 19, 2010 at 4:55 PM, Andrew Dunstan and...@dunslane.net wrote:
 Gurjeet Singh wrote:


 This is very similar to Andrew's original suggestion of splitting 32 bits
 into 16+16, but managed by the machine hence no complicated comparison algos
 needed on our part. Also, since this is all transparent to the SQL
 interface, our dump-reload cycle or Slony replication, etc. should not be
 affected either.



 It would break the on-disk representation, though. That's not something we
 want to do any more if it can possibly be avoided. We want to keep
 pg_upgrade working.

I was partial to your original idea -- i thought it was quite clever
actually.  enums are a performance side of a tradeoff already so I
think any improvement for them should be looked at through that lens.

16 bits is IMO enough to pick a reasonable bucket size that gives you
enough play to handle the vast majority of cases that are appropriate
for enums.  your workaround in the rare case you actually hit the
limitations (most of these would fall under the 'oops, i used the
wrong tool' category) seems perfectly ok imo.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Simon Riggs
On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:

 Thoughts?

enum types exist as an optimisation-by-avoidance of referential
integrity.

We're a relational database, so IMHO we should spend time performance
tuning RI. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Andrew Dunstan



Simon Riggs wrote:

On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:

  

Thoughts?



enum types exist as an optimisation-by-avoidance of referential
integrity.

We're a relational database, so IMHO we should spend time performance
tuning RI. 

  


I don't accept your initial assertion at all. But in any case, these are 
not mutually exclusive. Your work tuning RI will not obstruct mine in 
making enums more useful, nor vice versa.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Peter Geoghegan
 Thoughts?

 enum types exist as an optimisation-by-avoidance of referential
 integrity.

 We're a relational database, so IMHO we should spend time performance
 tuning RI.

I take the view that they exist as a way of representing enumerations
of application/domain values - if it's hard coded in the application,
it's hard coded in the database by using an enum. This is why it isn't
that big a problem that they cannot be amended - they ought to be very
static, immutable values in the first place. I still think it would be
handy to be able to append new values though, and not have to ALTER
COLUMN USING to a new enum type. Besides, using enums in place of
lookup tables doesn't really make much sense as an optimisation.

It's very cool to be able to write queries like SELECT * FROM payments
WHERE payment_type = 'cash', rather than having to recall time and
again what the PK of cash is within your lookup table.

-- 
Regards,
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-19 Thread Joshua D. Drake
On Sun, 2010-06-20 at 03:42 +0100, Peter Geoghegan wrote:
  Thoughts?

 It's very cool to be able to write queries like SELECT * FROM payments
 WHERE payment_type = 'cash', rather than having to recall time and
 again what the PK of cash is within your lookup table.

Ahem. That is what a natural key is for :) 

Joshua D. Drake


 
 -- 
 Regards,
 Peter Geoghegan
 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Robert Haas
On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan and...@dunslane.net wrote:
 Then set the
 first value at  8 * p, then next at 9* p and so on. This is designed to
 allow more space to add labels at the beginning and end of the list, where
 this is more likely. Adding a label would be a matter of finding the labels
 adjacent to the position where we want to add the new label, and placing it
 half way between them, possibly with special rules for the list ends. If we
 want to add the label between two labels having values n and n+1 the
 addition would fail.

I like the idea of being able to modify enums on the fly, but I'm
skeptical of an implementation that won't always work.  Maybe it's
still better than what we have now, but it seems grotty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



Robert Haas wrote:

On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan and...@dunslane.net wrote:
  

Then set the
first value at  8 * p, then next at 9* p and so on. This is designed to
allow more space to add labels at the beginning and end of the list, where
this is more likely. Adding a label would be a matter of finding the labels
adjacent to the position where we want to add the new label, and placing it
half way between them, possibly with special rules for the list ends. If we
want to add the label between two labels having values n and n+1 the
addition would fail.



I like the idea of being able to modify enums on the fly, but I'm
skeptical of an implementation that won't always work.  Maybe it's
still better than what we have now, but it seems grotty.

  


I'd be perfectly happy to hear a reasonable alternative. Assuming we use 
some integer representation, given two labels represented by n and n+1, 
we can't add a label between them without rewriting the tables that use 
the type, whether it's my representation scheme or some other. Maybe we 
could have a FORCE option which would rewrite if necessary.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread David E. Wheeler
On Jun 18, 2010, at 9:07 AM, Robert Haas wrote:

 Then set the
 first value at  8 * p, then next at 9* p and so on. This is designed to
 allow more space to add labels at the beginning and end of the list, where
 this is more likely. Adding a label would be a matter of finding the labels
 adjacent to the position where we want to add the new label, and placing it
 half way between them, possibly with special rules for the list ends. If we
 want to add the label between two labels having values n and n+1 the
 addition would fail.
 
 I like the idea of being able to modify enums on the fly, but I'm
 skeptical of an implementation that won't always work.  Maybe it's
 still better than what we have now, but it seems grotty.

Yes, other than that I fully endorse the idea. What's the likelihood of a 
failure? And would the position of the new label (represented by its internal 
number) be predictive? IOW, would updating the same varenumtype in two 
databases (or on two servers) yield the same underlying positional value?

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread David E. Wheeler
On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:

 I'd be perfectly happy to hear a reasonable alternative. Assuming we use some 
 integer representation, given two labels represented by n and n+1, we can't 
 add a label between them without rewriting the tables that use the type, 
 whether it's my representation scheme or some other. Maybe we could have a 
 FORCE option which would rewrite if necessary.

People would likely always use it.

Why not use a decimal number?

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Joshua D. Drake
On Fri, 2010-06-18 at 12:34 -0400, Andrew Dunstan wrote:
 
 Robert Haas wrote:
  On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan and...@dunslane.net 
  wrote:

  Then set the
  first value at  8 * p, then next at 9* p and so on. This is designed to
  allow more space to add labels at the beginning and end of the list, where
  this is more likely. Adding a label would be a matter of finding the labels
  adjacent to the position where we want to add the new label, and placing it
  half way between them, possibly with special rules for the list ends. If we
  want to add the label between two labels having values n and n+1 the
  addition would fail.
  
 
  I like the idea of being able to modify enums on the fly, but I'm
  skeptical of an implementation that won't always work.  Maybe it's
  still better than what we have now, but it seems grotty.
 

 
 I'd be perfectly happy to hear a reasonable alternative. Assuming we use 
 some integer representation, given two labels represented by n and n+1, 
 we can't add a label between them without rewriting the tables that use 
 the type, whether it's my representation scheme or some other. Maybe we 
 could have a FORCE option which would rewrite if necessary.

I apologize as this thread has already moved past the initial proposal.
I had mail problems so I didn't see any part of the thread in my client
until now.

My standard response to enums is, don't use them. Specifically because
you can't modify them. When I talk to developers and they see we have
enums they get excited and then I tell them you can't modify them and
they get very downtrodden. I tell them just to use a look up table.

Anyway, Andrew if you can find a reasonable way to make them modifiable,
I am all for it :)

On that note and yes this would be weird but have we considered some
kind of wrapper around just a lookup table? I mean what if there was a
system table that listed :

nspname, relation, column, value

the type enum would know to look in that table for valid values

Then we just add various syntactical sugar to manage the values in the
table via that specific enum:

ALTER TABLE foo ALTER COLUMN bar VALUES (bar,baz,bing,foo)

ALTER TABLE would know its an enum and would perform proper operations
on the system table.

Sincerely,

Joshua D. Drake


 
 cheers
 
 andrew
 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 I like the idea of being able to modify enums on the fly, but I'm
 skeptical of an implementation that won't always work.  Maybe it's
 still better than what we have now, but it seems grotty.

 I'd be perfectly happy to hear a reasonable alternative.

Insert a sort order column into pg_enum, and rearrange the values in
that whenever the user wants to add a new value in a particular place.
You give up cheap comparisons in exchange for flexibility.  I think lots
of people would accept that tradeoff, especially if they could make it
per-datatype.

One point here is that you'd have to restrict the rearrangements so that
the relative sort order of existing values never changes, else you break
(for example) indexes on columns of that type.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



David E. Wheeler wrote:
What's the likelihood of a failure? 


Constructing a failure case would be simple. In practice, I suspect it 
would be very low.



And would the position of the new label (represented by its internal number) be 
predictive? IOW, would updating the same varenumtype in two databases (or on 
two servers) yield the same underlying positional value?


  


The algorithm I outlined is deterministic, so the same sequence of 
operations on the type would yield the same set of values on the 
low-order 16 bits. But that doesn't mean they would have the same high 
order 16 bits. That would depend on the history of the system. But more 
importantly, why do you care? the stored value is an implementation 
artefact that should be totally invisible to users. There would be no 
guarantee of the same underlying values on dump/reload either, just as 
there is not now for enums.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



David E. Wheeler wrote:

On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:

  

I'd be perfectly happy to hear a reasonable alternative. Assuming we use some 
integer representation, given two labels represented by n and n+1, we can't add 
a label between them without rewriting the tables that use the type, whether 
it's my representation scheme or some other. Maybe we could have a FORCE option 
which would rewrite if necessary.



People would likely always use it.

Why not use a decimal number?


  


You are just bumping up the storage cost. Part of the attraction of 
enums is their efficiency.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Robert Haas wrote:


I like the idea of being able to modify enums on the fly, but I'm
skeptical of an implementation that won't always work.  Maybe it's
still better than what we have now, but it seems grotty.
  


  

I'd be perfectly happy to hear a reasonable alternative.



Insert a sort order column into pg_enum, and rearrange the values in
that whenever the user wants to add a new value in a particular place.
You give up cheap comparisons in exchange for flexibility.  I think lots
of people would accept that tradeoff, especially if they could make it
per-datatype.

One point here is that you'd have to restrict the rearrangements so that
the relative sort order of existing values never changes, else you break
(for example) indexes on columns of that type.


  


Hmm. Yes, that could work. The assumption in my proposal was that 
existing values would not be reordered anyway.


But I'm not happy about giving up cheap comparison. And how would it be 
per data-type? That part isn't clear to me. Would we mark a given enum 
type as having its oids in order? It would also be sensible to quantify 
how much more expensive comparisons would become. If the sort order data 
were kept in the syscache the extra cost might get  very small.


What I actually like most about this suggestion is that we would be able 
to apply it cleanly to existing enum types without inventing anything 
much new.


cheers

cheers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Robert Haas
On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.net wrote:
 You are just bumping up the storage cost. Part of the attraction of enums is
 their efficiency.

What's efficient about them?  Aren't we using 4 bytes to store a value
that will nearly always fit in 2, if not 1?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



Robert Haas wrote:

On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.net wrote:
  

You are just bumping up the storage cost. Part of the attraction of enums is
their efficiency.



What's efficient about them?  Aren't we using 4 bytes to store a value
that will nearly always fit in 2, if not 1?

  
This was debated when we implemented enums. As between 1,2 and 4 there 
is often not much to choose, as alignment padding makes it pretty much 
the same. But any of them are more efficient than storing a numeric 
value or the label itself.


Anyway, it might well be moot.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Robert Haas
On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan and...@dunslane.net wrote:
 This was debated when we implemented enums. As between 1,2 and 4 there is
 often not much to choose, as alignment padding makes it pretty much the
 same.  But any of them are more efficient than storing a numeric value or the
 label itself.

I was assuming the alternative was an integer, rather than a
numeric...  but yeah, a numeric or the label itself would definitely
be larger.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Greg Stark
On Fri, Jun 18, 2010 at 6:17 PM, Andrew Dunstan and...@dunslane.net wrote:


 Tom Lane wrote:

 Insert a sort order column into pg_enum, and rearrange the values in
 that whenever the user wants to add a new value in a particular place.

+1 I was going to say exactly the same thing.

 You give up cheap comparisons in exchange for flexibility.  I think lots
 of people would accept that tradeoff, especially if they could make it
 per-datatype.
 Hmm. Yes, that could work. The assumption in my proposal was that existing
 values would not be reordered anyway.

 But I'm not happy about giving up cheap comparison. And how would it be per
 data-type? That part isn't clear to me. Would we mark a given enum type as
 having its oids in order? It would also be sensible to quantify how much
 more expensive comparisons would become. If the sort order data were kept in
 the syscache the extra cost might get  very small.

I think you would need a syscache or something like it. My first
instinct was to load the whole enum value-sort order mapping into a
hash table the first time you're asked to compare two values in a
given type. Then your comparison operator amounts to look
up/initialize hash table for this enum type, look up both sort orders
in hash table, return comparison. You might need something like a
syscache for the hash tables so that you don't keep the hash tables
around forever.

Using a syscache for the individual sort values would be slower to
initially load if you're sorting a list since you would be doing a lot
of retail lookups of individual values. But then perhaps it's a cheap
way to protect against very large enums which using a hash table per
enum type would be fragile against.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 Insert a sort order column into pg_enum, and rearrange the values in
 that whenever the user wants to add a new value in a particular place.
 You give up cheap comparisons in exchange for flexibility.  I think lots
 of people would accept that tradeoff, especially if they could make it
 per-datatype.

 But I'm not happy about giving up cheap comparison.

I don't think it would be all that bad.  We could teach typcache.c to
cache the ordering data for any type that's in active use.  It'd
certainly be a lot more expensive than OID comparison, but perhaps not
worse than NUMERIC comparisons.

 And how would it be per data-type?

Well, there'd be two kinds of enums, just as you were saying before.
I'm not sure how we'd expose that to users exactly, or whether there
could be provisions for switching a type's behavior after creation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Joseph Adams
On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan and...@dunslane.net wrote:


 Robert Haas wrote:

 On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan and...@dunslane.net
 wrote:


 You are just bumping up the storage cost. Part of the attraction of enums
 is
 their efficiency.


 What's efficient about them?  Aren't we using 4 bytes to store a value
 that will nearly always fit in 2, if not 1?



 This was debated when we implemented enums. As between 1,2 and 4 there is
 often not much to choose, as alignment padding makes it pretty much the
 same. But any of them are more efficient than storing a numeric value or the
 label itself.

 Anyway, it might well be moot.

 cheers

 andrew

Something I don't understand in all this is: why can't the type of an
enum be determined statically rather than stored in every single
value?  For instance, if we have:

CREATE TYPE number AS ENUM ('one', 'two', 'three');
CREATE TYPE color AS ENUM ('red', 'green', 'blue');

PostgreSQL won't allow a comparison between two different enum types, e.g.:

 SELECT 'one'::number = 'red'::color;
ERROR:  operator does not exist: number = color

However, when we say:

SELECT 'one'::number = 'one'::number

Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
enum input rather than rely on it being stored in the value (either
implicitly via OID or explicitly as a word half)?

Also, I can't seem to find the original debates from when enums were
implemented.  Does anyone have a link to that thread in the archives?
Thanks.


Joey Adams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



Joseph Adams wrote:

Also, I can't seem to find the original debates from when enums were
implemented.  Does anyone have a link to that thread in the archives?
Thanks.
  


Start here 
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00979.php



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Alvaro Herrera
Excerpts from Joseph Adams's message of vie jun 18 18:17:50 -0400 2010:

 Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
 enum input rather than rely on it being stored in the value (either
 implicitly via OID or explicitly as a word half)?
 
 Also, I can't seem to find the original debates from when enums were
 implemented.  Does anyone have a link to that thread in the archives?

Probably around here
http://archives.postgresql.org/pgsql-patches/2006-12/msg00099.php

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Tom Lane
Joseph Adams joeyadams3.14...@gmail.com writes:
 Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
 enum input rather than rely on it being stored in the value

No.  Support functions have to work in many contexts where there is no
side channel such as get_fn_expr_argtype.  What's more, it's very
difficult to provide a side channel without creating security holes.
We used to think it was OK for output functions to rely on a type OID
passed separately from the actual value, but that's insecure:
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible enum types

2010-06-18 Thread Andrew Dunstan



Tom Lane wrote:

And how would it be per data-type?



Well, there'd be two kinds of enums, just as you were saying before.
I'm not sure how we'd expose that to users exactly, or whether there
could be provisions for switching a type's behavior after creation.


  


I'd be a whole lot happier if we didn't have to do that. I've been 
trying to wrack my brain for some clever way to avoid it (so far without 
success).


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers