Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-09 Thread Martijn van Oosterhout
On Sat, Jul 08, 2006 at 05:54:26PM -0400, Jim Nasby wrote:
 +1. If there's enough user demand we can look at adding the type to  
 core (I don't see any real advantage to contrib over pgFoundry for  
 this). I'm not sure if it makes sense to add a generic 16 byte RAW to  
 core, either. I'd *much* rather see effort expended on a generic RAW  
 type which had it's size defined as part of the type and didn't use  
 varlena.

You could place a nice wrapper around type generators, which would let
you say:

DECLARE TYPE RAW(16);

After which point you could use that type in function declarations and
such. It would create an OID for that type would could be used as
normal.

I think that trying to get the backend to pay more attention to typmods
is not going to be successful. Simply because functions and operators
have an affect on the typmod and once you start relying on typmods to
decode a tuple, you've got a real problem.

As an example, what do you get when you concatenate two CHAR(4)'s? Do
you get another CHAR(4) or is it a CHAR(8)? How does the backend know?
You'd have to accompany each function with another function just to
tell you how the typmods would be related.

The only way out I can think of is that RAW(n) is merely a sort of
template and RAW(x) cannot be used in a place where RAW(y) is expected
(if xy). Hence it makes sense to have a template that people can
instantiate instances of and let the rest of the system treat them as
new types, unrelated to anything else.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-08 Thread Jim Nasby

On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
In answer to your question, though my opinion carries no special  
weight at
all, I would suggest adding a bare bones 16-byte data type to core  
and a
second binary-compatible data type based on it that parsed/output  
as uuids.

The extended uuid libraries should only go in pgfoundry/contrib.

I second that.


+1. If there's enough user demand we can look at adding the type to  
core (I don't see any real advantage to contrib over pgFoundry for  
this). I'm not sure if it makes sense to add a generic 16 byte RAW to  
core, either. I'd *much* rather see effort expended on a generic RAW  
type which had it's size defined as part of the type and didn't use  
varlena.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-08 Thread Joshua D. Drake
On Saturday 08 July 2006 14:54, Jim Nasby wrote:
 On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
  In answer to your question, though my opinion carries no special
  weight at
  all, I would suggest adding a bare bones 16-byte data type to core
  and a
  second binary-compatible data type based on it that parsed/output
  as uuids.
  The extended uuid libraries should only go in pgfoundry/contrib.
 
  I second that.

 +1. If there's enough user demand we can look at adding the type to
 core (I don't see any real advantage to contrib over pgFoundry for
 this).

The advantage of contrib over pgFoundry is that it will be packaged by the 
major distributions. Every distribution includes a package of the contrib
modules.

Sincerely,

Joshua D. Drake


-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread mark
On Thu, Jun 29, 2006 at 06:47:17PM +0200, Martijn van Oosterhout wrote:
 It seems to me that maybe the backend should include a 16-byte fixed
 length object (after all, we've got 1, 2, 4 and 8 bytes already) and
 then people can use that to build whatever they like, using domains,
 for example...

So... Back to this.

It won't happen unless somebody does it - and I realize that people
are busy with their own projects, so unless somebody more willing and
better suited will step up, I'm going to take a stab at getting
advanced consensus.

Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.

1) The added 128-bit type should take the form of:

a) UUID, with all functions
b) UUID, with only basic generation functions + encode/decode/indexable
c) UUID, with only encode/decode/indexable - generic except for the
   name of the type, and the encoding format.
d) Generic 128-bit type - same as c) except may not encode or decode
   as UUID (dashes). Either a large number (hex string?), or binary data.
e) Generic n-byte binary data type generator. Not sure of feasibility
   of this at this point. See thread.

2) According to your answer in 1), the added 128-bit type should be:

a) In core first.
b) In contrib first.
c) In pgfoundry first.


Thanks,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread Martijn van Oosterhout
On Thu, Jul 06, 2006 at 12:12:18PM -0400, [EMAIL PROTECTED] wrote:
 Please answer the below questions, and state whether your opinion is
 just an opinion, or whether you are stating it as a PostgreSQL
 maintainer and it is law. If you wish, you can rank preferences.

Do I have to pick only one? I'd choose firstly for:

1c) UUID, with only encode/decode/indexable - generic except for the
 name of the type, and the encoding format.
2a) In core first

And in addation to that:

1b) UUID, with only basic generation functions +
 encode/decode/indexable
2b) In contrib first.

And maybe finally:

1a)  UUID, with all functions
2c)  In pgfoundry first.

IOW, I'm not so convinced that full UUID support should appear in core,
but I think a 16-byte type should be available in core, with basic UUID
functions in contrib and the full suite on pgfoundry.

But that's just my opinion ofcourse.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread Jochem van Dieten

On 7/6/06, [EMAIL PROTECTED] wrote:


Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.

1) The added 128-bit type should take the form of:

a) UUID, with all functions



2) According to your answer in 1), the added 128-bit type should be:

a) In core first.


Opinion, 1 a, 2 a

Jochem

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread Greg Stark
[EMAIL PROTECTED] writes:

 e) Generic n-byte binary data type generator. Not sure of feasibility
of this at this point. See thread.

I don't like the idea of a generator that would have to be manually invoked,
though such a thing would be a fine tool for contrib or pgfoundry, I think it
would never be a clean enough interface for core.

On the other hand core could conceivably translate things like char(n) into
such a type generated on the fly. That is, instead of having a single char oid
it could check a cache of fixed length char(n) data types and if there isn't
one already generate one on the fly. That would be somewhat grotty of an
implementation but the user interface at least would be entirely transparent.
If one day we change things to pass around typmod database designs wouldn't
have to change at all.

(Actually Postgres can never do this for char(n), at least not as long as we
insist on making char/varchar/text locale-aware. Personally I think the
default char/varchar/text locale should be C unless you specify otherwise on a
per-column basis. But that seems to be a minority opinion. Postgres could
however do this for separate raw binary datatypes like bit(n) or bytea(n).)

In answer to your question, though my opinion carries no special weight at
all, I would suggest adding a bare bones 16-byte data type to core and a
second binary-compatible data type based on it that parsed/output as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread Sander Steffann

Hi,

Just MHO:


1) The added 128-bit type should take the form of:

   c) UUID, with only encode/decode/indexable - generic except for the
  name of the type, and the encoding format.

2) According to your answer in 1), the added 128-bit type should be:

   a) In core first.


1c is what I would need. 1b or 1a would be nice to have.

- Sander



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread Thomas Hallgren

Greg Stark wrote:


In answer to your question, though my opinion carries no special weight at
all, I would suggest adding a bare bones 16-byte data type to core and a
second binary-compatible data type based on it that parsed/output as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.


I second that.

Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Greg Stark [EMAIL PROTECTED] writes:
  Hm, so it could be stored on disk without the length header as long as
  the length header is added to the in-memory representation? I don't
  think the type system has hooks for reading and storing data to disk
  though.
 
 No, it doesn't, and we'd pay a nonzero price for allowing that.
 Currently the executor doesn't have to care (much) about whether a
 tuple is on-disk or in-memory --- the individual datums look the same
 either way.  Allowing them to be different would force a lot of
 format conversion steps that currently need not happen.

Is there ever a case where an entire tuple is passed around without knowing
the typmod of an attribute in the tuple?

The conversion would only really have to happen when the attribute is fetched
or stored, not when the tuple is being passed around wholesale. But I have a
feeling that would be more intrusive than just making the entire system typmod
aware.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Martijn van Oosterhout
On Thu, Jun 29, 2006 at 02:40:15AM -0400, Greg Stark wrote:
  Greg Stark [EMAIL PROTECTED] writes:
  No, it doesn't, and we'd pay a nonzero price for allowing that.
  Currently the executor doesn't have to care (much) about whether a
  tuple is on-disk or in-memory --- the individual datums look the same
  either way.  Allowing them to be different would force a lot of
  format conversion steps that currently need not happen.
 
 Is there ever a case where an entire tuple is passed around without knowing
 the typmod of an attribute in the tuple?

A tuple is just an array of datums, with some header information. The
problems come when you don't have a tuple anymore, but only the datum,
like in arguments for functions.

I think it's more a case that most places that deal with datums simply
don't know about typmods. For example, the return type of a function
can only be char, not char(16). If you consider the case of a function
returning a RAW, the caller will have no way of knowing the typmod,
they do know the type though.

To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type. 

I think time would be much better spent finding a way of allowing
user-defined types to be created without using C functions.

 The conversion would only really have to happen when the attribute is fetched
 or stored, not when the tuple is being passed around wholesale. But I have a
 feeling that would be more intrusive than just making the entire system typmod
 aware.

I'm not sure if tuples are ever passed wholesale very far. The first
node to actually do anything with it (any join, expression or condition
test) is going to need to deconstruct it. Consider where we currently we
have a Filter Cond on a Seq Scan. Currently the filter can access
the datums directly on the disk page, with what you're proposing, it
can't.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 A tuple is just an array of datums, with some header information. The
 problems come when you don't have a tuple anymore, but only the datum,
 like in arguments for functions.
 
 I think it's more a case that most places that deal with datums simply
 don't know about typmods. For example, the return type of a function
 can only be char, not char(16). If you consider the case of a function
 returning a RAW, the caller will have no way of knowing the typmod,
 they do know the type though.
 
 To be honest, it seems like a lot of work to save the four bytes of
 overhead for the varlena structure on disk if you're going to need it
 in memory anyway. And anything like RAW(16) which people want for
 UUIDs, if it's going to have a lot of functions associated with it, may
 as well just be a new type. 

For large databases storage density leads directly to speed. Saving four bytes
of overhead on a 16-byte data structure would mean a 20% speed increase. Even
if that's only helpful on a tenth of the columns you're still talking about a
2% speed increase for all queries on the table. A lot of databases use CHAR(1)
for flags. The overhead is even worse there.

 Consider where we currently we have a Filter Cond on a Seq Scan.
 Currently the filter can access the datums directly on the disk page, with
 what you're proposing, it can't.

Well it only can't if the data type has conversion functions. I'm not sure how
complex it would be having pointers that *getattr sometimes return pointers to
the disk page and sometimes return pointers to a palloced copy though.



-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Thomas Hallgren

Greg Stark wrote:

Martijn van Oosterhout kleptog@svana.org writes:

  

To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type. 



For large databases storage density leads directly to speed. Saving four bytes
of overhead on a 16-byte data structure would mean a 20% speed increase. Even
if that's only helpful on a tenth of the columns you're still talking about a
2% speed increase for all queries on the table. A lot of databases use CHAR(1)
for flags. The overhead is even worse there.

  
I have to concur with this. Assume you use a bytea for a UUID that in 
turn is used as a primary key. The extra overhead will be reflected in 
all indexes, all foreign keys, etc. In a normalized database some tables 
may consist of UUID columns only.


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Martijn van Oosterhout
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
 I have to concur with this. Assume you use a bytea for a UUID that in 
 turn is used as a primary key. The extra overhead will be reflected in 
 all indexes, all foreign keys, etc. In a normalized database some tables 
 may consist of UUID columns only.

So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths. What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.

And for people that want char(1), they should be using char, which
really is one byte (ex padding ofcourse).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
  I have to concur with this. Assume you use a bytea for a UUID that in 
  turn is used as a primary key. The extra overhead will be reflected in 
  all indexes, all foreign keys, etc. In a normalized database some tables 
  may consist of UUID columns only.
 
 So you create a UUID type. It's cheap enough to create new types after
 all, that's one of postgresql's strengths. What I'm saying is that it's
 easier to create new fixed length types for the cases that need it,
 than it is to redo the entire type handling of the backend.

I guess my motivation here is that I feel currently char(n) is basically
broken in Postgres. Sure it satisfies the letter of the specification, but
it's failing to actually achieve anything for the users. There's no point at
all in using char(n) in Postgres since it takes exactly the same amount of
space as varchar() if you're always stuffing it full and more space if you're
not.

In the current setup the only reason for Postgres to have this data type at
all is purely for legacy compatibility. It doesn't actually work in that it
doesn't provide the space savings it's intended to and that would give users
an actual reason to use it in new databases.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 In the current setup the only reason for Postgres to have this data type at
 all is purely for legacy compatibility.

Yes.  So?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Thomas Hallgren

Martijn van Oosterhout wrote:

On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
  
I have to concur with this. Assume you use a bytea for a UUID that in 
turn is used as a primary key. The extra overhead will be reflected in 
all indexes, all foreign keys, etc. In a normalized database some tables 
may consist of UUID columns only.



So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths.

It would be a whole lot easier if I could use a domain.


 What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.

  
Of course. But it's a matter of who does what. Your reasoning push the 
burden to the users.


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-29 Thread Martijn van Oosterhout
On Thu, Jun 29, 2006 at 06:40:13PM +0200, Thomas Hallgren wrote:
 Martijn van Oosterhout wrote:
 On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
   
 I have to concur with this. Assume you use a bytea for a UUID that in 
 turn is used as a primary key. The extra overhead will be reflected in 
 all indexes, all foreign keys, etc. In a normalized database some tables 
 may consist of UUID columns only.
 
 
 So you create a UUID type. It's cheap enough to create new types after
 all, that's one of postgresql's strengths.
 It would be a whole lot easier if I could use a domain.

It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-28 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Martijn van Oosterhout kleptog@svana.org writes:
  The input functions get it, the output functions (bpcharout,
  bpcharsend, etc) don't. Which makes it kind of hard to print a raw
  value if you don't know how long it's going to be. They used to, but
  that was removed some time back.

 Even back then you couldn't rely on the typmod value to be supplied;
 it was quite likely to be passed as -1.  The issue is not actually
 with on-disk storage, it is with function/operator arguments and
 results.  Those have never been identified any more closely than by
 giving a type OID.  So for any value that came from a function,
 you won't have a typmod, and you'd better be able to find out all
 you need to know just by inspecting the value itself.  Hence, length
 words.

Hm, so it could be stored on disk without the length header as long as the
length header is added to the in-memory representation? I don't think the type
system has hooks for reading and storing data to disk though.

 This is all pretty off-topic for pgsql-general, isn't it?

[moved to -hackers]

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-28 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Hm, so it could be stored on disk without the length header as long as
 the length header is added to the in-memory representation? I don't
 think the type system has hooks for reading and storing data to disk
 though.

No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way.  Allowing them to be different would force a lot of
format conversion steps that currently need not happen.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly