
Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-11-03 Thread Peter Eisentraut


Robert Haas wrote:

> The attached patch allows uuid_in() to parse a wider variety of
> variant input formats for the UUID data type, per the TODO named in
> the subject line.


I have committed your patch.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-14 Thread Peter Eisentraut


Dawid Kuroczko wrote:

>>   2) The '-' is not the only character that people have used. ClearCase uses
>> '.' and ':' as punctuation.
>
> I would be more in favor of accepting MAC-address style notation AA:BB:CC:DD
> also, in that case, but I think it's going too far...  So, I am for sticking with
> dashes and groups of four :)


Well, speaking of MAC addresses, we already accept a finite set of 
non-standard MAC address formats, so doing something similar with UUID 
should be OK.


I recently figured out that the AA:BB:CC:DD... format for MAC addresses 
is not exactly standard either, so sticking with the standard doesn't 
always help in practice.


We only accept those MAC address formats that we find in practice, 
however, not any combination of digits and delimiters, and I think UUID 
should do the same as well.
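
For illustration only, here is a minimal sketch of what "accept a finite set of formats" could look like: a small whitelist of hyphen layouts, each given as a list of hex-digit group sizes. The layouts listed, and the names known_layouts, matches_layout and uuid_layout_is_known, are assumptions made up for this example and are not taken from the patch or from PostgreSQL. Braces are not handled here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical whitelist.  Each row lists the sizes of the hex-digit
 * groups, separated by single hyphens in the input; a 0 ends the row.
 */
static const int known_layouts[][9] = {
	{8, 4, 4, 4, 12, 0},			/* standard 8x-4x-4x-4x-12x */
	{32, 0},						/* no hyphens at all */
	{8, 4, 4, 16, 0},				/* 8x-4x-4x-16x */
	{8, 8, 8, 8, 0},				/* 8x-8x-8x-8x */
	{4, 4, 4, 4, 4, 4, 4, 4, 0},	/* 4x repeated eight times */
};

/* Does s consist of exactly the given groups, joined by hyphens? */
static bool
matches_layout(const char *s, const int *groups)
{
	int			g;
	int			k;

	for (g = 0; groups[g] != 0; g++)
	{
		if (g > 0 && *s++ != '-')
			return false;
		for (k = 0; k < groups[g]; k++)
			if (!isxdigit((unsigned char) *s++))
				return false;
	}
	return *s == '\0';
}

/* True if s matches one of the whitelisted layouts. */
bool
uuid_layout_is_known(const char *s)
{
	size_t		i;

	for (i = 0; i < sizeof(known_layouts) / sizeof(known_layouts[0]); i++)
		if (matches_layout(s, known_layouts[i]))
			return true;
	return false;
}
```

The trade-off is the one discussed in this thread: a whitelist rejects anything nobody has asked for, but every new format means another row and another argument about whether it belongs.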




Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-13 Thread Decibel!


On Oct 10, 2008, at 3:40 PM, Robert Haas wrote:

>> I dislike all own creatures - because nobody will understand so do
>> some wrong thing - using non standard formats is bad thing. So it's is
>> necessary, then who need it then he found it on pgfoundry. But why
>> smudge core?
>
> I'm opposed to smudging core, but I'm in favor of this patch.  :-)
>
> Of course, I'm biased, because I wrote it.  But I think that providing
> input and output functions that make it easy to read and write common
> formats, even if they happen to be non-standard, is useful.



I tend to agree, but I have a hard time swallowing that when it means  
a 2-3% performance penalty for those that aren't using that  
functionality. I could perhaps see adding a function that accepted  
common UUID formats and spit out the standard.


If you could get rid of the performance hit this might be more
interesting. Perhaps default to assuming a good format and only fall
back to something else if that doesn't work?
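
One way to read that suggestion, as a minimal sketch rather than anything from the patch: try the canonical 8x-4x-4x-4x-12x layout at fixed offsets first, and only run a more permissive scan if that fails. The names hex_val, parse_standard, parse_lenient and parse_uuid are made up for this illustration, and whether this ordering would actually win back the 2-3% has not been measured here.

```c
#include <stdbool.h>
#include <string.h>

#define UUID_LEN 16

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Fast path: exactly 8x-4x-4x-4x-12x, so every byte sits at a known offset. */
static bool
parse_standard(const char *s, unsigned char out[UUID_LEN])
{
	static const int offset[UUID_LEN] = {
		0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34
	};
	int			i;

	if (strlen(s) != 36 ||
		s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-')
		return false;
	for (i = 0; i < UUID_LEN; i++)
	{
		int			hi = hex_val(s[offset[i]]);
		int			lo = hex_val(s[offset[i] + 1]);

		if (hi < 0 || lo < 0)
			return false;
		out[i] = (unsigned char) ((hi << 4) | lo);
	}
	return true;
}

/* Fallback: optional hyphen after each group of four hex digits. */
static bool
parse_lenient(const char *s, unsigned char out[UUID_LEN])
{
	int			i;

	for (i = 0; i < UUID_LEN; i++)
	{
		int			hi = hex_val(s[0]);
		int			lo;

		if (hi < 0)
			return false;		/* also stops at the terminating NUL */
		lo = hex_val(s[1]);
		if (lo < 0)
			return false;
		out[i] = (unsigned char) ((hi << 4) | lo);
		s += 2;
		if (*s == '-' && (i % 2) == 1 && i < UUID_LEN - 1)
			s++;
	}
	return *s == '\0';
}

/* Try the standard layout first; out may be partly written on failure. */
bool
parse_uuid(const char *s, unsigned char out[UUID_LEN])
{
	return parse_standard(s, out) || parse_lenient(s, out);
}
```

Note that the fast path still calls strlen(), so standard input is not free either; only a measurement would show which arrangement is cheaper in practice.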

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

> I dislike all own creatures - because nobody will understand so do
> some wrong thing - using non standard formats is bad thing. So it's is
> necessary, then who need it then he found it on pgfoundry. But why
> smudge core?

I'm opposed to smudging core, but I'm in favor of this patch.  :-)

Of course, I'm biased, because I wrote it.  But I think that providing
input and output functions that make it easy to read and write common
formats, even if they happen to be non-standard, is useful.  One
shouldn't go overboard, of course, but the range and variety of ways
that we can format some other datatypes (like date and timestamp) is
vastly greater and includes all sorts of things that are not only
non-standard but flagrantly unreasonable, like to_char(now(),
'MMDD-YY-HH').  This change on the other hand is merely window
dressing, but if it saves someone having to do a trivial format
conversion to complete their data load, I think that's beneficial.

*shrug*

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Pavel Stehule

2008/10/10 Robert Haas <[EMAIL PROTECTED]>:
>> Is it problem do for non standard UUID formats pgfoundry project?
>
> I'm not volunteering to set up a pgfoundry project to maintain something
> that can be accomplished with a patch that adds 19 lines of new code
> (and removes 9).  This functionality is useful in core because it will
> Just Work.  If you have to grope through pgfoundry to find it, you
> might as well write your own 19 lines of code.
>

I dislike homegrown inventions, because nobody will understand them and
so people will do the wrong thing. Using non-standard formats is a bad
thing. If it is really necessary, then whoever needs it can find it on
pgfoundry. But why smudge core?

Pavel

> ...Robert
>


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

On Fri, Oct 10, 2008 at 3:48 PM, Grzegorz Jaskiewicz
<[EMAIL PROTECTED]> wrote:
> I think it will be as expensive for the app to convert a UUID to the standard
> format as it would be for PostgreSQL.
> But if PostgreSQL does it, everyone will expect it to do it right. You can't
> possibly detect all forms of screwed-up design and expect the application to
> pick it up.
> All I'm saying is that I think it would be better to be conservative in this case.
> And funnily enough, I only replied to that message because I know something
> about trying to compensate for non-standard types, and I've seen that
> discussion quite a few times around here in the past.
>
> Just a friendly opinion.

I don't really think this is worth arguing about.  I'm not trying to
detect all forms of screwed up design - I'm trying to detect minor
variants of the standard format that are commonly used by third party
applications.  If people don't like the patch, just move the item to
the "not wanted" section of the TODO list and let's move onto the next
thing.  Personally, I think it's useful and harmless, but everyone is
entitled to their own opinion and it's certainly not worth anyone,
including me, popping a cork over.

I apologize if I gave the contrary impression.

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

> Is it problem do for non standard UUID formats pgfoundry project?

I'm not volunteering to set up a pgfoundry project to maintain something
that can be accomplished with a patch that adds 19 lines of new code
(and removes 9).  This functionality is useful in core because it will
Just Work.  If you have to grope through pgfoundry to find it, you
might as well write your own 19 lines of code.

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Grzegorz Jaskiewicz



I think it will be as expensive for the app to convert a UUID to the standard
format as it would be for PostgreSQL.
But if PostgreSQL does it, everyone will expect it to do it right. You
can't possibly detect all forms of screwed-up design and expect the
application to pick it up.
All I'm saying is that I think it would be better to be conservative in this
case. And funnily enough, I only replied to that message because I
know something about trying to compensate for non-standard types, and
I've seen that discussion quite a few times around here in the past.


Just a friendly opinion.



Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

> that only depends on definition of 'common variant'. Will it be just code
> that will accept letters and digits, and trying to make that into UUID ?

You are attacking a straw man.  No one is proposing that.

> I think those who designed their code to produce or accept non standard
> UUID, should work around problems they created in first place.

We're talking about compatibility with widely-used third-party
products, not home brew.  If Coldfusion or Xen or whatever other product
uses a non-standard UUID format, we can choose to interoperate with it
gracefully or we can be pedantic and throw an error message.  But I
doubt that Coldfusion is going to change their UUID format just
because PostgreSQL chooses to kick out a syntax error.

> Otherwise, accepting non standard forms of UUIDs is going to be just a first
> step towards making the database produce non standard forms.

Then you can argue against it when someone proposes a patch that does
that.  This one doesn't.

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Pavel Stehule

2008/10/10 Tom Lane <[EMAIL PROTECTED]>:
> Mark Mielke <[EMAIL PROTECTED]> writes:
>> Anyways - I only somewhat disagree. I remember the original discussions,
>> and I remember agreeing with the points to keep PostgreSQL UUID support
>> thin and rigid. It's valuable for it to be built-in to the database.
>> It's not necessarily valuable for PostgreSQL to support every UUID
>> version or every format. Supporting additional formats is the direction
>> of supporting every UUID format. Three months from now, somebody is
>> going to propose allowing '-' or ':'. What should the answer be then?
>
> Well, this discussion started with the conventional wisdom about "be
> conservative in what you send and liberal in what you accept".  I'd
> still resist emitting any UUID format other than the RFC-approved one,
> but I don't see anything very wrong in being able to read common
> variants.

Would it be a problem to make a pgfoundry project for non-standard UUID formats?

Regards
Pavel Stehule
>
>regards, tom lane
>
>


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Grzegorz Jaskiewicz



On 2008-10-10, at 16:01, Tom Lane wrote:


> Well, this discussion started with the conventional wisdom about "be
> conservative in what you send and liberal in what you accept".  I'd
> still resist emitting any UUID format other than the RFC-approved one,
> but I don't see anything very wrong in being able to read common
> variants.


that only depends on definition of 'common variant'. Will it be just  
code that will accept letters and digits, and trying to make that into  
UUID ?
I think those who designed their code to produce or accept non  
standard UUID, should work around problems they created in first place.
Otherwise, accepting non standard forms of UUIDs is going to be just a  
first step towards making the database produce non standard forms.


It should be easy and beneficial for someone to fix their own code  
into using standard RFC-approved forms of data.
Next you'll get people asking for varchar speedups, because they would  
use varchar to hold data instead of int, or other appropriate format.
My point is, database shouldn't compensate for bad design decisions in  
client's software.


Just my humble 2 pennies.

--
GJ


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Tom Lane

Mark Mielke <[EMAIL PROTECTED]> writes:
> Anyways - I only somewhat disagree. I remember the original discussions, 
> and I remember agreeing with the points to keep PostgreSQL UUID support 
> thin and rigid. It's valuable for it to be built-in to the database. 
> It's not necessarily valuable for PostgreSQL to support every UUID 
> version or every format. Supporting additional formats is the direction 
> of supporting every UUID format. Three months from now, somebody is 
> going to propose allowing '-' or ':'. What should the answer be then?

Well, this discussion started with the conventional wisdom about "be
conservative in what you send and liberal in what you accept".  I'd
still resist emitting any UUID format other than the RFC-approved one,
but I don't see anything very wrong in being able to read common
variants.

regards, tom lane
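
To make the "conservative in what you send" half concrete, here is a small sketch of an output routine that always emits the 8x-4x-4x-4x-12x form, whatever layout the value was read in. The name uuid_to_canonical and the 37-byte buffer convention are assumptions for this example, not PostgreSQL's actual output function.

```c
#define UUID_LEN 16

/*
 * Write the lower-case 8x-4x-4x-4x-12x text form of uuid into buf.
 * buf must have room for 36 characters plus the terminating NUL.
 */
void
uuid_to_canonical(const unsigned char uuid[UUID_LEN], char buf[37])
{
	static const char hex[] = "0123456789abcdef";
	char	   *p = buf;
	int			i;

	for (i = 0; i < UUID_LEN; i++)
	{
		/* hyphens go before bytes 4, 6, 8 and 10 */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		*p++ = hex[uuid[i] >> 4];
		*p++ = hex[uuid[i] & 0x0f];
	}
	*p = '\0';
}
```

Because the stored value is just 16 bytes, the input layout is gone by the time anything is printed, so a more liberal parser does not by itself change what the database emits.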


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

> Anyways - I only somewhat disagree. I remember the original discussions, and
> I remember agreeing with the points to keep PostgreSQL UUID support thin and
> rigid. It's valuable for it to be built-in to the database. It's not
> necessarily valuable for PostgreSQL to support every UUID version or every
> format. Supporting additional formats is the direction of supporting every
> UUID format. Three months from now, somebody is going to propose allowing
> '-' or ':'. What should the answer be then?

Beats me.  I didn't make the TODO list, I'm just coding to it.

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Mark Mielke


Robert Haas wrote:

>>   1) Reduced error checking.
>>   2) The '-' is not the only character that people have used. ClearCase uses
>> '.' and ':' as punctuation.
>>   3) People already have the option of translating the UUID from their
>> application to a standard format.
>>   4) As you find below, and is probably possible to improve on, a fixed
>> format can be parsed more efficiently.
>
> Scenario 1.  I have some standard format UUIDs and I want to parse
> them.  This change doesn't bother me at all because if I'm parsing
> anywhere near enough UUIDs for it to matter, the speed of my CPU, disk, and
> memory subsystems will vastly outweigh the difference between the two
> implementations.  I measured the difference between the two by running
> them both in a tight loop on a fixed string.  I challenge anyone to
> produce a measurable performance distinction by issuing SQL queries.
> I doubt that it is possible.
  


Put a few changes of 2%-3% impact together and you get 10% or more. I'm 
not saying you are wrong, but I disagree that performance should be 
sacrificed for everybody without providing substantial benefit to 
everybody. The question is then, does relaxed UUID parsing provide 
substantial benefit to everybody?



> Scenario 2. I have some non-standard format UUIDs and I want to parse
> them.  This change helps me a lot, because I'm almost positive that
> calling regexp_replace() and then uuid_in() is going to be MUCH slower
> than just calling uuid_in().  And if I do that then my error checking
> will be REALLY weak, unless I write a custom PL function to make sure
> that dashes only occur where they're supposed to be, in which case it
> will be even slower.
  


You should know the non-standard format of the UUID, and your 
application should be doing the error checking. It might be slower for 
*you*, but *you* are the one with the special needs. That is, unless you 
are representing a significant portion of the population. What 
percentage are you representing?



> Scenario 3. I only want standard-format UUIDs to be accepted into my
> database.  Any non-standard format UUIDs should be rejected at parse
> time.  This change is pretty irritating, because now I have to use
> regexp matching or something to make sure I've got the right format,
> and it's going to be significantly slower.
>
> My suspicion is that scenario 2 is a lot more common than scenario 3.
  


I prefer strict formats and early failures. I like that PostgreSQL 
refuses to truncate on insertion. If I have a special format, I'm more 
than willing to convert it from the special format to a standard format 
before doing INSERT/UPDATE. What percentage of people out there feel 
that they benefit from pedantic syntax checking? :-)


I don't know.


>> I don't know which implementation was used for the PostgreSQL core, but any
>> hard coded constants would allow for the optimizer to generate instructions
>> that can run in parallel, or that are better aligned to machine words.
>>
>> 2-3% slow down for what gain? It still doesn't handle all cases, and it's
>> less able to check the format for correctness.
>
> This change is a long way from letting any old thing through as a
> UUID.  I'm sure there are lots of crazy ways to write UUIDs, but
> everything I found with a quick Google search would be covered by this
> patch, so I think that's pretty good.  A key point for me is that it's
> hard to imagine this patch accepting anything that was intended to be
> something other than a UUID.  (I am sure someone will now write back
> and tell me about their favorite non-UUID thing that happens to have
> 32 hex digits with dashes for separators, but come on.)
  


It's not that long. If you get ColdFusion support(?), somebody else will 
want the ':', and somebody else will want the '-'.


Anyways - I only somewhat disagree. I remember the original discussions, 
and I remember agreeing with the points to keep PostgreSQL UUID support 
thin and rigid. It's valuable for it to be built-in to the database. 
It's not necessarily valuable for PostgreSQL to support every UUID 
version or every format. Supporting additional formats is the direction 
of supporting every UUID format. Three months from now, somebody is 
going to propose allowing '-' or ':'. What should the answer be then?


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>

Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

>>   3) People already have the option of translating the UUID from their
>> application to a standard format.
>
> Regexp, the swiss-army knife of data manipulation. ;)
>
> While possible, it really is not that easy and efficient.  At least we should
> accept dashless UUIDs, so instead of tediously reformatting UUID one
> could do s/-//g

We actually already do accept that.

>>   4) As you find below, and is probably possible to improve on, a fixed
>> format can be parsed more efficiently.
>
> What I was thinking about is using the same lookup-table style approach
> as the encode()/decode() pair uses.  Should be faster than the current
> implementation, and skipping over '-' (and even ':' or '.') is even simpler.
> I don't know the internals well enough to know how that would work in
> encodings like UTF16...
>
> See http://doxygen.postgresql.org/encode_8c-source.html#l00107

I thought about this, but it's sort of not worth it.  We're talking
about a function that already executes in something on the order of a
microsecond.  Shaving another 10% off isn't going to help anyone.

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Robert Haas

>   1) Reduced error checking.
>   2) The '-' is not the only character that people have used. ClearCase uses
> '.' and ':' as punctuation.
>   3) People already have the option of translating the UUID from their
> application to a standard format.
>   4) As you find below, and is probably possible to improve on, a fixed
> format can be parsed more efficiently.

Scenario 1.  I have some standard format UUIDs and I want to parse
them.  This change doesn't bother me at all because if I'm parsing
anywhere near enough UUIDs for it to matter, the speed of my CPU, disk, and
memory subsystems will vastly outweigh the difference between the two
implementations.  I measured the difference between the two by running
them both in a tight loop on a fixed string.  I challenge anyone to
produce a measurable performance distinction by issuing SQL queries.
I doubt that it is possible.

Scenario 2. I have some non-standard format UUIDs and I want to parse
them.  This change helps me a lot, because I'm almost positive that
calling regexp_replace() and then uuid_in() is going to be MUCH slower
than just calling uuid_in().  And if I do that then my error checking
will be REALLY weak, unless I write a custom PL function to make sure
that dashes only occur where they're supposed to be, in which case it
will be even slower.

Scenario 3. I only want standard-format UUIDs to be accepted into my
database.  Any non-standard format UUIDs should be rejected at parse
time.  This change is pretty irritating, because now I have to use
regexp matching or something to make sure I've got the right format,
and it's going to be significantly slower.

My suspicion is that scenario 2 is a lot more common than scenario 3.

> I don't know which implementation was used for the PostgreSQL core, but any
> hard coded constants would allow for the optimizer to generate instructions
> that can run in parallel, or that are better aligned to machine words.
>
> 2-3% slow down for what gain? It still doesn't handle all cases, and it's
> less able to check the format for correctness.

This change is a long way from letting any old thing through as a
UUID.  I'm sure there are lots of crazy ways to write UUIDs, but
everything I found with a quick Google search would be covered by this
patch, so I think that's pretty good.  A key point for me is that it's
hard to imagine this patch accepting anything that was intended to be
something other than a UUID.  (I am sure someone will now write back
and tell me about their favorite non-UUID thing that happens to have
32 hex digits with dashes for separators, but come on.)

...Robert


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-10 Thread Dawid Kuroczko

On Fri, Oct 10, 2008 at 7:28 AM, Mark Mielke <[EMAIL PROTECTED]> wrote:
> Robert Haas wrote:
>> While we could perhaps accept only those variant formats which we
>> specifically know someone to be using, it seems likely that people
>> will keep moving those pesky dashes around, and we'll likely end up
>> continuing to add more formats and arguing about which ones are widely
>> enough used to deserve being on the list.  So my vote is - as long as
>> they don't put a dash in the middle of a group of four (i.e. two bytes),
>> just let it go.
> I somewhat disagree with supporting other formats. Reasons include:
>
>   1) Reduced error checking.

Hmm, I tend to disagree.  If UUIDs were variable length (different number
of digits), then perhaps yes.  But as all UUIDs have the same number of
digits, the dashes in between them act as decorators.

>   2) The '-' is not the only character that people have used. ClearCase uses
> '.' and ':' as punctuation.

I would be more in favor of accepting MAC-address style notation AA:BB:CC:DD
also, in that case, but I think it's going too far...  So, I am for sticking with
dashes and groups of four :)

>   3) People already have the option of translating the UUID from their
> application to a standard format.

Regexp, the swiss-army knife of data manipulation. ;)

While possible, it really is not that easy and efficient.  At least we should
accept dashless UUIDs, so instead of tediously reformatting UUID one
could do s/-//g

>   4) As you find below, and is probably possible to improve on, a fixed
> format can be parsed more efficiently.

What I was thinking about is using the same lookup-table style approach
as the encode()/decode() pair uses.  Should be faster than the current
implementation, and skipping over '-' (and even ':' or '.') is even simpler.
I don't know the internals well enough to know how that would work in
encodings like UTF16...

See http://doxygen.postgresql.org/encode_8c-source.html#l00107
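
A rough sketch of the lookup-table idea, written from scratch for illustration and not copied from encode.c: a 256-entry table maps each input byte to its hex value, to "skip me", or to "invalid". The names hex_table, init_hex_table and parse_uuid_table are hypothetical. Note that this version skips a '-' wherever it appears, which is looser than the rule proposed in the patch, and that it indexes by single bytes, so it assumes an ASCII-compatible encoding of the input.

```c
#include <stdbool.h>
#include <string.h>

#define UUID_LEN	16
#define HEX_INVALID (-1)
#define HEX_SKIP	(-2)

static signed char hex_table[256];
static bool hex_table_ready = false;

/* Fill the table once: digits map to 0..15, '-' to HEX_SKIP, rest invalid. */
static void
init_hex_table(void)
{
	const char *lower = "0123456789abcdef";
	const char *upper = "ABCDEF";
	int			i;

	memset(hex_table, HEX_INVALID, sizeof(hex_table));
	for (i = 0; i < 16; i++)
		hex_table[(unsigned char) lower[i]] = (signed char) i;
	for (i = 0; i < 6; i++)
		hex_table[(unsigned char) upper[i]] = (signed char) (10 + i);
	hex_table[(unsigned char) '-'] = HEX_SKIP;
	hex_table_ready = true;
}

/* Parse 32 hex digits, ignoring any '-' characters between them. */
bool
parse_uuid_table(const char *s, unsigned char out[UUID_LEN])
{
	int			ndigits = 0;

	if (!hex_table_ready)
		init_hex_table();

	for (; *s != '\0'; s++)
	{
		int			v = hex_table[(unsigned char) *s];

		if (v == HEX_SKIP)
			continue;
		if (v == HEX_INVALID || ndigits == 2 * UUID_LEN)
			return false;
		if (ndigits % 2 == 0)
			out[ndigits / 2] = (unsigned char) (v << 4);	/* high nibble */
		else
			out[ndigits / 2] |= (unsigned char) v;			/* low nibble */
		ndigits++;
	}
	return ndigits == 2 * UUID_LEN;
}
```

Accepting ':' or '.' as well would be one more table entry each, which is exactly what makes this approach both convenient and, for those who want strict checking, worrying.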

   Best regards,
   Dawid Kuroczko
-- 
  ..``The essence of real creativity is a certain
 : *Dawid Kuroczko* : playfulness, a flitting from idea to idea
 : [EMAIL PROTECTED] : without getting bogged down by fixated demands.''
 `..'  Sherkaner Underhill, A Deepness in the Sky, V. Vinge


Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-09 Thread Mark Mielke


Robert Haas wrote:

> While we could perhaps accept only those variant formats which we
> specifically know someone to be using, it seems likely that people
> will keep moving those pesky dashes around, and we'll likely end up
> continuing to add more formats and arguing about which ones are widely
> enough used to deserve being on the list.  So my vote is - as long as
> they don't put a dash in the middle of a group of four (i.e. two bytes),
> just let it go.
  


I somewhat disagree with supporting other formats. Reasons include:

   1) Reduced error checking.
   2) The '-' is not the only character that people have used. 
ClearCase uses '.' and ':' as punctuation.
   3) People already have the option of translating the UUID from their 
application to a standard format.
   4) As you find below, and is probably possible to improve on, a
fixed format can be parsed more efficiently.



> Somewhat to my surprise, this implementation appears to be about 2-3%
> slower than the one it replaces, as measured using a trivial test
> harness.  I would have thought that eliminating a call to strlen() and
> an extra copy of the data would have actually picked up some speed,
> but it seems not.  Any thoughts on the reason?  In any case, I don't
> believe there's any possible use case where a 2-3% slowdown in
> string_to_uuid() is actually perceptible to the user, since I had to
> call it 100 million times in a tight loop to measure it.
  


I don't know which implementation was used for the PostgreSQL core, but 
any hard coded constants would allow for the optimizer to generate 
instructions that can run in parallel, or that are better aligned to 
machine words.


2-3% slow down for what gain? It still doesn't handle all cases, and 
it's less able to check the format for correctness.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



[HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-09 Thread Robert Haas

The attached patch allows uuid_in() to parse a wider variety of
variant input formats for the UUID data type, per the TODO named in
the subject line.

Original discussion here:

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01264.php

The original discussion left unresolved the question of what variant
input formats to accept.  This patch takes the approach of allowing an
optional hyphen after each group of four hex digits.  This will allow
4x-4x-4x-4x-4x-4x-4x-4x (the format that originally prompted the
discussion) as well as things like the Coldfusion format,
8x-4x-4x-16x:

http://livedocs.adobe.com/coldfusion/6.1/htmldocs/functi54.htm

...and then there's this, which seems to be using 8x-8x-8x-8x:

http://lists.xensource.com/archives/html/xen-changelog/2005-11/msg00557.html

While we could perhaps accept only those variant formats which we
specifically know someone to be using, it seems likely that people
will keep moving those pesky dashes around, and we'll likely end up
continuing to add more formats and arguing about which ones are widely
enough used to deserve being on the list.  So my vote is - as long as
they don't put a dash in the middle of a group of four (i.e. two bytes),
just let it go.
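
For readers without the attachment, here is a rough standalone sketch of the rule described above: an optional hyphen after each group of four hex digits, with optional surrounding braces. This is not the patch itself; parse_uuid_lenient and hex_digit_value are names invented for this illustration, and error reporting is reduced to a boolean.

```c
#include <ctype.h>
#include <stdbool.h>

#define UUID_LEN 16

/* Value of a character already known to satisfy isxdigit(). */
static int
hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char) c) - 'a' + 10;
}

/*
 * Parse 32 hex digits into out.  A single hyphen is allowed after each
 * complete group of four digits (but not after the last one), and the
 * whole string may be wrapped in braces.  Returns false on any other input;
 * out may be partly written in that case.
 */
bool
parse_uuid_lenient(const char *src, unsigned char out[UUID_LEN])
{
	bool		braces = false;
	int			i;

	if (*src == '{')
	{
		braces = true;
		src++;
	}

	for (i = 0; i < UUID_LEN; i++)
	{
		/* src[1] is only examined if src[0] is a digit, so no overrun */
		if (!isxdigit((unsigned char) src[0]) ||
			!isxdigit((unsigned char) src[1]))
			return false;
		out[i] = (unsigned char) ((hex_digit_value(src[0]) << 4) |
								  hex_digit_value(src[1]));
		src += 2;

		/* two bytes make one group of four digits */
		if (*src == '-' && (i % 2) == 1 && i < UUID_LEN - 1)
			src++;
	}

	if (braces)
	{
		if (*src != '}')
			return false;
		src++;
	}
	return *src == '\0';
}
```

With this rule "a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11" and "{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}" are accepted, while a hyphen inside a group of four, a doubled hyphen, or a trailing hyphen is rejected.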

Somewhat to my surprise, this implementation appears to be about 2-3%
slower than the one it replaces, as measured using a trivial test
harness.  I would have thought that eliminating a call to strlen() and
an extra copy of the data would have actually picked up some speed,
but it seems not.  Any thoughts on the reason?  In any case, I don't
believe there's any possible use case where a 2-3% slowdown in
string_to_uuid() is actually perceptible to the user, since I had to
call it 100 million times in a tight loop to measure it.
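
The test harness itself was not posted, so the following is only a guess at the general shape of such a measurement: call a parser repeatedly on one fixed string and time the loop with clock(). The names time_parser, parse_fn and parse_hex32 are hypothetical, and parse_hex32 is just a stand-in parser so the example is complete.

```c
#include <time.h>

/* Any parser with this shape can be timed. */
typedef int (*parse_fn) (const char *input, unsigned char out[16]);

static int
hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Stand-in parser: exactly 32 hex digits, no hyphens.  Returns 1 or 0. */
int
parse_hex32(const char *s, unsigned char out[16])
{
	int			i;

	for (i = 0; i < 16; i++)
	{
		int			hi = hex_val(s[2 * i]);
		int			lo;

		if (hi < 0)
			return 0;
		lo = hex_val(s[2 * i + 1]);
		if (lo < 0)
			return 0;
		out[i] = (unsigned char) ((hi << 4) | lo);
	}
	return s[32] == '\0';
}

/* CPU seconds spent calling fn(input) the given number of times. */
double
time_parser(parse_fn fn, const char *input, long iterations)
{
	unsigned char out[16] = {0};
	volatile unsigned int sink = 0;		/* keeps the calls from being dropped */
	clock_t		start;
	clock_t		end;
	long		n;

	start = clock();
	for (n = 0; n < iterations; n++)
	{
		sink += (unsigned int) fn(input, out);
		sink += out[0];
	}
	end = clock();
	(void) sink;
	return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Timing two implementations this way on the same input and comparing the totals gives the kind of relative figure quoted above, though a tight loop on one string says little about cost inside a real query.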

...Robert
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.229
diff -c -r1.229 datatype.sgml
*** doc/src/sgml/datatype.sgml	3 Oct 2008 15:37:18 -	1.229
--- doc/src/sgml/datatype.sgml	10 Oct 2008 02:39:18 -
***************
*** 3550,3560 ****
  PostgreSQL also accepts the following
  alternative forms for input:
  use of upper-case digits, the standard format surrounded by
! braces, and omitting the hyphens.  Examples are:
  
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
  
  Output is always in the standard form.
 
--- 3550,3563 ----
  PostgreSQL also accepts the following
  alternative forms for input:
  use of upper-case digits, the standard format surrounded by
! braces, omitting some or all hyphens, adding a hyphen after any
! 	group of four digits.  Examples are:
  
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
+ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
+ {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
  
  Output is always in the standard form.
 
Index: src/backend/utils/adt/uuid.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/uuid.c,v
retrieving revision 1.7
diff -c -r1.7 uuid.c
*** src/backend/utils/adt/uuid.c	1 Jan 2008 20:31:21 -	1.7
--- src/backend/utils/adt/uuid.c	10 Oct 2008 02:39:19 -
***************
*** 74,133 ****
  }
  
  /*
!  * We allow UUIDs in three input formats: 8x-4x-4x-4x-12x,
!  * {8x-4x-4x-4x-12x}, and 32x, where "nx" means n hexadecimal digits
!  * (only the first format is used for output). We convert the first
!  * two formats into the latter format before further processing.
   */
  static void
  string_to_uuid(const char *source, pg_uuid_t *uuid)
  {
! 	char		hex_buf[32];	/* not NUL terminated */
! 	int			i;
! 	int			src_len;
  
! 	src_len = strlen(source);
! 	if (src_len != 32 && src_len != 36 && src_len != 38)
! 		goto syntax_error;
! 
! 	if (src_len == 32)
! 		memcpy(hex_buf, source, src_len);
! 	else
  	{
! 		const char *str = source;
! 
! 		if (src_len == 38)
! 		{
! 			if (str[0] != '{' || str[37] != '}')
! 				goto syntax_error;
! 
! 			str++;				/* skip the first character */
! 		}
! 
! 		if (str[8] != '-' || str[13] != '-' ||
! 			str[18] != '-' || str[23] != '-')
! 			goto syntax_error;
! 
! 		memcpy(hex_buf, str, 8);
! 		memcpy(hex_buf + 8, str + 9, 4);
! 		memcpy(hex_buf + 12, str + 14, 4);
! 		memcpy(hex_buf + 16, str + 19, 4);
! 		memcpy(hex_buf + 20, str + 24, 12);
  	}
  
  	for (i = 0; i < UUID_LEN; i++)
  	{
  		char		str_buf[3];
  
! 		memcpy(str_buf, &hex_buf[i * 2], 2);
  		if (!isxdigit((unsigned char) str_buf[0]) ||
  			!isxdigit((unsigned char) str_buf[1]))
  			goto syntax_error;
  
  		str_buf[2] = '\0';
  		uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
  	}
  
  	return;
  
  syntax_error:
--- 74,122 ----
  }
  
  /*
!  * We allow UUIDs as a ser
