[HACKERS] patch: Allow the UUID type to accept non-standard formats

Robert Haas Thu, 09 Oct 2008 20:10:12 -0700

The attached patch allows uuid_in() to parse a wider variety of
input formats for the UUID data type, per the TODO named in the
subject line.


Original discussion here:

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01264.php

The original discussion left unresolved the question of which variant
input formats to accept.  This patch takes the approach of allowing an
optional hyphen after each group of four hex digits.  This will allow
4x-4x-4x-4x-4x-4x-4x-4x (the format that originally prompted the
discussion) as well as things like the ColdFusion format,
8x-4x-4x-16x:

http://livedocs.adobe.com/coldfusion/6.1/htmldocs/functi54.htm

...and then there's this, which seems to be using 8x-8x-8x-8x:

http://lists.xensource.com/archives/html/xen-changelog/2005-11/msg00557.html

While we could perhaps accept only those variant formats which we
specifically know someone to be using, it seems likely that people
will keep moving those pesky dashes around, and we'll end up
continuing to add more formats and arguing about which ones are widely
enough used to deserve being on the list.  So my vote is: as long as
they don't put a dash in the middle of a group of four hex digits
(i.e., two bytes), just let it go.
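To make the proposed rule concrete, here is a minimal standalone sketch of an acceptance check. The function name uuid_text_ok() is hypothetical and it only validates the shape of the text; it is not the patch's string_to_uuid(), which also decodes the digits into a pg_uuid_t.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical check of the proposed input rule: exactly 32 hex digits,
 * an optional '-' after each complete group of four digits (but not after
 * the last group), and optionally the whole thing wrapped in {}.
 */
bool
uuid_text_ok(const char *s)
{
    bool        braces = false;
    int         ndigits = 0;

    if (*s == '{')
    {
        braces = true;
        s++;
    }

    while (ndigits < 32)
    {
        /* rejects non-hex characters, misplaced dashes, and early end of string */
        if (!isxdigit((unsigned char) *s))
            return false;
        s++;
        ndigits++;

        /* a single dash may follow a complete group of four, except the final one */
        if (*s == '-' && ndigits % 4 == 0 && ndigits < 32)
            s++;
    }

    if (braces)
    {
        if (*s != '}')
            return false;
        s++;
    }

    /* nothing may follow: no trailing dash, no unmatched '}' */
    return *s == '\0';
}
```

Under this rule the ColdFusion-style a0eebc99-9c0b-4ef8-bb6d6bb9bd380a11 and the 8x-8x-8x-8x form a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11 are accepted, while a0eeb-c999c0b4ef8bb6d6bb9bd380a11 (dash inside a group) and any input with a trailing dash are rejected.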

Somewhat to my surprise, this implementation appears to be about 2-3%
slower than the one it replaces, as measured using a trivial test
harness.  I would have thought that eliminating a call to strlen() and
an extra copy of the data would have actually picked up some speed,
but it seems not.  Any thoughts on the reason?  In any case, I don't
believe there's any possible use case where a 2-3% slowdown in
string_to_uuid() is actually perceptible to the user, since I had to
call it 100 million times in a tight loop to measure it.
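A tight-loop comparison like the one described could be set up along the following lines. This is a hedged sketch, not the harness actually used: parse_uuid_old() and parse_uuid_new() are hypothetical standalone copies of the two string_to_uuid() versions that write into a plain byte array and return 1/0 instead of jumping to syntax_error, UUID_LEN is assumed to be 16, and time_parser() simply measures CPU time with the standard clock().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define UUID_LEN 16             /* assumed to match PostgreSQL's UUID_LEN */

typedef int (*uuid_parser) (const char *source, unsigned char *out);

/* Hypothetical copy of the old code: strlen(), fixed layouts, copy into hex_buf. */
int
parse_uuid_old(const char *source, unsigned char *out)
{
    char        hex_buf[32];    /* not NUL terminated */
    size_t      src_len = strlen(source);

    if (src_len != 32 && src_len != 36 && src_len != 38)
        return 0;
    if (src_len == 32)
        memcpy(hex_buf, source, 32);
    else
    {
        const char *str = source;

        if (src_len == 38)
        {
            if (str[0] != '{' || str[37] != '}')
                return 0;
            str++;              /* skip the opening brace */
        }
        if (str[8] != '-' || str[13] != '-' || str[18] != '-' || str[23] != '-')
            return 0;
        memcpy(hex_buf, str, 8);
        memcpy(hex_buf + 8, str + 9, 4);
        memcpy(hex_buf + 12, str + 14, 4);
        memcpy(hex_buf + 16, str + 19, 4);
        memcpy(hex_buf + 20, str + 24, 12);
    }
    for (int i = 0; i < UUID_LEN; i++)
    {
        char        buf[3];

        memcpy(buf, &hex_buf[i * 2], 2);
        if (!isxdigit((unsigned char) buf[0]) || !isxdigit((unsigned char) buf[1]))
            return 0;
        buf[2] = '\0';
        out[i] = (unsigned char) strtoul(buf, NULL, 16);
    }
    return 1;
}

/* Hypothetical copy of the patched code: a single pass with no strlen(). */
int
parse_uuid_new(const char *source, unsigned char *out)
{
    const char *src = source;
    int         braces = 0;

    if (src[0] == '{')
    {
        src++;
        braces = 1;
    }
    for (int i = 0; i < UUID_LEN; i++)
    {
        char        buf[3];

        if (src[0] == '\0' || src[1] == '\0')
            return 0;
        memcpy(buf, src, 2);
        if (!isxdigit((unsigned char) buf[0]) || !isxdigit((unsigned char) buf[1]))
            return 0;
        buf[2] = '\0';
        out[i] = (unsigned char) strtoul(buf, NULL, 16);
        /* skip an optional '-' after every second byte, i.e. every 4 hex digits */
        src += (src[2] == '-' && i % 2 == 1 && i < UUID_LEN - 1) ? 3 : 2;
    }
    if (braces)
    {
        if (*src != '}')
            return 0;
        src++;
    }
    return *src == '\0';
}

/* CPU seconds spent calling parse() on input `iterations` times; -1.0 if input is invalid. */
double
time_parser(uuid_parser parse, const char *input, long iterations)
{
    unsigned char out[UUID_LEN];
    volatile int sink = 0;      /* keeps the calls from being optimized away */
    clock_t     start;

    if (!parse(input, out))
        return -1.0;            /* only time inputs the parser accepts */
    start = clock();
    for (long n = 0; n < iterations; n++)
        sink += parse(input, out);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Both parsers should be timed on the canonical 8x-4x-4x-4x-12x form, since that is the only hyphenated layout the old code accepts. Compiler flags and inlining can easily move a difference of a few percent, so figures from a harness like this may not match the 2-3% reported above.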

...Robert

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.229
diff -c -r1.229 datatype.sgml
*** doc/src/sgml/datatype.sgml	3 Oct 2008 15:37:18 -0000	1.229
--- doc/src/sgml/datatype.sgml	10 Oct 2008 02:39:18 -0000
***************
*** 3550,3560 ****
      <productname>PostgreSQL</productname> also accepts the following
      alternative forms for input:
      use of upper-case digits, the standard format surrounded by
!     braces, and omitting the hyphens.  Examples are:
  <programlisting>
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
  </programlisting>
      Output is always in the standard form.
     </para>
--- 3550,3563 ----
      <productname>PostgreSQL</productname> also accepts the following
      alternative forms for input:
      use of upper-case digits, the standard format surrounded by
!     braces, omitting some or all hyphens, and adding a hyphen after any
!     group of four digits except the last.  Examples are:
  <programlisting>
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
+ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
+ {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
  </programlisting>
      Output is always in the standard form.
     </para>
Index: src/backend/utils/adt/uuid.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/uuid.c,v
retrieving revision 1.7
diff -c -r1.7 uuid.c
*** src/backend/utils/adt/uuid.c	1 Jan 2008 20:31:21 -0000	1.7
--- src/backend/utils/adt/uuid.c	10 Oct 2008 02:39:19 -0000
***************
*** 74,133 ****
  }
  
  /*
!  * We allow UUIDs in three input formats: 8x-4x-4x-4x-12x,
!  * {8x-4x-4x-4x-12x}, and 32x, where "nx" means n hexadecimal digits
!  * (only the first format is used for output). We convert the first
!  * two formats into the latter format before further processing.
   */
  static void
  string_to_uuid(const char *source, pg_uuid_t *uuid)
  {
! 	char		hex_buf[32];	/* not NUL terminated */
! 	int			i;
! 	int			src_len;
  
! 	src_len = strlen(source);
! 	if (src_len != 32 && src_len != 36 && src_len != 38)
! 		goto syntax_error;
! 
! 	if (src_len == 32)
! 		memcpy(hex_buf, source, src_len);
! 	else
  	{
! 		const char *str = source;
! 
! 		if (src_len == 38)
! 		{
! 			if (str[0] != '{' || str[37] != '}')
! 				goto syntax_error;
! 
! 			str++;				/* skip the first character */
! 		}
! 
! 		if (str[8] != '-' || str[13] != '-' ||
! 			str[18] != '-' || str[23] != '-')
! 			goto syntax_error;
! 
! 		memcpy(hex_buf, str, 8);
! 		memcpy(hex_buf + 8, str + 9, 4);
! 		memcpy(hex_buf + 12, str + 14, 4);
! 		memcpy(hex_buf + 16, str + 19, 4);
! 		memcpy(hex_buf + 20, str + 24, 12);
  	}
  
  	for (i = 0; i < UUID_LEN; i++)
  	{
  		char		str_buf[3];
  
! 		memcpy(str_buf, &hex_buf[i * 2], 2);
  		if (!isxdigit((unsigned char) str_buf[0]) ||
  			!isxdigit((unsigned char) str_buf[1]))
  			goto syntax_error;
  
  		str_buf[2] = '\0';
  		uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
  	}
  
  	return;
  
  syntax_error:
--- 74,122 ----
  }
  
  /*
!  * We allow UUIDs as a series of 32 hexadecimal digits with an optional dash
!  * after each group of 4 hexadecimal digits (except the last), and optionally
!  * surrounded by {}.  (The canonical format 8x-4x-4x-4x-12x, where "nx" means
!  * n hexadecimal digits, is the only one used for output.)
   */
  static void
  string_to_uuid(const char *source, pg_uuid_t *uuid)
  {
! 	const char *src = source;
! 	int	i, braces = 0;
  
! 	if (src[0] == '{')
  	{
! 		++src;
! 		braces = 1;
  	}
  
  	for (i = 0; i < UUID_LEN; i++)
  	{
  		char		str_buf[3];
  
! 		if (src[0] == '\0' || src[1] == '\0')
! 			goto syntax_error;
! 		memcpy(str_buf, src, 2);
  		if (!isxdigit((unsigned char) str_buf[0]) ||
  			!isxdigit((unsigned char) str_buf[1]))
  			goto syntax_error;
  
  		str_buf[2] = '\0';
  		uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
+ 		src += (src[2] == '-' && (i % 2) == 1 && i < UUID_LEN - 1) ? 3 : 2;
  	}
  
+ 	if (braces)
+ 	{
+ 		if (*src != '}')
+ 			goto syntax_error;
+ 		++src;
+ 	}
+ 
+ 	if (*src != '\0')
+ 		goto syntax_error;
+ 
  	return;
  
  syntax_error:

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
