The attached patch allows uuid_in() to accept a wider variety of input formats for the UUID data type, per the TODO named in the subject line.
Original discussion here:

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01264.php

The original discussion left unresolved the question of which variant input formats to accept. This patch takes the approach of allowing an optional hyphen after each group of four hex digits. This allows 4x-4x-4x-4x-4x-4x-4x-4x (the format that originally prompted the discussion) as well as things like the ColdFusion format, 8x-4x-4x-16x:

http://livedocs.adobe.com/coldfusion/6.1/htmldocs/functi54.htm

...and then there's this, which seems to be using 8x-8x-8x-8x:

http://lists.xensource.com/archives/html/xen-changelog/2005-11/msg00557.html

While we could perhaps accept only those variant formats which we specifically know someone to be using, it seems likely that people will keep moving those pesky dashes around, and we'll end up continuing to add more formats and arguing about which ones are widely enough used to deserve being on the list. So my vote is: as long as they don't put a dash in the middle of a group of four hex digits (i.e. two bytes), just let it go.

Somewhat to my surprise, this implementation appears to be about 2-3% slower than the one it replaces, as measured using a trivial test harness. I would have thought that eliminating a call to strlen() and an extra copy of the data would have actually picked up some speed, but it seems not. Any thoughts on the reason? In any case, I don't believe there's any possible use case where a 2-3% slowdown in string_to_uuid is actually perceptible to the user, since I had to call it 100 million times in a tight loop to measure it.

...Robert
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.229
diff -c -r1.229 datatype.sgml
*** doc/src/sgml/datatype.sgml 3 Oct 2008 15:37:18 -0000 1.229
--- doc/src/sgml/datatype.sgml 10 Oct 2008 02:39:18 -0000
***************
*** 3550,3560 ****
      <productname>PostgreSQL</productname> also accepts the following
      alternative forms for input:
      use of upper-case digits, the standard format surrounded by
!     braces, and omitting the hyphens. Examples are:
  <programlisting>
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
  </programlisting>
      Output is always in the standard form.
     </para>
--- 3550,3563 ----
      <productname>PostgreSQL</productname> also accepts the following
      alternative forms for input:
      use of upper-case digits, the standard format surrounded by
!     braces, omitting some or all hyphens, adding a hyphen after any
!     group of four digits. Examples are:
  <programlisting>
  A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
  {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
  a0eebc999c0b4ef8bb6d6bb9bd380a11
+ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
+ {a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
  </programlisting>
      Output is always in the standard form.
     </para>
Index: src/backend/utils/adt/uuid.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/uuid.c,v
retrieving revision 1.7
diff -c -r1.7 uuid.c
*** src/backend/utils/adt/uuid.c 1 Jan 2008 20:31:21 -0000 1.7
--- src/backend/utils/adt/uuid.c 10 Oct 2008 02:39:19 -0000
***************
*** 74,133 ****
  }
  
  /*
!  * We allow UUIDs in three input formats: 8x-4x-4x-4x-12x,
!  * {8x-4x-4x-4x-12x}, and 32x, where "nx" means n hexadecimal digits
!  * (only the first format is used for output). We convert the first
!  * two formats into the latter format before further processing.
   */
  static void
  string_to_uuid(const char *source, pg_uuid_t *uuid)
  {
!     char        hex_buf[32];    /* not NUL terminated */
!     int         i;
!     int         src_len;
  
!     src_len = strlen(source);
!     if (src_len != 32 && src_len != 36 && src_len != 38)
!         goto syntax_error;
! 
!     if (src_len == 32)
!         memcpy(hex_buf, source, src_len);
!     else
      {
!         const char *str = source;
! 
!         if (src_len == 38)
!         {
!             if (str[0] != '{' || str[37] != '}')
!                 goto syntax_error;
! 
!             str++;              /* skip the first character */
!         }
! 
!         if (str[8] != '-' || str[13] != '-' ||
!             str[18] != '-' || str[23] != '-')
!             goto syntax_error;
! 
!         memcpy(hex_buf, str, 8);
!         memcpy(hex_buf + 8, str + 9, 4);
!         memcpy(hex_buf + 12, str + 14, 4);
!         memcpy(hex_buf + 16, str + 19, 4);
!         memcpy(hex_buf + 20, str + 24, 12);
      }
  
      for (i = 0; i < UUID_LEN; i++)
      {
          char        str_buf[3];
  
!         memcpy(str_buf, &hex_buf[i * 2], 2);
          if (!isxdigit((unsigned char) str_buf[0]) ||
              !isxdigit((unsigned char) str_buf[1]))
              goto syntax_error;
  
          str_buf[2] = '\0';
          uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
      }
  
      return;
  
  syntax_error:
--- 74,122 ----
  }
  
  /*
!  * We allow UUIDs as a series of 32 hexadecimal digits with an optional dash
!  * after each group of 4 hexadecimal digits, and optionally surrounded by {}.
!  * (The canonical format 8x-4x-4x-4x-12x, where "nx" means n hexadecimal
!  * digits, is the only one used for output.)
   */
  static void
  string_to_uuid(const char *source, pg_uuid_t *uuid)
  {
!     const char *src = source;
!     int         i, braces = 0;
  
!     if (src[0] == '{')
      {
!         ++src;
!         braces = 1;
      }
  
      for (i = 0; i < UUID_LEN; i++)
      {
          char        str_buf[3];
  
!         if (src[0] == '\0' || src[1] == '\0')
!             goto syntax_error;
!         memcpy(str_buf, src, 2);
          if (!isxdigit((unsigned char) str_buf[0]) ||
              !isxdigit((unsigned char) str_buf[1]))
              goto syntax_error;
  
          str_buf[2] = '\0';
          uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
+         src += src[2] == '-' && (i % 2) == 1 && i < UUID_LEN - 1 ? 3 : 2;
      }
  
+     if (braces)
+     {
+         if (*src!= '}')
+             goto syntax_error;
+         ++src;
+     }
+ 
+     if (*src != '\0')
+         goto syntax_error;
+ 
      return;
  
  syntax_error:
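
For anyone who wants to try the parsing rule outside the backend, here is a minimal standalone sketch of the loop in the patch above. It is an illustration only, not the patched function: sketch_uuid and sketch_string_to_uuid are hypothetical names, the type is a stand-in for pg_uuid_t, and the function returns false where the backend code jumps to syntax_error. It also folds the explicit end-of-string test into the isxdigit() checks, which should be equivalent because isxdigit('\0') is false.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define UUID_LEN 16

/* Hypothetical stand-in for pg_uuid_t; not the backend type. */
typedef struct
{
    unsigned char data[UUID_LEN];
} sketch_uuid;

/*
 * Accept 32 hex digits, optionally wrapped in {}, with an optional hyphen
 * after each complete group of four digits.  Returns false on bad input
 * (assumption: the real code reports a syntax error instead).
 */
static bool
sketch_string_to_uuid(const char *source, sketch_uuid *uuid)
{
    const char *src = source;
    bool        braces = false;
    int         i;

    if (src[0] == '{')
    {
        src++;
        braces = true;
    }

    for (i = 0; i < UUID_LEN; i++)
    {
        char        str_buf[3];

        /*
         * Each byte needs two hex digits.  A NUL fails isxdigit(), so this
         * also stops us before running past the end of the string.
         */
        if (!isxdigit((unsigned char) src[0]) ||
            !isxdigit((unsigned char) src[1]))
            return false;

        str_buf[0] = src[0];
        str_buf[1] = src[1];
        str_buf[2] = '\0';
        uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
        src += 2;

        /* A hyphen may follow every second byte, except the last one. */
        if (*src == '-' && (i % 2) == 1 && i < UUID_LEN - 1)
            src++;
    }

    if (braces)
    {
        if (*src != '}')
            return false;
        src++;
    }

    /* Anything left over (trailing hyphen, extra digits) is an error. */
    return *src == '\0';
}
```

With this sketch, the three documented forms and the two new examples from the doc patch should all parse, while a hyphen inside a group of four (a0e-ebc9...), a doubled hyphen, a trailing hyphen, or an unbalanced brace should be rejected.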
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers