Re: UUID v7

Andrey M . Borodin Sat, 26 Oct 2024 10:12:18 -0700

Thanks for the review!

> On 18 Oct 2024, at 02:16, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> 
> On Sun, Aug 4, 2024 at 3:51 AM Andrey M. Borodin <x4...@yandex-team.ru> wrote:
>> 
>> 
>> 
>>> On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4...@yandex-team.ru> wrote:
>>> 
>>> PFA version accepting offset interval.
>> 
>> There was a bug: when time was not moving on, I was updating used time by a 
>> nanosecond, instead of 1/4096 of millisecond.
>> V27 fixes that.
>> 
>> Thanks!
> 
> I've reviewed the v27 patch and have some comments:
> 
> ---
> in datatype.sgml:
> 
>    The data type <type>uuid</type> stores Universally Unique Identifiers
>    (UUID) as defined by <ulink
> url="https://datatracker.ietf.org/doc/html/rfc4122";>RFC 4122</ulink>,
>    ISO/IEC 9834-8:2005, and related standards.
> 
> In funcs.sgml:
>   This function extracts the version from a UUID of the variant described by
>   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122";>RFC
> 4122</ulink>.  For
> 
> Maybe these references of RFC4122 need to be updated as well.


Fixed.

> ---
> 'git show --check' raises a warning:

Fixed.

> 
> src/backend/utils/adt/uuid.c:520: trailing whitespace.
> +
> 
> ---
> +
> +   if (PG_NARGS() > 0)
> +   {
> +       Interval *span;
> +       TimestampTz ts = (TimestampTz) (ns / 1000) -
> +           (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY *
> USECS_PER_SEC;
> +       span = PG_GETARG_INTERVAL_P(0);
> +       ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
> +                                                    TimestampTzGetDatum(ts),
> +                                                    
> IntervalPGetDatum(span)));
> +       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC)
> +           * 1000 + ns % 1000;
> +   }
> 
> We need to add a comment to describe what/why we're doing here.

Done.

> 
> ---
> + * Monotonicity (regarding generation on given backend) is ensured with 
> method
> + * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"
> 
> Need a period at the end of this sentence.

Fixed.

> 
> ---
> +{ oid => '9896', descr => 'generate UUID version 7',
> +  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
> +  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
> +{ oid => '9897', descr => 'generate UUID version 7',
> +  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
> +  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },
> 
> Both functions have the same description but work differently. I think
> it's better to clarify the description of uuidv7() that takes an
> interval.

I've slightly extended the description... not it's 'generate UUID version 7 
with a timestamp shifted on specific interval'. Perhaps, we can come up with 
something better.

> 
> ---
> - oid | proname | oid | proname
> ------+---------+-----+---------
> -(0 rows)
> + oid  | proname | oid  | proname
> +------+---------+------+---------
> + 9896 | uuidv7  | 9897 | uuidv7
> +(1 row)
> 
> I think that we need to change these functions so that this check
> query doesn't return anything, no?

We have 4 options:
0. Remove uuidv7(interval). But it brings imporatne functionality to the table: 
we can avoid contention points while massively insert data.
1. Give different names to uuidv7() and uuidv7(interval).
2. Allow importing pg_node_tree  (see v7 of the patch)
3. Change this query. Comment to this query suggest that it checks for exactly 
this case: same function is declared with different number of arguments.

IMO approach number 3 is best. However, I do not understand why this query 
check was introduced in the first place. Maybe, there are string arguments why 
we should not do same-named functions with different number of arguments.

> 
> ---
> +   if (version == 6)
> +   {
> +       tms = ((uint64) uuid->data[0]) << 52;
> +       tms += ((uint64) uuid->data[1]) << 44;
> +       tms += ((uint64) uuid->data[2]) << 36;
> +       tms += ((uint64) uuid->data[3]) << 28;
> +       tms += ((uint64) uuid->data[4]) << 20;
> +       tms += ((uint64) uuid->data[5]) << 12;
> +       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
> +       tms += ((uint64) uuid->data[7]);
> +
> +       /* convert 100-ns intervals to us, then adjust */
> +       ts = (TimestampTz) (tms / 10) -
> +           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC;
> +
> +       PG_RETURN_TIMESTAMPTZ(ts);
> +   }
> 
> It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
> spite of not supporting UUID v6 generation. I think it makes more
> sense to support UUID v6 generation as well, if the need for it is
> high.

RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with 
providing implementation, it's trivial. PFA patch with implementation.


Best regards, Andrey Borodin.

v28-0001-Implement-UUID-v7.patch
Description: Binary data

Re: UUID v7

Reply via email to