> On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4...@yandex-team.ru>
> wrote:
>>
>>
>>
>>> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4...@yandex-team.ru> wrote:
>>>
>>> <v30-0001-Implement-UUID-v7.patch>
>>
>> Some more thoughts on this patch version:
>>
>> 0. Comment mentioning nanoseconds, while we do not need to carry anything
>> /* Convert TimestampTz back and carry nanoseconds. */
>>
>> 1. There's unnecessary &3 in
>> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>>
>> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023
>> are unused. We could use them for overflow. That would slightly increase
>> non-overflowing capacity when generating more than million UUIDs per second
>> on one backend. However, given current performance of our CSPRNG I do not
>> think this feature worth code complexity.
>>
>
> While using only 10 bits microseconds makes the implementation simple,
> I'm not sure if 10 bits is enough to generate UUIDs at microsecond
> granularity without losing monotonicity. Since 10-bit microseconds are
> used as is in rand_a space, 1000 UUIDs can be generated per
> millisecond without losing monotonicity.
We won’t loose monotonicity on one backend. We will just accumulate time shift.
See
+ us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+ if (previous_us >= us)
+ us = previous_us + 1;
>
> For example, in my environment, it took 1808 milliseconds to generate
> 1 million UUIDs. This is about 533 UUIDs generated per millisecond.
BTW we can furether improve this performance by buffering CSPRNG. See
v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch in this thread.
> As
> UUID generation performance improves, I think 10 bits will not be
> enough.
>
> =# select count(uuidv7()) from generate_series(1, 1_000_000);
> count
> ---------
> 1000000
> (1 row)
>
> Time: 1808.734 ms
>
> I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:
>
>> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which
>> increases the risk of collision. In the counter, almost all bits are
>> initialized with a random number, which reduces the risk of collision.
>
> I feel that it's better to switch to Method 1 or 2 with 12 bits or
> larger counter space.
>
12 bits does not differ much. We can have much longer counters. Before
switching to Method 3 I used 18 bits counter. See version
v24-0001-Implement-UUID-v7.patch
This version is more resilent to generating a lot of UUIDs on one backend while
still not accumulating time shift.
Yet, UUIDs generated on parallel workers will loose some sortability.
Personally, I think both methods are good. I’d even combine them both. But RFC
seems to be not allowing this. BTW if we just continue to use nanoseconds
patch, zero bits will act exactly as counters.
Best regards, Andrey Borodin.