Re: UUID v7

2024-05-08 Thread Andrey M. Borodin



> On 3 May 2024, at 11:18, Andrey M. Borodin  wrote:
> 
> RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, 
> and now should be published.

It's RFC now.
https://datatracker.ietf.org/doc/rfc9562/


Best regards, Andrey Borodin.



Re: UUID v7

2024-05-04 Thread Andrey M. Borodin



> On 3 May 2024, at 11:18, Andrey M. Borodin  wrote:
> 

Here's the documentation from ClickHouse [0] for their implementation. It's 
identical to provided patch in this thread, with few notable exceptions:

1. Counter is 42 bits, not 18. The counter have no guard bits, every bit is 
initialized with random number on time ticks.
2. By default counter is shared between threads. Alternative function 
generateUUIDv7ThreadMonotonic() provides thread-local counter.

Thanks!


Best regards, Andrey Borodin.

[0] 
https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7



Re: UUID v7

2024-05-03 Thread Andrey M. Borodin


> On 13 Apr 2024, at 11:58, Andrey M. Borodin  wrote:
> 
> New UUID is assigned RFC number 9562, it was aproved by RFC editors and is 
> now in AUTH48 state.

RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, 
and now should be published.


Best regards, Andrey Borodin.



Re: UUID v7

2024-04-15 Thread Michael Paquier
On Sat, Apr 13, 2024 at 07:07:34PM +, Sergey Prokhorenko wrote:
> I think that for the sake of such an epoch-making thing as UUIDv7 it
> would be worth slightly unfreezing this feature freeze.

A feature freeze is here to freeze things in place.  This comes up
every year, and that won't happen.

> New UUID is assigned RFC number 9562, it was aproved by RFC editors
> and is now in AUTH48 state. This means after final approval by
> authors RFC will be imminently publicised. Most probably, this will
> happen circa 2 weeks after feature freeze :)
> 
> [0] https://www.rfc-editor.org/auth48/rfc9562  

Well, that's life.  It looks like this is waiting for some final
approval, which may take some more time.  I have no idea how long this
usually takes. 
--
Michael


signature.asc
Description: PGP signature


Re: UUID v7

2024-04-13 Thread Sergey Prokhorenko
I think that for the sake of such an epoch-making thing as UUIDv7 it would be 
worth slightly unfreezing this feature freeze.

Best regards, 


Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Saturday, 13 April 2024 at 09:58:29 am GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio  wrote:
> 
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in 
lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now 
in AUTH48 state. This means after final approval by authors RFC will be 
imminently publicised. Most probably, this will happen circa 2 weeks after 
feature freeze :)


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562  

Re: UUID v7

2024-04-13 Thread Andrey M. Borodin



> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio  wrote:
> 
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in 
lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now 
in AUTH48 state. This means after final approval by authors RFC will be 
imminently publicised. Most probably, this will happen circa 2 weeks after 
feature freeze :)


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562



Re: UUID v7

2024-04-06 Thread Sergey Prokhorenko
For every complex problem there is an answer that is clear, simple, and wrong. 
Since the RFC allows microsecond timestamp granularity, the first thing that 
comes to everyone's mind is to insert microsecond granularity into UUIDv7. And 
if the RFC allowed nanosecond timestamp granularity, then they would try to 
insert nanosecond granularity into UUIDv7.
But I am categorically against abandoning the counter under pressure from the 
unfounded proposal to replace the counter with microsecond granularity.
1) The RFC specifies millisecond timestamp granularity by default.
2) All advanced UUIDv7 implementations include a counter:• for JavaScript 
https://www.npmjs.com/package/uuidv7• for Rust https://crates.io/crates/uuid7• 
for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7• for Python 
https://github.com/oittaa/uuid6-python
3) The theoretical performance of generating UUIDv7 without loss of 
monotonicity for microsecond granularity is only 1000 UUIDv7 per millisecond. 
This is very low and insufficient generation performance! But the actual 
generation performance is even worse, since the generation demand is unevenly 
distributed within a millisecond. Therefore, a UUIDv7 will not be generated 
every microsecond.
For a counter 18 bits long, with the most significant bit initialized to zero 
and the remaining bits initialized to a random number, the actual performance 
of generating a UUIDv7 without loss of monotonicity is between 2 to the power 
of 17 = 131072 UUIDv7 per millisecond (if the random number happens to be all 
ones) to 2 to the power of 18 = 262144 UUIDv7 per millisecond (if the random 
number happens to be all zeros). This is more than enough.
4) Microsecond timestamp fraction subtracts 10 bits from random data, which 
increases the risk of collision. In the counter, almost all bits are 
initialized with a random number, which reduces the risk of collision.


The only reasonable use of microsecond granularity is when writing to a 
database table in parallel. However, monotonicity in this case can be ensured 
in another way, namely a single UUIDv7 generator per database table, similar to 
SERIAL 
(https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) 
in PostgreSQL.
Best regards,
Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin 
 wrote:  
 
 
...

At this point we can skip the counter\microseconds entirely, just fill 
everything after unix_ts_ms with randomness. It's still a valid UUIDv7, 
exhibiting much more data locality than UUIDv4. We can adjust this sortability 
measures later.


Best regards, Andrey Borodin.  

Re: UUID v7

2024-04-04 Thread Andrey M. Borodin



> On 4 Apr 2024, at 18:45, Peter Eisentraut  wrote:
> 
> On 26.03.24 18:26, Andrey M. Borodin wrote:
>>> Also, you are initializing 4 bits (I think?) to zero to guard against 
>>> counter rollovers (so it's really just an 8 bit counter?).  But nothing 
>>> checks against such rollovers, so I don't understand the use of that.
>> No, there's only one guard rollover bit.
>> Here: uuid->data[6] = (uuid->data[6] & 0xf7);
>> Bits that are called "guard bits" do not guard anything, they just ensure 
>> counter capacity when it is initialized.
> 
> Uh, I guess I don't understand this at all.  I tried to dig up some 
> information about this, but didn't find anything.  What exactly is the 
> mechanism of these "counter rollover guards"?  If they don't guard anything, 
> what are they supposed to accomplish?
> 

My understanding of guard bits is the following: on every UUID generation, when 
time is advancing, counter bits are initialized with random numbers, except 
guard bits. Guard bits are always initialized with zeroes.

Let's consider we have a 1-byte counter with 4 guard bits and 4 normal bits.
If we generate some UUIDs at the very same millisecond we might have following 
counter values:

0C <--- lower nibble is initialized with random 4 bits C.
0D
0E
0F
10
11
12

If we have no these guard bits we might get random numbers that are immifiately 
at the end of a range of allowed values:

FE <--- first UUID at given millisecond
FF
00 <--- rollover to next millisecond
01


If we have 1 guard bit and 7 normal bits we get at worst 128 values before 
rollover to next millisecond.
If we have 2 guard bits and 6 normal bits this guaranty is extended to 192.
3/5 will guaranty capacity of 224.
But usefulness of every next guard bits decreases, so I think there is a point 
in only one.

That's my understanding of guard bits in the counter. Please correct me if I'm 
wrong.


At this point we can skip the counter\microseconds entirely, just fill 
everything after unix_ts_ms with randomness. It's still a valid UUIDv7, 
exhibiting much more data locality than UUIDv4. We can adjust this sortability 
measures later.


Best regards, Andrey Borodin.



Re: UUID v7

2024-04-04 Thread Peter Eisentraut

On 26.03.24 18:26, Andrey M. Borodin wrote:

Also, you are initializing 4 bits (I think?) to zero to guard against counter 
rollovers (so it's really just an 8 bit counter?).  But nothing checks against 
such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure 
counter capacity when it is initialized.


Uh, I guess I don't understand this at all.  I tried to dig up some 
information about this, but didn't find anything.  What exactly is the 
mechanism of these "counter rollover guards"?  If they don't guard 
anything, what are they supposed to accomplish?






Re: UUID v7

2024-03-26 Thread Andrey M. Borodin
Sorry for this long reply. I was looking on refactoring around 
pg_strong_random() and could not decide what to do. Finally, I decided to post 
at least something.

> On 22 Mar 2024, at 19:15, Peter Eisentraut  wrote:
> 
> I have been studying the uuidv() function.
> 
> I find this code extremely hard to follow.
> 
> We don't need to copy all that documentation from the RFC 4122bis document.  
> People can read that themselves.  What I would like to see is easy to find 
> information what from there we are implementing.  Like,
> 
> - UUID version 7
> - fixed-length dedicated counter
> - counter is 18 bits
> - 4 bits are initialized as zero

I've removed table taken from RFC.

> That's more or less all I would need to know what is going on.
> 
> That said, I don't understand why you say it's an 18 bit counter, when you 
> overwrite 6 bits with variant and version.  Then it's just a 12 bit counter?  
> Which is the size of the rand_a field, so that kind of makes sense.  But 12 
> bits is the recommended minimum, and (in this patch) we don't use 
> sub-millisecond timestamp precision, so we should probably use more than the 
> minimum?


No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits data[8]. It's 18 
total. Essentially, we use both partial bytes and one whole byte between.
There was a bug - we used 1 extra byte of random numbers that was not 
necessary, I think that's what lead you to think that we use 12-bit counter.

> Also, you are initializing 4 bits (I think?) to zero to guard against counter 
> rollovers (so it's really just an 8 bit counter?).  But nothing checks 
> against such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure 
counter capacity when it is initialized.
Rollover is carried into time tick here: 
++sequence_counter;
if (sequence_counter > 0x3)
{
/* We only have 18-bit counter */
sequence_counter = 0;
previous_timestamp++;
}

I think we might use 10 bits of microseconds and have 8 bits of a counter. 
Effect of a counter won't change much. But I'm not sure if this is allowed per 
RFC.
If time source is coarse-grained it still acts like a random initializer. And 
when it is precise - time is "natural" source of entropy.


> The code code be organized better.  In the not-increment_counter case, you 
> could use two separate pg_strong_random calls: One to initialize rand_b, 
> starting at >data[8], and one to initialize the counter. Then the 
> former could be shared between the two branches, and the code to assign the 
> sequence_counter to the uuid fields could also be shared.

Call to pg_strong_random() is very expensive in builds without ssl (and even 
with ssl too). If we could ammortize random numbers in small buffers - that 
would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). 
Or, perhaps, we can ignore cost of two pg_string_random() calls.

> 
> I would also prefer if the normal case (not-increment_counter) were the first 
> if branch.

Done.

> Some other notes on your patch:
> 
> - Your rebase duplicated the documentation of uuid_extract_timestamp and 
> uuid_extract_version.
> 
> - PostgreSQL code uses uint64 etc. instead of uint64_t etc.
> 
> - It seems the added includes
> 
> #include "access/xlog.h"
> #include "utils/builtins.h"
> #include "utils/datetime.h"
> 
> are not needed.
> 
> - The static variables sequence_counter and previous_timestamp could be kept 
> inside the uuidv7() function.

Fixed.

Thanks!


Best regards, Andrey Borodin.


v24-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-03-22 Thread Peter Eisentraut

On 20.03.24 19:08, Andrey M. Borodin wrote:

On 19 Mar 2024, at 13:55, Peter Eisentraut  wrote:

On 16.03.24 18:43, Andrey M. Borodin wrote:

On 15 Mar 2024, at 14:47, Aleksander Alekseev  wrote:

+1 to the idea. I doubt that anyone will miss it.

PFA v22.
Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()


I have committed a subset of this for now, namely the additions of 
uuid_extract_timestamp() and uuid_extract_version().  These seemed mature and 
agreed upon.  You can rebase the rest of your patch on top of that.


Great! Thank you! PFA v23 with rebase on HEAD.


I have been studying the uuidv() function.

I find this code extremely hard to follow.

We don't need to copy all that documentation from the RFC 4122bis 
document.  People can read that themselves.  What I would like to see is 
easy to find information what from there we are implementing.  Like,


- UUID version 7
- fixed-length dedicated counter
- counter is 18 bits
- 4 bits are initialized as zero

That's more or less all I would need to know what is going on.

That said, I don't understand why you say it's an 18 bit counter, when 
you overwrite 6 bits with variant and version.  Then it's just a 12 bit 
counter?  Which is the size of the rand_a field, so that kind of makes 
sense.  But 12 bits is the recommended minimum, and (in this patch) we 
don't use sub-millisecond timestamp precision, so we should probably use 
more than the minimum?


Also, you are initializing 4 bits (I think?) to zero to guard against 
counter rollovers (so it's really just an 8 bit counter?).  But nothing 
checks against such rollovers, so I don't understand the use of that.


The code code be organized better.  In the not-increment_counter case, 
you could use two separate pg_strong_random calls: One to initialize 
rand_b, starting at >data[8], and one to initialize the counter. 
Then the former could be shared between the two branches, and the code 
to assign the sequence_counter to the uuid fields could also be shared.


I would also prefer if the normal case (not-increment_counter) were the 
first if branch.



Some other notes on your patch:

- Your rebase duplicated the documentation of uuid_extract_timestamp and 
uuid_extract_version.


- PostgreSQL code uses uint64 etc. instead of uint64_t etc.

- It seems the added includes

#include "access/xlog.h"
#include "utils/builtins.h"
#include "utils/datetime.h"

are not needed.

- The static variables sequence_counter and previous_timestamp could be 
kept inside the uuidv7() function.






Re: UUID v7

2024-03-22 Thread Sergey Prokhorenko
Another source: Microservices Pattern: Database per service

| 
| 
| 
|  |  |

 |

 |
| 
|  | 
Microservices Pattern: Database per service

A service's database is private to that service
 |

 |

 |




Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko 
 wrote:  
 
 BTW: Each microservice should have its own database to ensure data isolation 
and independence, enabling better scalability and fault tolerance
Source: Microservices Pattern: Shared database

| 
| 
|  | 
Microservices Pattern: Shared database


 |

 |

 |



| 
| 
|  | 
Microservices Pattern: Shared database


 |

 |

 |







Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko 
 wrote:  
 
 Why not use a single UUID generator for the database table in this case, 
similar to autoincrement?


Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut 
 wrote:  
 
 On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
>> Timer-based bits contribute to global sortability. But the real timers we 
>> have are not even millisecond adjusted. We can hope for ~few ms variation in 
>> one datacenter or in presence of atomic clocks.
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

  

Re: UUID v7

2024-03-22 Thread Sergey Prokhorenko
BTW: Each microservice should have its own database to ensure data isolation 
and independence, enabling better scalability and fault tolerance
Source: Microservices Pattern: Shared database

| 
| 
|  | 
Microservices Pattern: Shared database


 |

 |

 |





Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko 
 wrote:  
 
 Why not use a single UUID generator for the database table in this case, 
similar to autoincrement?


Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut 
 wrote:  
 
 On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
>> Timer-based bits contribute to global sortability. But the real timers we 
>> have are not even millisecond adjusted. We can hope for ~few ms variation in 
>> one datacenter or in presence of atomic clocks.
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.



Re: UUID v7

2024-03-22 Thread Sergey Prokhorenko
Why not use a single UUID generator for the database table in this case, 
similar to autoincrement?


Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut 
 wrote:  
 
 On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
>> Timer-based bits contribute to global sortability. But the real timers we 
>> have are not even millisecond adjusted. We can hope for ~few ms variation in 
>> one datacenter or in presence of atomic clocks.
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

  

Re: UUID v7

2024-03-22 Thread Peter Eisentraut

On 21.03.24 16:21, Jelte Fennema-Nio wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:

Timer-based bits contribute to global sortability. But the real timers we have 
are not even millisecond adjusted. We can hope for ~few ms variation in one 
datacenter or in presence of atomic clocks.


I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.





Re: UUID v7

2024-03-22 Thread Sergey Prokhorenko
I think it's better to leave Andrey's patch as is, and add another function in 
the future with a customizable UUIDv7 structure for special use cases. The 
structure description can be in JSON format. See this discussion.


Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Friday, 22 March 2024 at 09:54:07 am GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio  wrote:
> 
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
>> Timer-based bits contribute to global sortability. But the real timers we 
>> have are not even millisecond adjusted. We can hope for ~few ms variation in 
>> one datacenter or in presence of atomic clocks.
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends. 

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one 
backend, while microseconds are practical sortability between backends.

> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for 
practical reasons. And even not using extra bits at all (fill them with random 
numbers) would work for 0.999 cases.


Best regards, Andrey Borodin.  

Re: UUID v7

2024-03-22 Thread Andrey M. Borodin



> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio  wrote:
> 
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
>> Timer-based bits contribute to global sortability. But the real timers we 
>> have are not even millisecond adjusted. We can hope for ~few ms variation in 
>> one datacenter or in presence of atomic clocks.
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends. 

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one 
backend, while microseconds are practical sortability between backends.

> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for 
practical reasons. And even not using extra bits at all (fill them with random 
numbers) would work for 0.999 cases.


Best regards, Andrey Borodin.



Re: UUID v7

2024-03-21 Thread Jelte Fennema-Nio
On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
> Timer-based bits contribute to global sortability. But the real timers we 
> have are not even millisecond adjusted. We can hope for ~few ms variation in 
> one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends. With
the current counter approach between backends we only have sortability
at the millisecond level.

However, I don't really think it is incredibly important to get the
"perfect" approach to filling in rand_a/rand_b right now. As long as
we don't document what we do, we can choose to change the method
without breaking backwards compatibility. Because either approach
results in valid UUIDv7s.




Re: UUID v7

2024-03-20 Thread Andrey M. Borodin


> On 19 Mar 2024, at 13:55, Peter Eisentraut  wrote:
> 
> On 16.03.24 18:43, Andrey M. Borodin wrote:
>>> On 15 Mar 2024, at 14:47, Aleksander Alekseev  
>>> wrote:
>>> 
>>> +1 to the idea. I doubt that anyone will miss it.
>> PFA v22.
>> Changes:
>> 1. Squashed all editorialisation by Jelte
>> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 
>> instead)
>> 3. Remove all traces of uuid_extract_variant()
> 
> I have committed a subset of this for now, namely the additions of 
> uuid_extract_timestamp() and uuid_extract_version().  These seemed mature and 
> agreed upon.  You can rebase the rest of your patch on top of that.

Great! Thank you! PFA v23 with rebase on HEAD.

> I have started a separate discussion to learn about the precision we can 
> expect from gettimeofday().

Even in presence of real microsecond-enabled and portable timer using 
microseconds does not seem to me an optimal way of utilising UUID bits.

Timer-based bits contribute to global sortability. But the real timers we have 
are not even millisecond adjusted. We can hope for ~few ms variation in one 
datacenter or in presence of atomic clocks.

Time-based bits contribute to global uniqueness, but certainly they are not 
that effective as counter bits.

Time-based bits do not provide local sortability guarantees: some UUIDs might 
get same microseconds or be affected by leap backwards.

I think that microseconds are good only for hardware-specific solutions, not 
for something that runs on variety of platforms, OSes, devices.


Best regards, Andrey Borodin.


v23-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-03-19 Thread Peter Eisentraut

On 16.03.24 18:43, Andrey M. Borodin wrote:

On 15 Mar 2024, at 14:47, Aleksander Alekseev  wrote:

+1 to the idea. I doubt that anyone will miss it.


PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()


I have committed a subset of this for now, namely the additions of 
uuid_extract_timestamp() and uuid_extract_version().  These seemed 
mature and agreed upon.  You can rebase the rest of your patch on top of 
that.


I have started a separate discussion to learn about the precision we can 
expect from gettimeofday().






Re: UUID v7

2024-03-16 Thread Andrey M. Borodin


> On 15 Mar 2024, at 14:47, Aleksander Alekseev  
> wrote:
> 
> +1 to the idea. I doubt that anyone will miss it.

PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

Thanks!


Best regards, Andrey Borodin.


v22-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-03-15 Thread Aleksander Alekseev
Hi,

> > So maybe we don't need the _extract_variant function?
>
> I think it's the best possible solution. The variant has no value besides 
> detecting if a version can be extracted.

+1 to the idea. I doubt that anyone will miss it.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-03-14 Thread Andrey M. Borodin



> On 14 Mar 2024, at 20:10, Peter Eisentraut  wrote:
> 
> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).
 Well, it was correct only for implemented variant. I've made version that 
 implements full table 1 from section 4.1.
>>> I think we are still interpreting this differently.  I think 
>>> uuid_extract_variant should just return whatever is in those four bits. 
>>> Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", 
>>> which I don't think it is correct.  It should return 0 through 15.
>> We will return "do not care" bits. This bits can confuse someone. E.g. for 
>> varaint 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some 
>> reason document lists number 1-15, but your are correct that range is 0-15.
> 
> I agree it's confusing.  Before I studied the RFC 4122bis project, I didn't 
> even know about variant vs. version.  I think overall people will find this 
> more confusing than useful.  If you just want to know, "is this UUID of the 
> kind specified in RFC 4122", you can query it with uuid_extract_version(x) IS 
> NOT NULL.  So maybe we don't need the _extract_variant function?

I think it's the best possible solution. The variant has no value besides 
detecting if a version can be extracted.


Best regards, Andrey Borodin.



Re: UUID v7

2024-03-14 Thread Peter Eisentraut

On 14.03.24 12:25, Andrey M. Borodin wrote:

I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that 
implements full table 1 from section 4.1.

I think we are still interpreting this differently.  I think uuid_extract_variant should 
just return whatever is in those four bits. Your function comment says "Can return 
only 0, 0b10, 0b110 and 0b111.", which I don't think it is correct.  It should 
return 0 through 15.

We will return "do not care" bits. This bits can confuse someone. E.g. for 
varaint 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some reason 
document lists number 1-15, but your are correct that range is 0-15.


I agree it's confusing.  Before I studied the RFC 4122bis project, I 
didn't even know about variant vs. version.  I think overall people will 
find this more confusing than useful.  If you just want to know, "is 
this UUID of the kind specified in RFC 4122", you can query it with 
uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the 
_extract_variant function?






Re: UUID v7

2024-03-14 Thread Andrey M. Borodin



> On 14 Mar 2024, at 16:07, Peter Eisentraut  wrote:
> 
> On 10.03.24 13:59, Andrey M. Borodin wrote:
>>> The functions uuid_extract_ver and uuid_extract_var could be named
>>> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
>>> to tell them apart, with only one letter different.
>> Renamed.
> 
> Another related comment: Throughout your patch, swap the order of 
> uuid_extract_variant and uuid_extract_version.  First, this makes more sense 
> because version is subordinate to variant, and also it makes it alphabetical.
I will do it soon.
> 
>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>> takes just two bits to return, but the draft document is quite clear
>>> that the variant is 4 bits (see Table 1).
>> Well, it was correct only for implemented variant. I've made version that 
>> implements full table 1 from section 4.1.
> 
> I think we are still interpreting this differently.  I think 
> uuid_extract_variant should just return whatever is in those four bits. Your 
> function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I 
> don't think it is correct.  It should return 0 through 15.
We will return "do not care" bits. This bits can confuse someone. E.g. for 
varaint 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some 
reason document lists number 1-15, but your are correct that range is 0-15.

> 
>>> I would have expected that, since gettimeofday() provides microsecond
>>> precision, we'd put the extra precision into "rand_a" as per Section 6.2 
>>> method 3.
>> I had chosen method 2 over method 3 as most portable. Can we be sure how 
>> many bits (after reading milliseconds) are there across different OSes?
> 
> I think this should have been researched.  If we don't know how many bits we 
> have, how do we know we have enough for milliseconds?  I think we should at 
> least have some kind of idea, if we are going to have this conversation.

Bits for milliseconds are strictly defined by the document: there are always 48 
bits, independently from clock resolution.
But I don't think it's main problem for Method 3. Method 1 actually guarantees 
strictly increasing order of UUIDs generated by single backend. Method 3 can 
generate a lot of unsorted data in case of time leaping backward.

BTW Kyzer (in an off-list discussion) and Sergey confirmed that implemented 
method from the patch actually is Method 1.


Best regards, Andrey Borodin.



Re: UUID v7

2024-03-14 Thread Peter Eisentraut

On 10.03.24 13:59, Andrey M. Borodin wrote:

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
to tell them apart, with only one letter different.


Renamed.


Another related comment: Throughout your patch, swap the order of 
uuid_extract_variant and uuid_extract_version.  First, this makes more 
sense because version is subordinate to variant, and also it makes it 
alphabetical.



I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).


Well, it was correct only for implemented variant. I've made version that 
implements full table 1 from section 4.1.


I think we are still interpreting this differently.  I think 
uuid_extract_variant should just return whatever is in those four bits. 
Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", 
which I don't think it is correct.  It should return 0 through 15.



I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 
3.


I had chosen method 2 over method 3 as most portable. Can we be sure how many 
bits (after reading milliseconds) are there across different OSes?


I think this should have been researched.  If we don't know how many 
bits we have, how do we know we have enough for milliseconds?  I think 
we should at least have some kind of idea, if we are going to have this 
conversation.






Re: UUID v7

2024-03-12 Thread Jelte Fennema-Nio
On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko
 wrote:
> Andrey and you simply did not read the RFC a little further down in the text:

You're totally right, sorry about that. Maybe it would be good to move
those subsections around a bit in the RFC though, so that anything
related to only one method is included in the section for that method.




Re: UUID v7

2024-03-12 Thread Sergey Prokhorenko
Hi Jelte,
I am one of the contributors to this RFC.

Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits 
(Method 1).

Andrey and you simply did not read the RFC a little further down in the text:
__

The following sub-topics cover topics related solely with creating reliable 
fixed-length dedicated counters:
   
   - Fixed-Length Dedicated Counter Seeding:
  -
Implementations utilizing the fixed-length counter method randomly initialize 
the counter with each new timestamp tick. However, when the timestamp has not 
increased, the counter is instead incremented by the desired increment logic. 
When utilizing a randomly seeded counter alongside Method 1, the random value 
MAY be regenerated with each counter increment without impacting sortability. 
The downside is that Method 1 is prone to overflows if a counter of adequate 
length is not selected or the random data generated leaves little room for the 
required number of increments. Implementations utilizing fixed-length counter 
method MAY also choose to randomly initialize a portion of the counter rather 
than the entire counter. For example, a 24 bit counter could have the 23 bits 
in least-significant, right-most, position randomly initialized. The remaining 
most significant, left-most counter bit is initialized as zero for the sole 
purpose of guarding against counter rollovers.

  - 
   - Fixed-Length Dedicated Counter Length:
  - Select a counter bit-length that can properly handle the level of 
timestamp precision in use. For example, millisecond precision generally 
requires a larger counter than a timestamp with nanosecond precision. General 
guidance is that the counter SHOULD be at least 12 bits but no longer than 42 
bits. Care must be taken to ensure that the counter length selected leaves room 
for sufficient entropy in the random portion of the UUID after the counter. 
This entropy helps improve the unguessability characteristics of UUIDs created 
within the batch.
  - The following sub-topics cover rollover handling with either type of 
counter method:   

  - ...
  - 
   - Counter Rollover Handling:
  -
Counter rollovers MUST be handled by the application to avoid sorting issues. 
The general guidance is that applications that care about absolute monotonicity 
and sortability should freeze the counter and wait for the timestamp to advance 
which ensures monotonicity is not broken. Alternatively, implementations MAY 
increment the timestamp ahead of the actual time and reinitialize the counter.


Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio 
 wrote:  
 
 On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin  wrote:
> Sorry for this long and vague explanation, if it still seems too uncertain we 
> can have a chat or something like that. I don't think this number picking 
> stuff deserve to be commented, because it still is quite close to random. RFC 
> gives us too much freedom of choice.

I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.
  

Re: UUID v7

2024-03-12 Thread Jelte Fennema-Nio
On Tue, 12 Mar 2024 at 07:32, Michael Paquier  wrote:
> Sure, there is no problem in discussing a patch to implement a
> behavior.  But I disagree about taking a risk in merging something
> that could become non-compliant with the approved RFC, if the draft is
> approved at the end, of course.  This just strikes me as a bad idea.

I agree that we shouldn't release UUIDv7 support if the RFC describing
that is not yet approved. But I do think it would be a shame if e.g.
the RFC got approved 2 weeks after Postgres its feature freeze. Which
would then mean we'd have to wait another 1.5 years before actually
using uuidv7. Would it be a reasonable compromise to still merge the
patch for PG17 (assuming the code is good to merge with regards to the
current draft RFC), but revert the commit if the RFC is not approved
before some deadline before the release date (e.g. before the first
release candidate)?




Re: UUID v7

2024-03-12 Thread Jelte Fennema-Nio
On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin  wrote:
> Sorry for this long and vague explanation, if it still seems too uncertain we 
> can have a chat or something like that. I don't think this number picking 
> stuff deserve to be commented, because it still is quite close to random. RFC 
> gives us too much freedom of choice.

I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.




Re: UUID v7

2024-03-12 Thread Michael Paquier
On Tue, Mar 12, 2024 at 11:10:37AM +0500, Andrey M. Borodin wrote:
> On 12 Mar 2024, at 10:53, Michael Paquier  wrote:
>> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov  wrote:
>> 
>> And many libraries are already including implementation of UUIDv7 – here are 
>> some examples:
>> 
>> - https://www.npmjs.com/package/uuidv7
>> - https://crates.io/crates/uuidv7
>> - https://github.com/google/uuid/pull/139
> 
> So at least reviewing patch and agreeing on chosen methods and constants 
> makes sense.

Sure, there is no problem in discussing a patch to implement a
behavior.  But I disagree about taking a risk in merging something
that could become non-compliant with the approved RFC, if the draft is
approved at the end, of course.  This just strikes me as a bad idea.
--
Michael


signature.asc
Description: PGP signature


Re: UUID v7

2024-03-12 Thread Andrey M. Borodin



> On 12 Mar 2024, at 10:53, Michael Paquier  wrote:
> 
> It does not strike me as a good idea to rush an implementation without
> a specification officially approved because there is always a risk of
> shipping something that's non-compliant into core.  But perhaps I am
> missing something on the RFC side?

Upthread one of document’s authors commented:

> On 14 Feb 2023, at 19:13, Kyzer Davis (kydavis)  wrote:
> 
> The point is 99% of the work since adoption by the IETF has been ironing out 
> RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a 
> very good state.

And also


> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov  wrote:
> 
> And many libraries are already including implementation of UUIDv7 – here are 
> some examples:
> 
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

So at least reviewing patch and agreeing on chosen methods and constants makes 
sense.


Best regards, Andrey Borodin.



Re: UUID v7

2024-03-11 Thread Michael Paquier
On Mon, Mar 11, 2024 at 11:27:43PM +0500, Andrey M. Borodin wrote:
> Sorry for this long and vague explanation, if it still seems too
> uncertain we can have a chat or something like that. I don't think
> this number picking stuff deserve to be commented, because it still
> is quite close to random. RFC gives us too much freedom of choice.

Speaking about the RFC, I can see that there is a draft but nothing
formal yet.  The last one I can see is v14 from last November:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14

It does not strike me as a good idea to rush an implementation without
a specification officially approved because there is always a risk of
shipping something that's non-compliant into core.  But perhaps I am
missing something on the RFC side?
--
Michael


signature.asc
Description: PGP signature


Re: UUID v7

2024-03-11 Thread Andrey M. Borodin



> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio  wrote:
> 
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
> 
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute 
to global sortability of values generated at frequency less than 1KHz.
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be 
non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a 
collision of two UUIDs generated at frequency less than 1KHz is p = 2^-(c2+d)
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as 
decreases counter capacity. There is no difference between using bits in d 
directly, or in c3. There is no point in non-zero c3. Every bit that could be 
given to c3 can equally be given to d.

Second, observe that c2 bits contribute to both collision protection and 
counter capacity! And when the time ticks, c2 also contribute to 
unguessability! So, technically, we should consider using all available bits as 
c2 bits.

How many c1 bits do we need? I've chosen one - to prevent occasional counter 
capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and 
d = 56 as an arbitrary compromise between capacity of one backend per ms and 
prevention of global collision.
This compromise is mostly dictated by maximum frequency of UUID generation by 
one backend, I've chosen 200MHz as a sane value.


This compromise is much easier when you do not have 74 spare bits, this crazy 
amount of information forgives almost any mistake. Imagine you have to 
distribute 10 bits between c2 and d. And you try to prevent collision between 
10 independent devices which need capacity to generate IDs with frequency of 
10KHz each and keep sortability. You would have something like c1=1, c2=3,d=6.

Sorry for this long and vague explanation, if it still seems too uncertain we 
can have a chat or something like that. I don't think this number picking stuff 
deserve to be commented, because it still is quite close to random. RFC gives 
us too much freedom of choice.

Thanks!


Best regards, Andrey Borodin.



Re: UUID v7

2024-03-11 Thread Jelte Fennema-Nio
Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Now with the added comments, one thing pops out to me: The comments
mention that we use "Monotonic Random", but when I read the spec that
explicitly recommends against using an increment of 1 when using
monotonic random. I feel like if we use an increment of 1, we're
better off going for the "Fixed-Length Dedicated Counter Bits" method
(i.e. change the code to start the counter at 0). See patch 0005 for
an example of that change.

I'm also wondering if we really want to use the extra rand_b bits for
this. The spec says we MAY, but it does remove the amount of
randomness in our UUIDs.


v21-0001-Implement-UUID-v7.patch
Description: Binary data


v21-0003-Run-pgindent.patch
Description: Binary data


v21-0005-Change-to-Fixed-Length-Dedicated-Counter-Bits.patch
Description: Binary data


v21-0004-Fix-comments-a-bit.patch
Description: Binary data


v21-0002-Fix-typos.patch
Description: Binary data


Re: UUID v7

2024-03-11 Thread Aleksander Alekseev
Hi,

> Oops, CFbot expectedly found a problem...
> Sorry for the noise, this version, I hope, will pass all the tests.
> Thanks!
>
> Best regards, Andrey Borodin.

I had some issues applying v19 against the current `master` branch.
PFA the rebased and minorly tweaked v20.

The patch LGTM. I think it could be merged unless there are any open
issues left. I don't think so, but maybe I missed something.

-- 
Best regards,
Aleksander Alekseev


v20-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-03-10 Thread Andrey M. Borodin


> On 10 Mar 2024, at 17:59, Andrey M. Borodin  wrote:
> 
> I tried to "make docs", but it gives me gazilion of errors... Is there an 
> easy way to see resulting HTML?

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!


Best regards, Andrey Borodin.


v19-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-03-10 Thread Andrey M. Borodin
Hi Peter,

thank you for so thoughtful review.

> On 6 Mar 2024, at 12:13, Peter Eisentraut  wrote:
> 
> I have various comments on this patch:
> 
> 
> - doc/src/sgml/func.sgml
> 
> The documentation of the new functions should be broken up a bit.
> It's all one paragraph now.  At least make it several paragraphs, or
> possibly tables or something else.
I've split functions to generate UUIDs from functions to extract stuff.

> 
> Avoid listing the functions twice: Once before the description and
> then again in the description.  That's just going to get out of date.
> The first listing is not necessary, I think.

Fixed.

> The return values in the documentation should use the public-facing
> type names, like "timestamp with time zone" and "smallint".

Fixed.

> The descriptions of the UUID generation functions use handwavy
> language in their descriptions, like "It provides much better data
> locality" or "unacceptable security or business intelligence
> implications", which isn't useful.  Either we cut that all out and
> just say, it creates a UUIDv7, done, look elsewhere for more
> information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

> We shouldn't label a link as "IETF standard" when it's actually a
> draft.

Fixed.

Well, all my modifications of documentation are kind of blind... I tried to 
"make docs", but it gives me gazilion of errors... Is there an easy way to see 
resulting HTML?


> - src/include/catalog/pg_proc.dat
> 
> The description of uuidv4 should be "generate UUID version 4", so that
> it parallels uuidv7.

Fixed.

> The description of uuid_extract_time says 'extract timestamp from UUID
> version 7', the implementation is not limited to version 7.

Fixed.

> I think uuid_extract_time should be named uuid_extract_timestamp,
> because it extracts a timestamp, not a time.

Renamed.

> The functions uuid_extract_ver and uuid_extract_var could be named
> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
> to tell them apart, with only one letter different.

Renamed.

> - src/test/regress/sql/uuid.sql
> 
> Why are the tests using the input format '{...}', which is not the
> standard one?

Fixed.

> - src/backend/utils/adt/uuid.c
> 
> All this new code should have more comments.  There is a lot of bit
> twiddling going on, and I suppose one is expected to follow along in
> the RFC?  At least each function should have a header comment, so one
> doesn't have to check in pg_proc.dat what it's supposed to do.

I've added some header comment. One big comment is attached to v7, I tried to 
take parts mostly from RFC. Yet there are a lot of my additions that now need 
review...

> I'm suspicious that these functions all appear to return null for
> erroneous input, rather than raising errors.  I think at least some
> explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR:  division by zero
makes me consider throwing an error. There was some argumentation upthread for 
not throwing error though, but now I cannot find it... maybe I accepted this 
behaviour as more user-friendly.

> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that 
implements full table 1 from section 4.1.

> The uuidv7 function could really use a header comment that explains
> the choices that were made.  The RFC draft provides various options
> that implementations could use; we should describe which ones we
> chose.

Done.

> 
> I would have expected that, since gettimeofday() provides microsecond
> precision, we'd put the extra precision into "rand_a" as per Section 6.2 
> method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many 
bits (after reading milliseconds) are there across different OSes? Even if we 
put extra 10 bits of timestamp, we cannot extract safely them.
These bits could promote inter-backend stortability. E.i. when many backends 
generate data fast - this data is still somewhat ordered even within 1ms. But I 
think benefits of this sortability are outweighed by portability(unknown real 
resolution), simplicity(we don't store microseconds, thus do not try to extract 
them).
All this arguments are weak, but if one method would be strictly better than 
another - there would be only one method.

> 
> You use some kind of counter, but could you explain which method that
> counter implements?
I described counter in uuidv7() header.

> 
> I don't see any acknowledgment of issues relating to concurrency or
> restarts.  Like, how do we prevent duplicates being generated by
> concurrent sessions or between restarts?  Maybe the counter or random
> stuff does that, but it's not explained.

I think restart takes more than 1ms, so this is covered with time 

Re: UUID v7

2024-03-05 Thread Peter Eisentraut

On 30.01.24 14:35, Andrey M. Borodin wrote:

On 30 Jan 2024, at 15:33, Junwang Zhao  wrote:

It's always good to add a newline at the end of a  source file, though
this might be nitpicky.


Thanks, also fixed warning found by CFBot.


I have various comments on this patch:


- doc/src/sgml/func.sgml

The documentation of the new functions should be broken up a bit.
It's all one paragraph now.  At least make it several paragraphs, or
possibly tables or something else.

Avoid listing the functions twice: Once before the description and
then again in the description.  That's just going to get out of date.
The first listing is not necessary, I think.

The return values in the documentation should use the public-facing
type names, like "timestamp with time zone" and "smallint".

The descriptions of the UUID generation functions use handwavy
language in their descriptions, like "It provides much better data
locality" or "unacceptable security or business intelligence
implications", which isn't useful.  Either we cut that all out and
just say, it creates a UUIDv7, done, look elsewhere for more
information, or we provide some more concretely useful details.

We shouldn't label a link as "IETF standard" when it's actually a
draft.


- src/include/catalog/pg_proc.dat

The description of uuidv4 should be "generate UUID version 4", so that
it parallels uuidv7.

The description of uuid_extract_time says 'extract timestamp from UUID
version 7', the implementation is not limited to version 7.

I think uuid_extract_time should be named uuid_extract_timestamp,
because it extracts a timestamp, not a time.

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
to tell them apart, with only one letter different.


- src/test/regress/sql/uuid.sql

Why are the tests using the input format '{...}', which is not the
standard one?


- src/backend/utils/adt/uuid.c

All this new code should have more comments.  There is a lot of bit
twiddling going on, and I suppose one is expected to follow along in
the RFC?  At least each function should have a header comment, so one
doesn't have to check in pg_proc.dat what it's supposed to do.

I'm suspicious that these functions all appear to return null for
erroneous input, rather than raising errors.  I think at least some
explanation for this should be recorded somewhere.

I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

The uuidv7 function could really use a header comment that explains
the choices that were made.  The RFC draft provides various options
that implementations could use; we should describe which ones we
chose.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 
method 3.


You use some kind of counter, but could you explain which method that
counter implements?

I don't see any acknowledgment of issues relating to concurrency or
restarts.  Like, how do we prevent duplicates being generated by
concurrent sessions or between restarts?  Maybe the counter or random
stuff does that, but it's not explained.





Re: UUID v7

2024-01-30 Thread Sergey Prokhorenko
typo:
being carried to time step

should be:being carried to timestemp

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Tuesday, 30 January 2024 at 04:35:45 pm GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 30 Jan 2024, at 15:33, Junwang Zhao  wrote:
> 
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.


Best regards, Andrey Borodin.
  

Re: UUID v7

2024-01-30 Thread Andrey M. Borodin


> On 30 Jan 2024, at 15:33, Junwang Zhao  wrote:
> 
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.


Best regards, Andrey Borodin.


v17-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-30 Thread Junwang Zhao
Hi Andrey,

On Tue, Jan 30, 2024 at 5:56 PM Andrey M. Borodin  wrote:
>
>
>
> > On 30 Jan 2024, at 12:28, Sergey Prokhorenko 
> >  wrote:
> >
> >
> > I think this phrase is outdated: "This function can optionally accept a 
> > timestamp used instead of current time.
> > This allows implementation of k-way sotable identifiers.”
> Fixed.
>
> > This phrase is wrong: "Both functions return a version 4 (random) UUID.”
> This applies to functions gen_random_uuid() and uuidv4().
> >
> > For this phrase the reason is unclear and the phrase is most likely 
> > incorrect:
> > if large batches of UUIDs are generated at the
> > +   same time it's possible that some UUIDs will store a time that is 
> > slightly later
> > +   than their actual generation time
>
> I’ve rewritten this phrase, hope it’s more clear now.
>
>
> Best regards, Andrey Borodin.

+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+ uint16_t result;
+ result = uuid->data[8] >> 6;
+
+ PG_RETURN_UINT16(result);
+}
\ No newline at end of file

It's always good to add a newline at the end of a  source file, though
this might be nitpicky.

-- 
Regards
Junwang Zhao




Re: UUID v7

2024-01-30 Thread Andrey M. Borodin


> On 30 Jan 2024, at 12:28, Sergey Prokhorenko  
> wrote:
> 
> 
> I think this phrase is outdated: "This function can optionally accept a 
> timestamp used instead of current time.
> This allows implementation of k-way sotable identifiers.”
Fixed.

> This phrase is wrong: "Both functions return a version 4 (random) UUID.”
This applies to functions gen_random_uuid() and uuidv4().
> 
> For this phrase the reason is unclear and the phrase is most likely incorrect:
> if large batches of UUIDs are generated at the
> +   same time it's possible that some UUIDs will store a time that is 
> slightly later
> +   than their actual generation time

I’ve rewritten this phrase, hope it’s more clear now.


Best regards, Andrey Borodin.


v16-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-29 Thread Sergey Prokhorenko
Andrey,

I think this phrase is outdated: "This function can optionally accept a 
timestamp used instead of current time.This allows implementation of k-way 
sotable identifiers."
This phrase is wrong: "Both functions return a version 4 (random) UUID."
For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the+   same time it's possible that 
some UUIDs will store a time that is slightly later+   than their actual 
generation time

Sergey Prokhorenko

sergeyprokhore...@yahoo.com.au 

On Tuesday, 30 January 2024 at 09:55:04 am GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio  wrote:
> 
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s 
documentation addons.

Thanks!


Best regards, Andrey Borodin.
  

Re: UUID v7

2024-01-29 Thread Andrey M. Borodin


> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio  wrote:
> 
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s 
documentation addons.

Thanks!


Best regards, Andrey Borodin.


v15-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-29 Thread Sergey Prokhorenko
Andrey,
 I understand and agree with your goals. But instead of dangerous universal 
functions, it is better to develop safe highly specialized functions that 
implement only these goals.
There should not be a function uuidv7(T) from an arbitrary timestamp, but there 
should be a special function that implements your algorithm: uuidv8(now() + '1 
century' * random(0,10)).
I replaced 1 day with 1 century because the spread of 1 day is too small. Over 
time, records will be inserted between existing records, which is undesirable.
Similarly, if we need to calculate the partition id, then we do not need to use 
the uuid_extract_time() function to provide the extracted timestamp, the 
accuracy of which cannot be guaranteed. Instead, we need to give exactly the 
partition id, calculated using the uuidv7 timestamp. For example, partitions 
may have approximately a month interval between each other.
As for the documentation, it must be indicated that the UUIDv7 structure is not 
timestamp + random, but timestamp + randomly seeded counter + random, like in 
all advanced implementations.

Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au
__ 

On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 25 Jan 2024, at 22:04, Sergey Prokhorenko  
> wrote:
> 
> Aleksander,
> 
> In this case the documentation must state that the functions 
> uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that 
> developers may use these functions with caution at their own risk, and these 
> functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not 
recommended for production must be based on some real threats.

> 
> The function uuidv7(T) is not better than uuid_extract_time(). Careless 
> developers may well pass any business date into this function: document date, 
> registration date, payment date, reporting date, start date of the current 
> month, data download date, and even a constant. This would be a profanation 
> of UUIDv7 with very negative consequences.

Even if the developer pass constant time to uuidv7(T) they will get what they 
asked for - unique identifier. Moreover - it still will be keeping locality. 
There will be no negative consequences at all.
On the contrary, experienced developer can leverage parameter when data 
locality should be reduced. If you have serveral streams of data, you might 
want to introduce some shift in reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will 
split 1 contention point to 10 and increase ingestion performance 10x-fold.

> On 29 Jan 2024, at 18:58, Junwang Zhao  wrote:
> 
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a 
big deficiency that k-way sortable identifiers are not implementable on top of 
UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement 
over previous versions.


Jelte, your documentation corrections looks good to me, I'll include them in 
next version.

Thanks!


Best regards, Andrey Borodin.  

Re: UUID v7

2024-01-29 Thread Jelte Fennema-Nio
On Mon, 29 Jan 2024 at 19:32, Andrey M. Borodin  wrote:
> Even if the developer pass constant time to uuidv7(T) they will get what they 
> asked for - unique identifier. Moreover - it still will be keeping locality. 
> There will be no negative consequences at all.

It will be significantly "less unique" than if they wouldn't pass a
constant time. Basically it would become a UUIDv4, but with 74 bits of
random data instead of 122. That might not be enough anymore to
"guarantee" uniqueness. I guess that's why it is required to use
UUIDv8 in these cases, because correct usage is now a requirement for
assuming uniqueness. And for UUIDv8 the spec says this:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

> > On 29 Jan 2024, at 18:58, Junwang Zhao  wrote:
> >
> > If other timestamp sources or
> > a custom timestamp epoch are required, UUIDv8 MUST be used.
>
> Well, yeah. RFC says this... in 4 capital letters :)

As an FYI, there is an RFC that defines these keywords that's why they
are capital letters: https://www.ietf.org/rfc/rfc2119.txt

> I believe it's kind of a big deficiency that k-way sortable identifiers are 
> not implementable on top of UUIDv7. Well, let's go without this function. 
> UUIDv7 is still an improvement over previous versions.

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.




Re: UUID v7

2024-01-29 Thread Andrey M. Borodin



> On 25 Jan 2024, at 22:04, Sergey Prokhorenko  
> wrote:
> 
> Aleksander,
> 
> In this case the documentation must state that the functions 
> uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that 
> developers may use these functions with caution at their own risk, and these 
> functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not 
recommended for production must be based on some real threats.

> 
> The function uuidv7(T) is not better than uuid_extract_time(). Careless 
> developers may well pass any business date into this function: document date, 
> registration date, payment date, reporting date, start date of the current 
> month, data download date, and even a constant. This would be a profanation 
> of UUIDv7 with very negative consequences.

Even if the developer pass constant time to uuidv7(T) they will get what they 
asked for - unique identifier. Moreover - it still will be keeping locality. 
There will be no negative consequences at all.
On the contrary, experienced developer can leverage parameter when data 
locality should be reduced. If you have serveral streams of data, you might 
want to introduce some shift in reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will 
split 1 contention point to 10 and increase ingestion performance 10x-fold.

> On 29 Jan 2024, at 18:58, Junwang Zhao  wrote:
> 
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a 
big deficiency that k-way sortable identifiers are not implementable on top of 
UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement 
over previous versions.


Jelte, your documentation corrections looks good to me, I'll include them in 
next version.

Thanks!


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-29 Thread Junwang Zhao
On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio  wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
>  wrote:
> > In this case the documentation must state that the functions 
> > uuid_extract_time() and uuidv7(T) are against the RFC requirements, and 
> > that developers may use these functions with caution at their own risk, and 
> > these functions are not recommended for production environment.
> >
> > The function uuidv7(T) is not better than uuid_extract_time(). Careless 
> > developers may well pass any business date into this function: document 
> > date, registration date, payment date, reporting date, start date of the 
> > current month, data download date, and even a constant. This would be a 
> > profanation of UUIDv7 with very negative consequences.
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing.  Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements.  For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases.  (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
> I do agree with others however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example to be able to partition by the timestamp in the UUID and then
> being able to later load data for an older timestamp and have it be
> added to to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this usecase we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not sligthly change UUIDv7 then? Well
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
> So I believe we should remove the uuidv7(timestamp) function from this 
> patchset.

Agreed, the RFC section 6.1[1] has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp,
so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.


[1]: 
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. Afaict
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp.  Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons.  This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For 

Re: UUID v7

2024-01-29 Thread Jelte Fennema-Nio
On Thu, 25 Jan 2024 at 13:31, Aleksander Alekseev
 wrote:
> PFA v14.

+uuidv4 () uuid
+
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+
+uuidv7 () uuid
+
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+

I think it would be good to explain the tradeoffs between uuidv4 and
uuidv7 a bit better. How about changing the docs to something like
this:

uuidv4 () uuid

Both functions return a version 4 (random) UUID. UUIDv4 is one of the
most commonly used types of UUID. It is appropriate when random
distribution of keys does not affect performance of an application or
when exposing the generation time of a UUID has unacceptable security
or business intelligence implications.

uuidv7 () uuid

This function returns a version 7 (time-ordered + random) UUID. It
provides much better data locality than UUIDv4, which can greatly
improve performance when UUID is used in a BTREE index (the default
index type in PostgreSQL). To achieve this data locality, UUIDv7
embeds its own generation time into the UUID. If exposing such a
timestamp has unacceptable security or business intelligence
implications, then uuidv4() should be used instead.





Re: UUID v7

2024-01-29 Thread Jelte Fennema-Nio
tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
 wrote:
> In this case the documentation must state that the functions 
> uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that 
> developers may use these functions with caution at their own risk, and these 
> functions are not recommended for production environment.
>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless 
> developers may well pass any business date into this function: document date, 
> registration date, payment date, reporting date, start date of the current 
> month, data download date, and even a constant. This would be a profanation 
> of UUIDv7 with very negative consequences.

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

> Implementations acquire the current timestamp from a reliable
> source to provide values that are time-ordered and continually
> increasing.  Care must be taken to ensure that timestamp changes
> from the environment or operating system are handled in a way that
> is consistent with implementation requirements.  For example, if
> it is possible for the system clock to move backward due to either
> manual adjustment or corrections from a time synchronization
> protocol, implementations need to determine how to handle such
> cases.  (See Altering, Fuzzing, or Smearing below.)
>
> ...
>
> UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> sources or a custom timestamp epoch are required, UUIDv8 MUST be
> used.
>
> ...
>
> Monotonicity (each subsequent value being greater than the last) is
> the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.

I do agree with others however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example to be able to partition by the timestamp in the UUID and then
being able to later load data for an older timestamp and have it be
added to to the older partition. But it's possible to do that while
still following the spec, by using a UUIDv8 instead of UUIDv7. So for
this usecase we could make a helper function that generates a UUIDv8
using the same format as a UUIDv7, but allows storing arbitrary
timestamps. You might say, why not sligthly change UUIDv7 then? Well
mainly because of this critical sentence in the RFC:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

> Implementations MAY alter the actual timestamp.  Some examples
> include security considerations around providing a real clock
> value within a UUID, to correct inaccurate clocks, to handle leap
> seconds, or instead of dividing a number of microseconds by 1000
> to obtain a millisecond value; dividing by 1024 (or some other
> value) for performance reasons.  This specification makes no
> requirement or guarantee about how close the clock value needs to
> be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

> This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> versions and variants this function returns NULL. The extracted timestamp
> does not necessarily equate to the time of UUID generation. How close it is
> to the actual time depends on the implementation that generated to UUID.
> The uuidv7() function provided PostgreSQL will normally store the actual time 
> of
> generation to in the UUID, but if large batches of UUIDs are generated at the
> same time it's possible that some UUIDs will store a time that is slightly 
> later
> than their actual generation time.




Re: UUID v7

2024-01-28 Thread Sergey Prokhorenko
By the way, the Go language has also already implemented a function for UUIDv7: 
https://pkg.go.dev/github.com/gofrs/uuid#NewV7




Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Thursday, 25 January 2024 at 12:49:46 am GMT+3, Sergey Prokhorenko 
 wrote:  
 
 That's right! There is no point in waiting for the official approval of the 
new RFC, which obviously will not change anything. I have been a contributor to 
this RFC for several years, and I can testify that every aspect imaginable has 
been thoroughly researched and agreed upon. Nothing new will definitely appear 
in the new RFC.


Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov 
 wrote:  
 
 On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin  wrote:



> On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> 
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and 
functions work well. I especially like that fact that we keep 
uuid_extract_time(..) here – this is a great thing to have for time-based 
partitioning, and in many cases we will be able to decide not to have a 
creation column timestamp (e.g., "created_at") at all, saving 8 bytes.
The docs and comments look great too.
Overall, the patch looks mature enough. It would be great to have it in pg17. 
Yes, the RFC is not fully finalized yet, but it's very close. And many 
libraries are already including implementation of UUIDv7 – here are some 
examples:
- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139
Nik

Re: UUID v7

2024-01-25 Thread Sergey Prokhorenko
Aleksander,

In this case the documentation must state that the functions 
uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that 
developers may use these functions with caution at their own risk, and these 
functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless 
developers may well pass any business date into this function: document date, 
registration date, payment date, reporting date, start date of the current 
month, data download date, and even a constant. This would be a profanation of 
UUIDv7 with very negative consequences.

Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev 
 wrote:  
 
 Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. 
> I.e. you can define immutable function that is not really immutable, turn off 
> autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if 
> it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s 
> compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

-- 
Best regards,
Aleksander Alekseev
  

Re: UUID v7

2024-01-25 Thread Aleksander Alekseev
Hi,

> Andrey, many thanks for the updated patch.
>
> LGTM, cfbot is happy and I don't think we have any open items left. So
> changing CF entry status back to RfC.

PFA v14. I changed:

```
elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
```

... to:

```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Time argument of UUID v7 is outside of the valid range")));
```

Which IMO tells a bit more to the average user and is translatable.

> At a quick glance, the patch needs improving English, IMO.

Agree. We could use some help from a native English speaker for this.

-- 
Best regards,
Aleksander Alekseev


v14-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-25 Thread Aleksander Alekseev
Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. 
> I.e. you can define immutable function that is not really immutable, turn off 
> autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if 
> it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s 
> compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-25 Thread Przemysław Sztoch

Andrey M. Borodin wrote on 25.01.2024 07:51:



On 25 Jan 2024, at 09:40, Nikolay Samokhvalov  wrote:

 From a practical point of view, these two things are extremely important to 
have to support partitioning. It is better to implement limitations than throw 
them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. 
you can define immutable function that is not really immutable, turn off 
autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if 
it can return strange values for imprecise RFC implementation.



On 25 Jan 2024, at 02:15, Jelte Fennema-Nio  wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s 
compose better error message?

+1 for erroring when ts is outside range.

v13 looks good for me. I think we have reached a optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-25 Thread Sergey Prokhorenko
I am against turning the DBMS into another C++, in which they do not so much 
design something new as fix bugs in production after a crash.
As for partitioning, I already wrote to Andrey Borodin that we need a special 
function to generate a partition id using the UUIDv7 timestamp or even 
simultaneously with the generation of the timestamp. For example, every month 
(or so, since precision is not needed here) a new partition is created. Here's 
a good example: 
https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916
But without a separate function for extracting the entire timestamp from the 
UUID! Let's solve this specific problem, and not give the developers a grenade 
with the safety removed. Many developers have already decided to store the 
timestamp in UUIDv7, so as not to create a separate created_at field. Then they 
will delete table records with the old timestamp, etc. Horrible mistakes are 
simply guaranteed.

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov  wrote:
> 
> From a practical point of view, these two things are extremely important to 
> have to support partitioning. It is better to implement limitations than 
> throw them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. 
you can define immutable function that is not really immutable, turn off 
autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if 
it can return strange values for imprecise RFC implementation.


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio  wrote:
> 
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s 
compose better error message?


Best regards, Andrey Borodin.  

Re: UUID v7

2024-01-24 Thread Andrey M. Borodin



> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov  wrote:
> 
> From a practical point of view, these two things are extremely important to 
> have to support partitioning. It is better to implement limitations than 
> throw them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. 
you can define immutable function that is not really immutable, turn off 
autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if 
it can return strange values for imprecise RFC implementation.


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio  wrote:
> 
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s 
compose better error message?


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-24 Thread Nikolay Samokhvalov
On Wed, Jan 24, 2024 at 8:40 PM Nikolay Samokhvalov  wrote:

> On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
> sergeyprokhore...@yahoo.com.au> wrote:
>
>> That's right! There is no point in waiting for the official approval of
>> the new RFC, which obviously will not change anything. I have been a
>> contributor to this RFC
>> 
>> for several years, and I can testify that every aspect imaginable has been
>> thoroughly researched and agreed upon. Nothing new will definitely
>> appear in the new RFC.
>>
>
> From a practical point of view, these two things are extremely important
> to have to support partitioning. It is better to implement limitations than
> throw them away.
>
> Without them, this functionality will be of a very limited use in
> databases. We need to think about large tables – which means partitioning.
>

apologies -- this was a response to another email from you:

> "Other people" think that extracting the timestamp from UUIDv7 in
violation of the new RFC, and generating UUIDv7 from the timestamp were
both terrible and poorly thought out ideas. The authors of the new RFC had
very good reasons to prohibit this. And the problems you face are the best
confirmation of the correctness of the new RFC. It’s better to throw all
this gag out of the official patch. Don't tempt developers to break the new
RFC with these error-producing functions.


Re: UUID v7

2024-01-24 Thread Nikolay Samokhvalov
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
sergeyprokhore...@yahoo.com.au> wrote:

> That's right! There is no point in waiting for the official approval of
> the new RFC, which obviously will not change anything. I have been a
> contributor to this RFC
> 
> for several years, and I can testify that every aspect imaginable has been
> thoroughly researched and agreed upon. Nothing new will definitely appear
> in the new RFC.
>

>From a practical point of view, these two things are extremely important to
have to support partitioning. It is better to implement limitations than
throw them away.

Without them, this functionality will be of a very limited use in
databases. We need to think about large tables – which means partitioning.

Nik


Re: UUID v7

2024-01-24 Thread Sergey Prokhorenko
That's right! There is no point in waiting for the official approval of the new 
RFC, which obviously will not change anything. I have been a contributor to 
this RFC for several years, and I can testify that every aspect imaginable has 
been thoroughly researched and agreed upon. Nothing new will definitely appear 
in the new RFC.


Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov 
 wrote:  
 
 On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin  wrote:



> On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> 
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and 
functions work well. I especially like that fact that we keep 
uuid_extract_time(..) here – this is a great thing to have for time-based 
partitioning, and in many cases we will be able to decide not to have a 
creation column timestamp (e.g., "created_at") at all, saving 8 bytes.
The docs and comments look great too.
Overall, the patch looks mature enough. It would be great to have it in pg17. 
Yes, the RFC is not fully finalized yet, but it's very close. And many 
libraries are already including implementation of UUIDv7 – here are some 
examples:
- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139
Nik  

Re: UUID v7

2024-01-24 Thread Sergey Prokhorenko
"Other people" think that extracting the timestamp from UUIDv7 in violation of 
the new RFC, and generating UUIDv7 from the timestamp were both terrible and 
poorly thought out ideas. The authors of the new RFC had very good reasons to 
prohibit this. And the problems you face are the best confirmation of the 
correctness of the new RFC. It’s better to throw all this gag out of the 
official patch. Don't tempt developers to break the new RFC with these 
error-producing functions.


Sergey prokhorenkosergeyprokhore...@yahoo.com.au 

On Wednesday, 24 January 2024 at 04:30:02 pm GMT+3, Aleksander Alekseev 
 wrote:  
 
 Hi,

> Function to extract timestamp does not provide any guarantees at all. 
> Standard states this, see Kyzer answers upthread.
> Moreover, standard urges against relying on that if uuidX was generated 
> before uuidY, then uuidX happen, but does not guaranty that.
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an 
> entropy. It’s there to ensure uniqueness and promote ordering (but not 
> guarantee).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

-- 
Best regards,
Aleksander Alekseev
  

Re: UUID v7

2024-01-24 Thread Jelte Fennema-Nio
On Wed, 24 Jan 2024 at 21:47, Marcos Pegoraro  wrote:
>
> I understand your point, but
> '2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid 
> timestamps.
>
> So looks strange if user can do
> select uuidv7(TIMESTAMP '2000-01-01')
> but cannot do
> select uuidv7(TIMESTAMP '1900-01-01')



I think that would be okay honestly. I don't think there's any
reasonable value for the uuid when a timestamp is given outside of the
date range that the uuid7 "algorithm" supports.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).




Re: UUID v7

2024-01-24 Thread Marcos Pegoraro
I understand your point, but
'2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid
timestamps.

So looks strange if user can do
select uuidv7(TIMESTAMP '2000-01-01')
but cannot do
select uuidv7(TIMESTAMP '1900-01-01')

Regards
Marcos


Em qua., 24 de jan. de 2024 às 14:51, Andrey Borodin 
escreveu:

>
>
> > On 24 Jan 2024, at 22:00, Marcos Pegoraro  wrote:
> >
> > Is enough from 1970 ?
> Per standard unix_ts_ms field is a number of milliseconds from UNIX start
> date 1970-01-01.
>
> > How about if user wants to have an UUID of his birth date ?
>
> I've claimed my
> 0078c135-bd00-70b1-865a-63c3741922a5
>
> But again, UUIDs are not designed to store timestamp. They are unique and
> v7 promote data locality via time-ordering.
>
>
> Best regards, Andrey Borodin.


Re: UUID v7

2024-01-24 Thread Andrey Borodin



> On 24 Jan 2024, at 22:00, Marcos Pegoraro  wrote:
> 
> Is enough from 1970 ?
Per standard unix_ts_ms field is a number of milliseconds from UNIX start date 
1970-01-01.

> How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and v7 
promote data locality via time-ordering.


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-24 Thread Marcos Pegoraro
 Is enough from 1970 ?
How about if user wants to have an UUID of his birth date ?

regards
Marcos

Em qua., 24 de jan. de 2024 às 13:54, Andrey M. Borodin <
x4...@yandex-team.ru> escreveu:

>
>
> > On 24 Jan 2024, at 20:46, Aleksander Alekseev 
> wrote:
> >
> > Only the
> > fact that timestamp from the far past generates UUID from the future
> > bothers me.
>
> PFA implementation of guard checks, but I'm afraid that this can cause
> failures in ID generation unexpected to the user...
> See tests
>
> +-- errors in edge cases of UUID v7
> +SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval
> '0ms');
> +SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); --
> ERROR expected
> +SELECT 1 FROM
> uuidv7(uuid_extract_time('--7FFF-B000-'));
> +SELECT
> uuidv7(uuid_extract_time('--7FFF-B000-')+'1ms'); --
> ERROR expected
>
> Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not
> sure we should give this information in error message...
> Thanks!
>
>
> Best regards, Andrey Borodin.
>


Re: UUID v7

2024-01-24 Thread Andrey M. Borodin


> On 24 Jan 2024, at 20:46, Aleksander Alekseev  
> wrote:
> 
> Only the
> fact that timestamp from the far past generates UUID from the future
> bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures 
in ID generation unexpected to the user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- 
ERROR expected
+SELECT 1 FROM 
uuidv7(uuid_extract_time('--7FFF-B000-'));
+SELECT 
uuidv7(uuid_extract_time('--7FFF-B000-')+'1ms'); -- 
ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we 
should give this information in error message...
Thanks!


Best regards, Andrey Borodin.


v13-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-24 Thread Aleksander Alekseev
Hi,

> Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same 
> internal timestamp, so despite time+1us > time, still second uuid will be 
> greater.
>
> Both invariants you proposed cannot be reasonably guaranteed. Upholding any 
> of them greatly reduces usability of UUID v7.

Again, personally I don't insist on the 1us precision [1]. Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

[1]: 
https://postgr.es/m/CAJ7c6TPCSprWwVNdOB%3D%3DpgKZPqO5q%3DHRgmU7zmYqz9Dz5ffVYw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-24 Thread Andrey M. Borodin



> On 24 Jan 2024, at 18:29, Aleksander Alekseev  
> wrote:
> 
> Hi,
> 
>> Function to extract timestamp does not provide any guarantees at all. 
>> Standard states this, see Kyzer answers upthread.
>> Moreover, standard urges against relying on that if uuidX was generated 
>> before uuidY, then uuidX> happen, but does not guaranty that.
>> All what is guaranteed is the uniqueness at certain conditions.
>> 
>>> Otherwise you can calculate crc64(X) or sha256(X)
>>> internally in order to generate an unique ID and claim that it's fine.
>>> 
>>> Values that violate named invariants should be rejected with an error.
>> 
>> Think about the value that you pass to uuid generation function as an 
>> entropy. It’s there to ensure uniqueness and promote ordering (but not 
>> guarantee).
> 
> If the standard doesn't guarantee something it doesn't mean it forbids
> us to give stronger guarantees.
No, the standard makes these guarantees impossible.
If we insist that uuid_extract_time(uuidv7(time))==time, we won't be able to 
generate uuidv7 most of the time. uuidv7(now()) will always ERROR-out.
Standard implies more coarse-grained timestamp that we have.

Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same 
internal timestamp, so despite time+1us > time, still second uuid will be 
greater.

Both invariants you proposed cannot be reasonably guaranteed. Upholding any of 
them greatly reduces usability of UUID v7.


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-24 Thread Aleksander Alekseev
Hi,

> Function to extract timestamp does not provide any guarantees at all. 
> Standard states this, see Kyzer answers upthread.
> Moreover, standard urges against relying on that if uuidX was generated 
> before uuidY, then uuidX happen, but does not guaranty that.
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an 
> entropy. It’s there to ensure uniqueness and promote ordering (but not 
> guarantee).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-24 Thread Aleksander Alekseev
Hi,

> Values that violate named invariants should be rejected with an error.

To clarify, I don't think we should bother about the precision part.
"Equals" in the example above means "equal within UUIDv7 precision",
same for "more" and "less". However, years 2977 BC and 5943 AC are
clearly not equal, thus 2977 BC should be rejected as an invalid value
for UUIDv7.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-24 Thread Andrey M. Borodin



> On 24 Jan 2024, at 18:02, Aleksander Alekseev  
> wrote:
> 
> Hi,
> 
>> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not 
>> in storing timestamp, but in being unique identifier. So I don’t think it 
>> worth throwing an error when overflowing value is given. BTW if you will 
>> subtract some nanoseconds - you will not get back timestamp you put into 
>> UUID too.
>> UUID does not store timpestamp, it only uses it to generate an identifier. 
>> Some value can be extracted back, but with limited precision, limited range 
>> and only if UUID was generated precisely by the specification in standard 
>> (and standard allows deviation! Most of implementation try to tradeoff 
>> something).
> 
> I don't claim that UUIDv7 purpose is storing timestamps, but I think
> the invariant:
> 
> ```
> uuid_extract_time(uidv7(X)) == X
> ```
> 
> and (!) even more importantly:
> 
> ```
> if X > Y then uuidv7(X) > uuidv7(Y)
> ```
> 
> ... should hold.
Function to extract timestamp does not provide any guarantees at all. Standard 
states this, see Kyzer answers upthread.
Moreover, standard urges against relying on that if uuidX was generated before 
uuidY, then uuidX Otherwise you can calculate crc64(X) or sha256(X)
> internally in order to generate an unique ID and claim that it's fine.
> 
> Values that violate named invariants should be rejected with an error.

Think about the value that you pass to uuid generation function as an entropy. 
It’s there to ensure uniqueness and promote ordering (but not guarantee).


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-24 Thread Aleksander Alekseev
Hi,

> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not 
> in storing timestamp, but in being unique identifier. So I don’t think it 
> worth throwing an error when overflowing value is given. BTW if you will 
> subtract some nanoseconds - you will not get back timestamp you put into UUID 
> too.
> UUID does not store timpestamp, it only uses it to generate an identifier. 
> Some value can be extracted back, but with limited precision, limited range 
> and only if UUID was generated precisely by the specification in standard 
> (and standard allows deviation! Most of implementation try to tradeoff 
> something).

I don't claim that UUIDv7 purpose is storing timestamps, but I think
the invariant:

```
uuid_extract_time(uidv7(X)) == X
```

and (!) even more importantly:

```
if X > Y then uuidv7(X) > uuidv7(Y)
```

... should hold. Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate an unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-24 Thread Andrey M. Borodin



> On 24 Jan 2024, at 17:31, Aleksander Alekseev  
> wrote:
> 
> Hi,
> 
>> Cfbot also seems to be happy with the patch so I'm changing the CF
>> entry status to RfC.
> 
> I've found a bug:
> 
> ```
> =# select now() - interval '5000 years';
>?column?
> 
> 2977-01-24 15:29:01.779462+02:30:17 BC
> 
> Time: 0.957 ms
> 
> =# select uuidv7(now() - interval '5000 years');
>uuidv7
> --
> 720c1868-0764-7677-99cd-265b84ea08b9
> 
> =# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
> uuid_extract_time
> 
> 5943-08-26 21:30:44.836+03
> ```

UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in 
storing timestamp, but in being unique identifier. So I don’t think it worth 
throwing an error when overflowing value is given. BTW if you will subtract 
some nanoseconds - you will not get back timestamp you put into UUID too.
UUID does not store timpestamp, it only uses it to generate an identifier. Some 
value can be extracted back, but with limited precision, limited range and only 
if UUID was generated precisely by the specification in standard (and standard 
allows deviation! Most of implementation try to tradeoff something).


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-24 Thread Aleksander Alekseev
Hi,

> Cfbot also seems to be happy with the patch so I'm changing the CF
> entry status to RfC.

I've found a bug:

```
=# select now() - interval '5000 years';
?column?

 2977-01-24 15:29:01.779462+02:30:17 BC

Time: 0.957 ms

=# select uuidv7(now() - interval '5000 years');
uuidv7
--
 720c1868-0764-7677-99cd-265b84ea08b9

=# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
 uuid_extract_time

 5943-08-26 21:30:44.836+03
```

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-22 Thread Aleksander Alekseev
Hi,

> But now (after big timeseries project with multiple time zones and DST 
> problems) I think differently.
> Even though timestamp and timestamptz are practically the same, timestamptz 
> should be used to store the time in UTC.
> Choosing timestamp is more likely to lead to problems and misunderstandings 
> than timestamptz.

As somebody who contributed TZ support to TimescaleDB I'm more or less
aware about the pros and cons of Timestamp and TimestampTz :)
Engineering is all about compromises. I can imagine a project where it
makes sense to use only TimestampTz for the entire database, and the
opposite - when it's better to use only UTC and Timestamp. In this
particular case I was merely concerned that the particular choice
could be confusing for the users but I think I changed my mind by now,
see below.

>> Here's v12. Changes:
>> 1. Documentation improvements
>> 2. Code comments
>> 3. Better commit message and reviews list
>
>
> Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and 
> functions work well. I especially like that fact that we keep 
> uuid_extract_time(..) here – this is a great thing to have for time-based 
> partitioning, and in many cases we will be able to decide not to have a 
> creation column timestamp (e.g., "created_at") at all, saving 8 bytes.
>
> The docs and comments look great too.
>
> Overall, the patch looks mature enough. It would be great to have it in pg17. 
> Yes, the RFC is not fully finalized yet, but it's very close. And many 
> libraries are already including implementation of UUIDv7 – here are some 
> examples:
>
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

Thanks!

After playing with v12 I'm inclined to agree that it's RfC.

I only have a couple of silly nitpicks:

- It could make sense to decompose the C implementation of uuidv7() in
two functions, for readability.
- It could make sense to get rid of curly braces in SQL tests when
calling uuid_extract_ver() and uuid_extract_ver(), for consistency.

I'm not going to insist on these changes though and prefer leaving it
to the author and the committer to decide.

Also I take back what I said above about using Timestamp instead of
TimestampTz. I forgot that Timestamps are implicitly casted to
TimestampTz's, so users preferring Timestamps can do this:

```
=# select uuidv7('2024-01-22 12:34:56' :: timestamp);
uuidv7
--
 018d3085-de00-77c1-9e7b-7b04ddb9ebb9
```

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Manually tested uuidv7(), uuid_extract_time() – they work as expected. The 
basic docs provided look clear.

I haven't checked the tests though and possible edge cases, so leaving it as 
"needs review" waiting for more reviewers

Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin 
wrote:

>
>
> > On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> >
> > Also, I've added some documentation on all functions.
>
> Here's v12. Changes:
> 1. Documentation improvements
> 2. Code comments
> 3. Better commit message and reviews list
>

Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and
functions work well. I especially like that fact that we keep
uuid_extract_time(..) here – this is a great thing to have for time-based
partitioning, and in many cases we will be able to decide not to have a
creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in
pg17. Yes, the RFC is not fully finalized yet, but it's very close. And
many libraries are already including implementation of UUIDv7 – here are
some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik


Re: UUID v7

2024-01-19 Thread Andrey Borodin


> On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> 
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Best regards, Andrey Borodin.



v12-0001-Implement-UUID-v7.patch
Description: Binary data


Re: UUID v7

2024-01-19 Thread Aleksander Alekseev
Hi,

> No.
>
> Timestamp and TimestampTz are absolutely the same thing. The only
> difference is how they are shown to the user. TimestampTz uses session
> context in order to be displayed in the TZ chosen by the user. Thus
> typically it is somewhat more confusing to the users and thus I asked
> whether there was a good reason to choose TimestampTz over Timestamp.
>
>
> Theoretically, you're right. But look at this example:
>
> SET timezone TO 'Europe/Warsaw';
> SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch 
> from '2024-01-18 9:27:30'::timestamptz);
>
>  date_part  | date_part
> +
>  1705570050 | 1705566450
> (1 row)
>
> In my opinion, timestamptz gives greater guarantees that the time internally 
> is in UTC and the user gets the time in his/her time zone.

I believe you didn't notice, but this example just proves my point.

In this case you have two timestamps that are different _internally_,
but the way they are _shown_ is the same because the first one is in
UTC and the second one in your local session timezone, Europe/Warsaw.
extract(epoch ...) extract UNIX epoch, i.e. relies on the _internal_
representation. This is why you got different results.

This demonstrates that TimestampTz is a permanent source of confusion
for the users and the reason why personally I would prefer if UUIDv7
always used Timestamp (no Tz). TimestampTz can be converted to
TimestampTz by users who need them and have experience using them.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-19 Thread Andrey Borodin


> On 19 Jan 2024, at 08:24, David G. Johnston  
> wrote:
> 
> 
> You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, 
> they disagree on the direction that is positive.

Thanks! Now everything seems on its place.

I want to include in the patch following tests:
-- extract UUID v1, v6 and v7 timestamp
SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, 
February 22, 2022 2:22:22.00 PM GMT+05:00';
SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, 
February 22, 2022 2:22:22.00 PM GMT+05:00';
SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, 
February 22, 2022 2:22:22.00 PM GMT+05:00';

How do you think, will it be stable all across buildfarm? Or should we change 
anything to avoid false positives inferred from different timestamp parsing?


> On 19 Jan 2024, at 07:58, Lukas Fittl  wrote:
> 
> Note how calling the uuidv7 function again after having called it with a 
> fixed future timestamp, returns the future timestamp, even though it should 
> return the current time.

Thanks for the review.
Well, that was intentional. But now I see it's kind of confusing behaviour. 
I've changed it to more expected version.

Also, I've added some documentation on all functions.


Best regards, Andrey Borodin.


v11-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


Re: UUID v7

2024-01-18 Thread David G. Johnston
On Thu, Jan 18, 2024 at 11:31 AM Andrey Borodin 
wrote:

>
> Now I'm completely lost in time... I've set local time to NY (UTC-5).
>
> postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' -
> TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM
> GMT-05:00';
>  ?column?
> --
>  10:00:00
> (1 row)
>
>
You are mixing POSIX and ISO-8601 conventions and, as noted in our
appendix, they disagree on the direction that is positive.

https://www.postgresql.org/docs/current/datetime-posix-timezone-specs.html

The offset fields specify the hours, and optionally minutes and seconds,
difference from UTC. They have the format hh[:mm[:ss]] optionally with a
leading sign (+ or -). The positive sign is used for zones west of
Greenwich. (Note that this is the opposite of the ISO-8601 sign convention
used elsewhere in PostgreSQL.)

David J.


Re: UUID v7

2024-01-18 Thread Lukas Fittl
On Thu, Jan 18, 2024 at 5:18 AM Andrey Borodin  wrote:

> Current patch version attached. I've addressed all other requests:
> function renames, aliases, multiple functions instead of optional params,
> cleaner catalog definitions, not throwing error when [var,ver,time] value
> is unknown.
> What is left: deal with timezones, improve documentation.
>

I've done a test of the v10 patch, and ran into an interesting behavior
when passing in a timestamp to the function (which, as a side note, is
actually very useful to have as a feature, to support creating time-based
range partitions on UUIDv7 fields):

postgres=# SELECT uuid_extract_time(uuidv7());
 uuid_extract_time
---
 2024-01-18 18:49:00.01-08
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7('2024-04-01'));
   uuid_extract_time

 2024-04-01 00:00:00-07
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7());
   uuid_extract_time

 2024-04-01 00:00:00-07
(1 row)

Note how calling the uuidv7 function again after having called it with a
fixed future timestamp, returns the future timestamp, even though it should
return the current time.

I believe this is caused by incorrectly re-using the cached
previous_timestamp. In the second call here (with a fixed future
timestamp), we end up setting ts and tms to 2024-04-01, with
increment_counter = false, which leads us to set previous_timestamp to the
passed in timestamp (else branch of the second if in uuidv7). When we then
call the function again without an argument, we end up getting a new
timestamp from gettimeofday, but because we try to detect backwards leaps,
we set increment_counter to true, and thus end up reusing the previous
(future) timestamp here:

/* protection from leap backward */
tms = previous_timestamp;

Not sure how to fix this, but clearly something is amiss here.

Thanks,
Lukas

-- 
Lukas Fittl


Re: UUID v7

2024-01-18 Thread Przemysław Sztoch

We are not allowed to consider any time other than UTC.

You need to write to the authors of the standard. I suppose this is a 
mistake.


I know from experience that errors in such standards most often appear 
in examples.

Nobody detects them at first.
Everyone reads and checks ideas, not calculations.
Then developers during implementation tears out their hair.

Andrey Borodin wrote on 1/18/2024 4:39 PM:



On 18 Jan 2024, at 19:20, Aleksander Alekseev  wrote:

Timestamp and TimestampTz are absolutely the same thing.

My question is not about Postgres data types. I'm asking about examples in the 
standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated 
on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's exaplained to be 16455577420ns after 1582-10-15 00:00:00 UTC.

But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
UTC. And that was 2022-02-23 00:22:22 in UTC-05.


Best regards, Andrey Borodin.


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-18 Thread Przemysław Sztoch

Aleksander Alekseev wrote on 1/18/2024 3:20 PM:

Hi,


Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should always use 
timestamptz.
timestamp is theoretically the same thing. But internally it does not convert 
time to UTC and will lead to incorrect use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.



Theoretically, you're right. But look at this example:

SET timezone TO 'Europe/Warsaw';
SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), 
extract(epoch from '2024-01-18 9:27:30'::timestamptz);


 date_part  | date_part
+
 1705570050 | 1705566450
(1 row)

In my opinion, timestamptz gives greater guarantees that the time 
internally is in UTC and the user gets the time in his/her time zone.


In the case of timestamp, it is never certain whether it keeps time in 
UTC or in the local zone.


In the case of argument's type, there would be no problem because we 
could create two functions.

Of course timestamp would be treated the same as timestamptz.
But here we have a problem with the function return type, which can only 
be one. And since the time returned is in UTC, it should be timestamptz.


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-18 Thread Przemysław Sztoch
Using localtime would be absurd. Especially since time goes back during 
summer time change.
I believe our implementation should use UTC. No one forbids us from 
assuming that our local time for generating uuid is UTC.


Andrey Borodin wrote on 1/18/2024 2:17 PM:



On 17 Jan 2024, at 02:19, Jelte Fennema-Nio  wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on 
the patch for time extraction for v1, v6 and v7.

Do I understand correctly, that UUIDs contain local time, not UTC time? For examples in 
[0] I see that "A.6. Example of a UUIDv7 Value" I see that February 22, 2022 
2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is not UTC, but 
local time.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 
'GMT-05';
+ timezone
+--
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function 
renames, aliases, multiple functions instead of optional params, cleaner 
catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] 
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-18 Thread Sergey Prokhorenko
Hi Andrey,

You'd better generate a test UUIDv7 for midnight 1 Jan 1970 UTC. In this case, 
the timestamp in UUIDv7 according to the new RFC must be filled with zeros. By 
extracting the timestamp from this test UUIDv7, you should get exactly midnight 
1 Jan 1970 UTC.
I also recommend this article: https://habr.com/ru/articles/772954/


Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au 

On Thursday, 18 January 2024 at 09:31:16 pm GMT+3, Andrey Borodin 
 wrote:  
 
 

> On 18 Jan 2024, at 20:39, Andrey Borodin  wrote:
> 
> But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
> UTC. And that was 2022-02-23 00:22:22 in UTC-05.


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example 
UUIDs. It's not '2022-02-23 00:22:22 in UTC-05' as I thought.
I got confused by "at timezone" changes which in fact removes timezone 
information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP 
WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
 ?column? 
--
 10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 
2:22:22.00 PM GMT-05:00';
      timestamptz      

 2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test 
that verifies timestamp constant (+ I understand what is going on).


Best regards, Andrey Borodin.  

Re: UUID v7

2024-01-18 Thread Andrey Borodin



> On 18 Jan 2024, at 20:39, Andrey Borodin  wrote:
> 
> But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
> UTC. And that was 2022-02-23 00:22:22 in UTC-05.


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example 
UUIDs. It's not '2022-02-23 00:22:22 in UTC-05' as I thought.
I got confused by "at timezone" changes which in fact removes timezone 
information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP 
WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
 ?column? 
--
 10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 
2:22:22.00 PM GMT-05:00';
  timestamptz   

 2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test 
that verifies timestamp constant (+ I understand what is going on).


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-18 Thread Sergey Prokhorenko
Hi Andrey,

Aleksander Alekseev wrote: "If this is the case, I think the example is indeed 
wrong". 

This is one of the reasons why I was categorically against any examples of 
implementation in the new RFC. The examples have been very poorly studied and 
discussed, and therefore it is better not to use them at all. But the text of 
the RFC itself clearly refers to UTC, and not at all about local time: "UUID 
version 7 features a time-ordered value field derived from the widely 
implemented and well known Unix Epoch timestamp source, the number of 
milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded". The main 
reason for using UTC is so that UUIDv7's, generated approximately 
simultaneously in different time zones, are correctly ordered in time when they 
get into one database.


Sergey Prokhorenko
sergeyprokhore...@yahoo.com.au 

On Thursday, 18 January 2024 at 07:22:05 pm GMT+3, Aleksander Alekseev 
 wrote:  
 
 Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in 
> the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be 
> generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
> It's exaplained to be 16455577420ns after 1582-10-15 00:00:00 UTC.
>
> But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
> UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
      ?column?
--
 1645521742000.00
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: 
https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

-- 
Best regards,
Aleksander Alekseev
  

Re: UUID v7

2024-01-18 Thread Aleksander Alekseev
Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in 
> the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be 
> generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
> It's exaplained to be 16455577420ns after 1582-10-15 00:00:00 UTC.
>
> But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
> UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
   ?column?
--
 1645521742000.00
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: 
https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-18 Thread Andrey Borodin



> On 18 Jan 2024, at 19:20, Aleksander Alekseev  
> wrote:
> 
> Timestamp and TimestampTz are absolutely the same thing.
My question is not about Postgres data types. I'm asking about examples in the 
standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be 
generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's exaplained to be 16455577420ns after 1582-10-15 00:00:00 UTC.

But 16455577420ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 
UTC. And that was 2022-02-23 00:22:22 in UTC-05.


Best regards, Andrey Borodin.



Re: UUID v7

2024-01-18 Thread Aleksander Alekseev
Hi,

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.
>
> timstamptz internally always store UTC.
> I believe that in SQL, when operating with time in UTC, you should always use 
> timestamptz.
> timestamp is theoretically the same thing. But internally it does not convert 
> time to UTC and will lead to incorrect use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

-- 
Best regards,
Aleksander Alekseev




Re: UUID v7

2024-01-18 Thread Andrey Borodin


> On 17 Jan 2024, at 02:19, Jelte Fennema-Nio  wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on 
the patch for time extraction for v1, v6 and v7.

Do I understand correctly, that UUIDs contain local time, not UTC time? For 
examples in [0] I see that "A.6. Example of a UUIDv7 Value" I see that February 
22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which 
is not UTC, but local time.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 
'GMT-05';
+ timezone 
+--
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function 
renames, aliases, multiple functions instead of optional params, cleaner 
catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] 
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value


v10-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


Re: UUID v7

2024-01-16 Thread Jelte Fennema-Nio
On Tue, 16 Jan 2024 at 22:02, Przemysław Sztoch  wrote:
> But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - function 
> should be able extract timestamp from v1/v6/v7

I'm fine with this.

> I would highly recommend to add:
> uuidv5(namespace uuid, name text) -> uuid
> using uuid_generate_v5 from uuid-ossp extension 
> (https://www.postgresql.org/docs/current/uuid-ossp.html)
> There is an important version and it should be included into the main PG code.

I think adding more uuid versions would probably be desirable. But I
don't think it makes sense to clutter this patchset with that. I feel
like on this uuidv7 patchset we've had enough discussion that it could
reasonably get into PG17, but I think adding even more uuid versions
to this patchset would severely reduce the chances of that happening.




Re: UUID v7

2024-01-16 Thread Przemysław Sztoch

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should 
always use timestamptz.
timestamp is theoretically the same thing. But internally it does not 
convert time to UTC and will lead to incorrect use.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-16 Thread Przemysław Sztoch

Jelte Fennema-Nio wrote on 1/16/2024 9:25 PM:

On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin  wrote:
So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

+1
But replaceuuidv7_extract_time(uuid)with uuid_extract_time(uuid) - 
function should be able extract timestamp from v1/v6/v7


I would highly recommend to add:
uuidv5(namespace uuid, name text) -> uuid
using uuid_generate_v5 from uuid-ossp extension 
(https://www.postgresql.org/docs/current/uuid-ossp.html)
There is an important version and it should be included into the main PG 
code.


Jelte: Please propose the name of the function that will convert uuid 
from version 1 to 6.
v6 is almost as good as v7 for indexes. And v6 allows you to convert 
from v1 which some people use.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-16 Thread Jelte Fennema-Nio
On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin  wrote:
> Jelte, what is your opinion on naming the function which extracts timestamp 
> from UUID v7?

I looked at a few more datatypes: json, jsonb & hstore. The get_
prefix is not used there at all, so I'm still opposed to that. But
they seem to use either an _to_ or an _extract_ infix. _to_ is then
used for conversion of the whole object, and _extract_ is used to
extract a subset. So I think _extract_ would fit well here.

On Fri, 5 Jan 2024 at 11:57, Sergey Prokhorenko
 wrote:
> When naming functions, I would advise using the shorter abbreviation uuidv7 
> from the new version of the RFC instead of uuid_v7.

I also agree with that, uuid_v7 looks weird to my eyes. The RFC also
abbreviates them as UUIDv7 (without a space).

The more I look at it the more I also think the gen_ prefix is quite
strange, and I already thought the gen_random_uuid name was quite
weird. But now that we will also have a uuidv7 I think it's even
stranger that one uses the name from the RFC.

The name of gen_random_uuid was taken verbatim from pgcrypto, without
any discussion on the list[0]:

> Here is a proposed patch for this. I did a fair bit of looking around
> in other systems for a naming pattern but didn't find anything
> consistent. So I ended up just taking the function name and code from
> pgcrypto.


So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

[0]: 
https://www.postgresql.org/message-id/flat/6a65610c-46fc-2323-6b78-e8086340a325%402ndquadrant.com#76e40e950a44aa8b6844297e8d2efe2c




Re: UUID v7

2024-01-16 Thread Przemysław Sztoch

Andrey Borodin wrote on 1/16/2024 1:15 PM:

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()

Please update docs part about optional timestamp argument.

2. Renamed get_uuid_v7_time() to uuid_v7_time()

Pleaserename uuid_v7_time to uuid_time() and add support for v1 and v6.
If version is incompatible then return NULL.

3. Added  uuid_ver() and uuid_var().

Looks good.
But for me, throwing an error is problematic. Wouldn't it be better to 
return -1.

What do you think?
Best regards, Andrey Borodin.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: UUID v7

2024-01-16 Thread Jelte Fennema-Nio
On Tue, 16 Jan 2024 at 15:44, Andrey M. Borodin  wrote:
> > On 16 Jan 2024, at 18:00, Aleksander Alekseev  
> > wrote:
> > Not 100% sure what this is for. Any chance this could be part of another 
> > patch?
> Nope, it’s necessary there. Without these changes catalog functions cannot 
> have defaults for arguments. These defaults have type pg_node_tree which has 
> no-op in function.

That seems like the wrong way to make that work then. How about
instead we define the same function name twice, once with and once
without a timestamp argument. That's how this is done for other
functions that are overloaded in pg_catalog.




Re: UUID v7

2024-01-16 Thread Andrey M. Borodin



> On 16 Jan 2024, at 21:49, Sergey Prokhorenko  
> wrote:
> 
> It is not clear how to interpret uuid_v7_time():
>   • uuid_v7 to time() (extracting the timestamp)
>   • time() to uuid_v7  (generation of the uuid_v7)
> It is worth improving the naming, for example, adding prepositions.

Previously, Jelte had some thoughts on idiomatic function names.

Jelte, what is your opinion on naming the function which extracts timestamp 
from UUID v7?
Of cause, it would be great to hear opinion from anyone else.


Best regards, Andrey Borodin.



  1   2   >