subject:"\[HACKERS\] Change sort order on UUIDs\?"

Re: [HACKERS] Change sort order on UUIDs?

2007-07-16 Thread Bruce Momjian


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Michael Glaesemann wrote:
> 
> On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:
> 
> > For UUID, I
> > would value random access before sequential performance. Why would
> > anybody scan UUID through the index in "sequential" order?
> 
> AIUI, to allow UUID columns to be indexed using BTREE, there needs to  
> be some ordering defined. So regardless of what this ordering is,  
> doesn't there need to be some order? And as a (primary?) purpose of  
> UUIDs is to be (universally) unique, and the implementation of  
> uniqueness constraints in PostgreSQL is based on BTREE indexes, this  
> makes the necessity of ordering doubly so. Or have I missed something?
> 
> Michael Glaesemann
> grzm seespotcode net
> 
> 
> 
> ---(end of broadcast)---
> TIP 7: You can help support the PostgreSQL project by donating at
> 
> http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark

If index lookup speed or packing truly was the primary concern, people
would use a suitably sized SEQUENCE. They would not use UUID.

I believe the last time I calculated this, the result was that you
could fit 50% more entries in the index if you use a 32-bit sequence
number instead of a 128-bit UUID.

I've been mixed on my personal use of UUID. The main benefits of UUID
for me have been:

   1) Different sites can generate UUID values without communicating
  according to a well known standard formula, and have an excellent
  chance of always generating unique values for a particular application
  that can be later merged together without conflict.

   2) The numbers are difficult to predict by an end user of your
  application. If exposing your row id to the world, a UUID is
  more obscure and difficult to predict than a sequence number.

The rest of the benefits are slim or non-existent. It's not more
efficient to store or access. It's not easier to process, memorize or
type. Any significance of values, such as UUIDv1 embedded a time and
mac address are of only limited value, as it relies on the generator
following the expected algorithm, and the generators having the same
time (with different allowances for error).

For example, if one truly wished to have an efficient lookup key, and
record creation time, why not use two fields? 1 x 32-bit sequence number,
and 1 x 64-bit timestamp. This gives portability and makes the purpose of
the fields clear. It gives flexibility to create the index on either/or.

For read-only data, I've taken to using the SHA1 sum of the data as
the unique id instead of UUID or SEQUENCE. Works pretty good... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark

On Fri, Jun 15, 2007 at 11:05:01AM -0400, Robert Wojciechowski wrote:
> Also, treating UUIDs as time based is completely valid -- that is the
> point of version 1 UUIDs. They have quite a few advantages over random UUIDs.

It's a leap from extracting the UUID as time, to sorting by UUID for
results, or an assumption that reduced fragmentation would increase
performance for regular loads.

> Ordering random UUIDs as if they were time-based will still result in
> random access on the b-tree, so again luckily this just relies on how
> you generate the UUIDs not how you order them.

Therefore fragmented by your assertion, therefore a performance
problem. If your premise is correct, the conclusion that it leads to
is that UUIDv1 will perform better under PostgreSQL.

> So far this ordering method seems to satisfy both needs, but my
> performance analysis will reveal more.

The word "satisfy" above is loosely defined. The existing model
"satisfies" both needs. You believe you have a case that would prove
that the existing model is sub-optimal. You would like to make a
change which you believe would optimize your own use, at the minimal
expense of other users. The minimal expense comes from the more
complicated comparator function, and the confusion of any who see
their non-UUIDv1 UUID's sorted by some apparently arbitrary scheme
that seems to have a history of assuming that UUIDv1 will be used.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark

On Fri, Jun 15, 2007 at 09:40:29AM -0500, Michael Glaesemann wrote:
> On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:
> >For UUID, I
> >would value random access before sequential performance. Why would
> >anybody scan UUID through the index in "sequential" order?
> AIUI, to allow UUID columns to be indexed using BTREE, there needs to  
> be some ordering defined. So regardless of what this ordering is,  
> doesn't there need to be some order? And as a (primary?) purpose of  
> UUIDs is to be (universally) unique, and the implementation of  
> uniqueness constraints in PostgreSQL is based on BTREE indexes, this  
> makes the necessity of ordering doubly so. Or have I missed something?

The BTREE needs a comparator function, yes. The BTREE comparator
function need not match any expectation of the caller.

For example, if Robert is correct, that indexes on UUID will become
fragmented over time with the current comparator, then he is providing
only a half solution. He will have non fragmented time-based UUID
index, while continuing to allow everybody else to have fragmented
non time-based UUID. The logic does not make sense. Either fragmentation
is an issue, or it is not. Concurrency is either an issue, or it is not.

I do not believe that it has been demonstrated that there is a
fragmentation issue, nor do I believe it has been shown what effect
would occur on concurrency.

Personally, my preference is for the BTREE comparator to be the fastest
comparison possible, to allow the quickest scanning of the index entries.
I don't believe fragmentation is a serious issue, and I believe there are
concurrency benefits to inserting into different index pages, rather than
always adding to the end.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread Robert Wojciechowski

> I suggest that treating the UUID as anything other than a unique
> random value is a mistake. There should be no assumptions by users
> with regard to how the order is displayed. 

You can always use random UUIDs -- that's a choice in UUID generation.
When dealing with random UUIDs you also (by the very nature of them
being random) would never order by column as it just doesn't make sense.
Nothing changes as long as you generate random UUIDs.

Also, treating UUIDs as time based is completely valid -- that is the
point of version 1 
UUIDs. They have quite a few advantages over random UUIDs.

> Also, as UUID generation
> based on time is always in sequence, it seems to me that sorting by
> UUID time would have the effect of inserts always being to the end of
> the index. While this might pack tightly, wouldn't this hurt
> concurrency? Random access vs sequential performance. For UUID, I
> would value random access before sequential performance. Why would
> anybody scan UUID through the index in "sequential" order?
> 

Ordering random UUIDs as if they were time-based will still result in
random access on the b-tree, so again luckily this just relies on how
you generate the UUIDs not how you order them.

So far this ordering method seems to satisfy both needs, but my
performance analysis will reveal more.

-- Robert

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread Robert Wojciechowski

> > When done that way, you're going to see a lot of index B-tree
> > fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based
UUIDs,
> > as described above. With random (version 4) or hashed based (version
3
> > or 5) UUIDs there's nothing that can be done to improve the
situation,
> > obviously.
> 
> Is this based on empirical results or just a theory? I'm asking
because
> it's actually a common technique to reverse the natural index key to 
> construct basically exactly this situation -- for performance reasons.

> The idea is that low order bits have higher cardinality and that that 
> can *improve* btree performance by avoiding contention.
> 
> I'm not sure how much I believe in the effectiveness of that strategy
> myself or for that matter whether it's universally applicable or only
> useful in certain types of loads.
> 
> I'm not saying you're wrong, but I'm not sure it's a simple open and
shut
> case either.
> 

What hinted me off to this was the same problem occurring in SQL Server,
where changing the behavior gave them a much lower I/O load with
replication (which utilizes UUIDs). 

A blog post from an MS developer at http://tinyurl.com/2xy5jn talks
about how this has allowed a tighter index as well as not requiring
random searches on the b-tree, significantly reducing I/O.

A performance analysis at http://tinyurl.com/2ysora has a table
comparing using an integer, UUID and sequential UUID (when the system
orders UUIDs sequentially by time, like SQL Server already does).

Obviously this is not SQL Server we're dealing with, but I can see many
of the same issues unavoidable and equally impactful as we both use the
same index data structure. I'll

That being said, I'm going to perform tests today or this weekend on
different loads to see how PostgreSQL would be affected by this change.
I'll be very interested to see the results of random b-tree searches on
every insert vs the contention from sequentially generated UUIDs.

-- Robert

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread Michael Glaesemann



On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:


For UUID, I
would value random access before sequential performance. Why would
anybody scan UUID through the index in "sequential" order?


AIUI, to allow UUID columns to be indexed using BTREE, there needs to  
be some ordering defined. So regardless of what this ordering is,  
doesn't there need to be some order? And as a (primary?) purpose of  
UUIDs is to be (universally) unique, and the implementation of  
uniqueness constraints in PostgreSQL is based on BTREE indexes, this  
makes the necessity of ordering doubly so. Or have I missed something?


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread mark

On Thu, Jun 14, 2007 at 03:38:44PM -0400, Robert Wojciechowski wrote:
> I've been testing the new UUID functionality in 8.3dev and noticed that
> UUIDs are sorted using memcmp in their default in-memory layout, which
> is:
>  struct uuid {
>  uint32_ttime_low;
>  uint16_ttime_mid;
>  uint16_ttime_hi_and_version;
>  uint8_t clock_seq_hi_and_reserved;
>  uint8_t clock_seq_low;
>  uint8_t node[_UUID_NODE_LEN];
>  };
> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> as described above. With random (version 4) or hashed based (version 3
> or 5) UUIDs there's nothing that can be done to improve the situation,
> obviously.

I suggest that treating the UUID as anything other than a unique
random value is a mistake. There should be no assumptions by users
with regard to how the order is displayed. Also, as UUID generation
based on time is always in sequence, it seems to me that sorting by
UUID time would have the effect of inserts always being to the end of
the index. While this might pack tightly, wouldn't this hurt
concurrency? Random access vs sequential performance. For UUID, I
would value random access before sequential performance. Why would
anybody scan UUID through the index in "sequential" order?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Gregory Stark

"Robert Wojciechowski" <[EMAIL PROTECTED]> writes:

> That was my thinking; that it would speed up (bulk) inserts causing
> fewer page splits.

Ah, I understand better now. hm. high data density would be good for reading.
But I think the case for inserting is actually quite mixed. If you have lots
of processes trying to insert you'll actually get poorer performance because
they'll all have to get access to the same page. Worse, you'll probably have a
unique index.

> The other thing this modification does is allow ORDER BY to order by
> time when possible, which is a nice default behavior as well, yes?

I think that actually is quite a nice effect. Certainly the loss of it is one
of the big practical disadvantages of using UUIDs over a sequence.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Gregory Stark

"Robert Wojciechowski" <[EMAIL PROTECTED]> writes:

> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> as described above. With random (version 4) or hashed based (version 3
> or 5) UUIDs there's nothing that can be done to improve the situation,
> obviously.

Is this based on empirical results or just a theory? I'm asking because it's
actually a common technique to reverse the natural index key to construct
basically exactly this situation -- for performance reasons. The idea is that
low order bits have higher cardinality and that that can *improve* btree
performance by avoiding contention.

I'm not sure how much I believe in the effectiveness of that strategy myself
or for that matter whether it's universally applicable or only useful in
certain types of loads.

I'm not saying you're wrong, but I'm not sure it's a simple open and shut case
either. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Robert Wojciechowski

> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > I believe he means that with his modified comparison function, when
> > inserting a series of UUIDs with increasing time-fields, the index
keys
> > are always inserted to the rightmost page, which gives a more
tightly
> > packed index than scattered inserts all-around the index.
> 
> Hm.  Still, given that that benefit would only accrue for one version
of
> uuid generation, it's a pretty weak argument.
> 
> The concrete reason for not changing it is that the sort ordering of
> uuids would then look quite unnatural compared to the display format.
> Which would provoke confusion and bug reports...
> 
>   regards, tom lane

If it improves non-user controllable indexing behavior, doesn't
negatively affect the indexing of random/hash based UUIDs, and only
seems to affect ordering for the display format, it seems worth it to
me.

A paragraph in the documentation stating how UUIDs are sorted seems to
satisfy the visual ordering concern, which is more than what Microsoft
is doing (I had to dig for a blog post to find this out.)

In addition it would be very odd to sort random/hashed GUIDs and expect
anything that in meaningful, anyway. If the user wants to see a UUID
lexographically sorted, they could also cast the column to text like so:

  select uuid_column from uuid_test order by uuid_column::text;

... which produces the desired output for visual analysis if that was
desired while still retaining all the other benefits.

I'll continue thinking about any other downsides to this tonight, too.

-- Robert

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Robert Wojciechowski

> >> I've been testing the new UUID functionality in 8.3dev and noticed
that
> >> UUIDs are sorted using memcmp in their default in-memory layout,
> >> ...
> >> When done that way, you're going to see a lot of index B-tree
> >> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based
UUIDs,
> >
> > This claim seems like nonsense.  Btrees don't care about the
ordering
> > details of what they index.
> 
> I believe he means that with his modified comparison function, when
> inserting a series of UUIDs with increasing time-fields, the index
keys
> are always inserted to the rightmost page, which gives a more tightly
> packed index than scattered inserts all-around the index.
> 

That was my thinking; that it would speed up (bulk) inserts causing
fewer page splits.

I'm also using my own contrib module that uses FreeBSD's uuid_create
generating DCE 1.1 UUIDs, which keeps the state of the UUID generator in
the kernel. The 8.3 contrib module based on uuid-ossp seems to 1) not
compile on FreeBSD (conflicts with uuid.h from the OS) and 2) randomizes
the clock sequence as there is no state stored between invocations.

The other thing this modification does is allow ORDER BY to order by
time when possible, which is a nice default behavior as well, yes?

-- Robert

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Tom Lane

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I believe he means that with his modified comparison function, when 
> inserting a series of UUIDs with increasing time-fields, the index keys 
> are always inserted to the rightmost page, which gives a more tightly 
> packed index than scattered inserts all-around the index.

Hm.  Still, given that that benefit would only accrue for one version of
uuid generation, it's a pretty weak argument.

The concrete reason for not changing it is that the sort ordering of
uuids would then look quite unnatural compared to the display format.
Which would provoke confusion and bug reports...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Heikki Linnakangas


Tom Lane wrote:

"Robert Wojciechowski" <[EMAIL PROTECTED]> writes:

I've been testing the new UUID functionality in 8.3dev and noticed that
UUIDs are sorted using memcmp in their default in-memory layout,
...
When done that way, you're going to see a lot of index B-tree
fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,


This claim seems like nonsense.  Btrees don't care about the ordering
details of what they index.


I believe he means that with his modified comparison function, when 
inserting a series of UUIDs with increasing time-fields, the index keys 
are always inserted to the rightmost page, which gives a more tightly 
packed index than scattered inserts all-around the index.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Tom Lane

"Robert Wojciechowski" <[EMAIL PROTECTED]> writes:
> I've been testing the new UUID functionality in 8.3dev and noticed that
> UUIDs are sorted using memcmp in their default in-memory layout,
> ...
> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,

This claim seems like nonsense.  Btrees don't care about the ordering
details of what they index.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] Change sort order on UUIDs?

2007-06-14 Thread Robert Wojciechowski

I've been testing the new UUID functionality in 8.3dev and noticed that
UUIDs are sorted using memcmp in their default in-memory layout, which
is:

 

 struct uuid {

 uint32_ttime_low;

 uint16_ttime_mid;

 uint16_ttime_hi_and_version;

 uint8_t clock_seq_hi_and_reserved;

 uint8_t clock_seq_low;

 uint8_t node[_UUID_NODE_LEN];

 };

 

When done that way, you're going to see a lot of index B-tree
fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
as described above. With random (version 4) or hashed based (version 3
or 5) UUIDs there's nothing that can be done to improve the situation,
obviously.

 

So I went down the path of changing the pgsql sorting order to instead
sort by, from most significant to least:

 

1)  Node (MAC address),

2)  clock sequence, then

3)  time.

 

The implementation is as follows:

 

/* internal uuid compare function */

static int

uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2)

{

  int result;

 

  /* node */

  if ((result = memcmp(&arg1->data[10], &arg2->data[10], 6)) != 0)

return result;

 

  /* clock_seq_hi_and_reserved, clock_seq_low */

  if ((result = memcmp(&arg1->data[8], &arg2->data[8], 2)) != 0)

return result;

 

  /* time_hi_and_version */

  if ((result = memcmp(&arg1->data[6], &arg2->data[6], 2)) != 0)

return result;

 

  /* time_mid */

  if ((result = memcmp(&arg1->data[4], &arg2->data[4], 2)) != 0)

return result;

 

  /* time_low */

  return memcmp(&arg1->data[0], &arg2->data[0], 4);

}

 

This results in much less fragmentation and reduced page hits when
indexing a UUID column. When multiple UUID generators with different
node values contribute to a single table concurrently, it should also
result in better performance than if they sorted the way they do now or
by time first.

 

Sorting UUIDs when they are random/hashed with memcmp seems pretty darn
useless in all scenarios and performs poorly on indexes. This method is
equally poor with random/hashed UUIDs, but much better with version 1
time based UUIDs.

 

What do you guys think about changing the default behavior of pgsql to
compare UUIDs this way?

 

-- Robert

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

Re: [HACKERS] Change sort order on UUIDs?

[HACKERS] Change sort order on UUIDs?

16 matches

Site Navigation

Mail list logo

Footer information