Re: UUID.compareTo broken?
On 04/10/2014 08:21 PM, Steven Schlansker wrote: On Apr 9, 2014, at 2:21 AM, Paul Sandoz paul.san...@oracle.com wrote: On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote: That seems a terribly broken usage of UUID for 128 bit numbers or a pair of signed 64 bit numbers :-) Part of me thinks we should not be supporting such broken usage. Might be worth getting some usage data from grepcode or maven central. I’m guilty of doing this at a point in the past. We used it to intermix multiple sources of data - a few that used ‘long’ IDs and a few that used ‘real’ UUIDs. We took prefixes that had flags never generated by UUID libraries (the “reserved for compatibility with Microsoft” ones IIRC) and slapped those with a long to make pseudo-UUIDs. That way everything was a UUID and we were guaranteed to never see collisions within our dataset. We never expected them to give sensible meanings to the various getters e.g. timestamp() but we did expect the UUID class to work generally. We never relied on any particular ordering. We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. It would be nice to just have one compareTo that does the right thing based of the UUID types being compared. If it were up to me only the time and DCE UUIDs would be comparable, there's no ordering relationship for other versions. I think it is fine for random UUIDs to be comparable with each other. The comparators I've considered adding would only allow comparisons within the same version. Yes, although for a general comparator the primary sort key could be the version value. +1 to a “sort first by version then version-specific ordering” — it gives you the best of both worlds IMO. I think the natural ordering for UUIDs must be able to create an ordering over all possible UUIDs, no matter the version or even if it is valid or not. If you read UUIDs from an external source you have no way to understand what version they are or aren’t. Imagine a process that loads data from a database or text file into a TreeMap - it would be awful if a change in the UUID generation scheme on the far side caused the Comparator you used to no longer function. Hi, Code that relies on UUIDs to have a natural order, say chronological, is relying on being given the particular type of UUIDs that have the time built-in. When given mixed-type or non-time-based UUIDs, such code will break. The purpose of UUID schemes is generating globally unique identifiers, not interpreting them. Various types exist just because it's practical to generate UUIDs differently in different contexts. Programs should not try to extract business information from UUIDs (except probably for hunting down virus authors: http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a good general-purpose compareTo() method should try to discourage such usage for example by reversing the bits of the UUID value before comparing. This would also make algorithms that order UUIDs for facilitating quick access (B-Tree, Red-Black-Tree, ...) happier. Try the following benchmark (with options: -wi 5 -i 5 -t 1 -f 1): public class TreeMapInsertionBench { private static final int N = 10; @GenerateMicroBenchmark() @BenchmarkMode(Mode.AverageTime) public static void testTreeMapSequentialInsert(BlackHole bh) { MapInteger, Boolean map = new TreeMap(); for (int i = 0; i N; i++) { map.put(new Integer(i), Boolean.TRUE); } bh.consume(map); } @GenerateMicroBenchmark() @BenchmarkMode(Mode.AverageTime) public static void testTreeMapReversedBitsInsert(BlackHole bh) { MapInteger, Boolean map = new TreeMap(); for (int i = 0; i N; i++) { map.put(new Integer(Integer.reverse(i)), Boolean.TRUE); } bh.consume(map); } } ...and the results will be: Benchmark Mode Samples Mean Mean errorUnits o.s.TreeMapInsertionBench.testTreeMapReversedBitsInsert avgt 58.2080.100ms/op o.s.TreeMapInsertionBench.testTreeMapSequentialInsert avgt 5 10.4981.164ms/op That's just an opinion. Regards, Peter
Re: UUID.compareTo broken?
On Apr 11, 2014, at 8:54 AM, Peter Levart peter.lev...@gmail.com wrote: Hi, Code that relies on UUIDs to have a natural order, say chronological, is relying on being given the particular type of UUIDs that have the time built-in. When given mixed-type or non-time-based UUIDs, such code will break. The purpose of UUID schemes is generating globally unique identifiers, not interpreting them. Various types exist just because it's practical to generate UUIDs differently in different contexts. Good point, the bias is towards producers, but still i can imagine time-based UUIDs are quite useful for consumers and i think implicitly such users would reasonably expect them to be ordered in some manner based on the time. Programs should not try to extract business information from UUIDs (except probably for hunting down virus authors: http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a good general-purpose compareTo() method should try to discourage such usage for example by reversing the bits of the UUID value before comparing. This would also make algorithms that order UUIDs for facilitating quick access (B-Tree, Red-Black-Tree, ...) happier. Interesting, i suppose if those N values are randomly shuffled then added it may get close to reversed bit result, i should probably try it :-) Paul.
Re: UUID.compareTo broken?
On Apr 9, 2014, at 2:21 AM, Paul Sandoz paul.san...@oracle.com wrote: On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote: That seems a terribly broken usage of UUID for 128 bit numbers or a pair of signed 64 bit numbers :-) Part of me thinks we should not be supporting such broken usage. Might be worth getting some usage data from grepcode or maven central. I’m guilty of doing this at a point in the past. We used it to intermix multiple sources of data - a few that used ‘long’ IDs and a few that used ‘real’ UUIDs. We took prefixes that had flags never generated by UUID libraries (the “reserved for compatibility with Microsoft” ones IIRC) and slapped those with a long to make pseudo-UUIDs. That way everything was a UUID and we were guaranteed to never see collisions within our dataset. We never expected them to give sensible meanings to the various getters e.g. timestamp() but we did expect the UUID class to work generally. We never relied on any particular ordering. We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. It would be nice to just have one compareTo that does the right thing based of the UUID types being compared. If it were up to me only the time and DCE UUIDs would be comparable, there's no ordering relationship for other versions. I think it is fine for random UUIDs to be comparable with each other. The comparators I've considered adding would only allow comparisons within the same version. Yes, although for a general comparator the primary sort key could be the version value. +1 to a “sort first by version then version-specific ordering” — it gives you the best of both worlds IMO. I think the natural ordering for UUIDs must be able to create an ordering over all possible UUIDs, no matter the version or even if it is valid or not. If you read UUIDs from an external source you have no way to understand what version they are or aren’t. Imagine a process that loads data from a database or text file into a TreeMap - it would be awful if a change in the UUID generation scheme on the far side caused the Comparator you used to no longer function.
Re: UUID.compareTo broken?
On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote: For the case of incorrect signed comparison is it sticking around because there is code dependent on the current broken behaviour? Probably even if the dependence is implicit such as expecting a particular iteration order or do you think it would be possible to change that? I requested fixing the compareTo long ago (back when I was a mere user of the JDK) and was told that it could not be changed. Any more details on that? i have my suspicious that code dependent compareTo may not break if the total ordering changes, since many cases are likely to be for randomly generated UUIDs. I agree that most code would probably be fine. I worry though that there will be cases that either explicitly or implicitly depend upon current ordering. Surprisingly there a lot of cases where UUIDs are created using the UUID(long, long) constructor with non-random data such as incrementing numbers: UUID myUUID = new UUID(SOME_CONSTANT, atomicSerial.incrementAndGet()); That seems a terribly broken usage of UUID for 128 bit numbers or a pair of signed 64 bit numbers :-) Part of me thinks we should not be supporting such broken usage. Might be worth getting some usage data from grepcode or maven central. This usage exists, I believe, because we haven't provided direct support for MAC version UUIDs. Unfortunately the UUID(long,long) constructor does not check that the UUID constructed is even valid and usually they are not. This usage, while egregious, is common. These people would probably be impacted by changes in ordering. For correct usage the lsb long will for practical purposes always be -ve, since the msb is 1, as per the variant [1]. For time-based (MAC) UUIDs the comparison is totally broken as the time_low field bits will cycle often (~ every 3.5 mins by my calculation), so i doubt anyone would be using the comparator for such UUIDs to derive any meaningful order, for example, try this: UUID a = UUID.fromString(a938c470-bfc1-11e3-8a33-0800200c9a66); UUID b = UUID.fromString(14857ca0-bfc2-11e3-8a33-0800200c9a66); UUID c = UUID.fromString(d7a34910-bfc2-11e3-8a33-0800200c9a66); System.out.println(a.timestamp() b.timestamp()); System.out.println(b.timestamp() c.timestamp()); System.out.println(a.timestamp() c.timestamp()); System.out.println(a.compareTo(b)); System.out.println(b.compareTo(c)); System.out.println(a.compareTo(c)); We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. It would be nice to just have one compareTo that does the right thing based of the UUID types being compared. If it were up to me only the time and DCE UUIDs would be comparable, there's no ordering relationship for other versions. I think it is fine for random UUIDs to be comparable with each other. The comparators I've considered adding would only allow comparisons within the same version. Yes, although for a general comparator the primary sort key could be the version value. I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. OK, although it might be nice to bash this on the head sooner? as it might be possible to get into an 8u release depending on what we conclude about backwards compatibility. Any change to the ordering of the compareTo would have to happen in a major version if at all. I am very reluctant to change this behaviour when providing alternative comparators might just be a better choice. Here is an alternative, deprecate UUID and create a new UUID class that is not broken. If we cannot fix the existing class lets just recognize it for what it is, a defective implementation for 128 bit numbers. Paul. [1] https://tools.ietf.org/html/rfc4122#section-4.1.1
Re: UUID.compareTo broken?
On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote: The issue is that the comparison is done upon the signed most significant and least significant long values. At minimum UUIDs should be compared as if they were 128-bit unsigned values. Thanks. Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time based and DCE), should have a more sophisticated comparison which orders the UUIDs by their creation time. This is one of those cases where sloppiness/laziness/ignorance in the original implementation version sticks around forever. For the case of incorrect signed comparison is it sticking around because there is code dependent on the current broken behaviour? or do you think it would be possible to change that? i have my suspicious that code dependent compareTo may not break if the total ordering changes, since many cases are likely to be for randomly generated UUIDs. We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. It would be nice to just have one compareTo that does the right thing based of the UUID types being compared. I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. OK, although it might be nice to bash this on the head sooner? as it might be possible to get into an 8u release depending on what we conclude about backwards compatibility. Paul.
Re: UUID.compareTo broken?
On Apr 8, 2014, at 1:03 AM, Paul Sandoz paul.san...@oracle.com wrote: On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote: I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. OK, although it might be nice to bash this on the head sooner? as it might be possible to get into an 8u release depending on what we conclude about backwards compatibility. If random-end-user opinions count for anything, I would strongly +1 for an 8u update with this. We are still porting around routines to bypass UUID’s from/toString because they are embarrassingly inefficient, and it would be nice to ditch that code before 9 ships :)
Re: UUID.compareTo broken?
I am targeting to have the performance improvements you contributed to UUID into 8u40 (around the end of the year). I expect to contribute the work into OpenJDK 9 in June-Julyish. Mike On Apr 8 2014, at 09:34 , Steven Schlansker stevenschlans...@gmail.com wrote: On Apr 8, 2014, at 1:03 AM, Paul Sandoz paul.san...@oracle.com wrote: On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote: I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. OK, although it might be nice to bash this on the head sooner? as it might be possible to get into an 8u release depending on what we conclude about backwards compatibility. If random-end-user opinions count for anything, I would strongly +1 for an 8u update with this. We are still porting around routines to bypass UUID’s from/toString because they are embarrassingly inefficient, and it would be nice to ditch that code before 9 ships :)
Re: UUID.compareTo broken?
On Apr 8 2014, at 01:03 , Paul Sandoz paul.san...@oracle.com wrote: On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote: The issue is that the comparison is done upon the signed most significant and least significant long values. At minimum UUIDs should be compared as if they were 128-bit unsigned values. Thanks. Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time based and DCE), should have a more sophisticated comparison which orders the UUIDs by their creation time. This is one of those cases where sloppiness/laziness/ignorance in the original implementation version sticks around forever. For the case of incorrect signed comparison is it sticking around because there is code dependent on the current broken behaviour? Probably even if the dependence is implicit such as expecting a particular iteration order or do you think it would be possible to change that? I requested fixing the compareTo long ago (back when I was a mere user of the JDK) and was told that it could not be changed. i have my suspicious that code dependent compareTo may not break if the total ordering changes, since many cases are likely to be for randomly generated UUIDs. I agree that most code would probably be fine. I worry though that there will be cases that either explicitly or implicitly depend upon current ordering. Surprisingly there a lot of cases where UUIDs are created using the UUID(long, long) constructor with non-random data such as incrementing numbers: UUID myUUID = new UUID(SOME_CONSTANT, atomicSerial.incrementAndGet()); This usage exists, I believe, because we haven't provided direct support for MAC version UUIDs. Unfortunately the UUID(long,long) constructor does not check that the UUID constructed is even valid and usually they are not. This usage, while egregious, is common. These people would probably be impacted by changes in ordering. We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. It would be nice to just have one compareTo that does the right thing based of the UUID types being compared. If it were up to me only the time and DCE UUIDs would be comparable, there's no ordering relationship for other versions. The comparators I've considered adding would only allow comparisons within the same version. I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. OK, although it might be nice to bash this on the head sooner? as it might be possible to get into an 8u release depending on what we conclude about backwards compatibility. Any change to the ordering of the compareTo would have to happen in a major version if at all. I am very reluctant to change this behaviour when providing alternative comparators might just be a better choice. Mike
Re: UUID.compareTo broken?
The issue is that the comparison is done upon the signed most significant and least significant long values. At minimum UUIDs should be compared as if they were 128-bit unsigned values. Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time based and DCE), should have a more sophisticated comparison which orders the UUIDs by their creation time. This is one of those cases where sloppiness/laziness/ignorance in the original implementation version sticks around forever. We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects. I also note that UUID does not currently support version 5, SHA-1, which it should. I am hoping to do other performance work on UUID within the scope of Java 9. Adding additional Comparators would fit right in with that. Mike On Apr 7 2014, at 03:49 , Paul Sandoz paul.san...@oracle.com wrote: Hi, https://github.com/cowtowncoder/java-uuid-generator JDK's java.util.UUID has flawed implementation of compareTo(), which uses naive comparison of 64-bit values. Anyone familiar with the JDK UUID implementation care to comment? Paul.