Re: UUID.compareTo broken?

2014-04-11 Thread Peter Levart


On 04/10/2014 08:21 PM, Steven Schlansker wrote:

On Apr 9, 2014, at 2:21 AM, Paul Sandoz paul.san...@oracle.com wrote:


On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote:

That seems a terribly broken usage of UUID for 128 bit numbers or a pair of 
signed 64 bit numbers :-)

Part of me thinks we should not be supporting such broken usage. Might be worth 
getting some usage data from grepcode or maven central.


I’m guilty of doing this at a point in the past.  We used it to intermix 
multiple sources of data - a few that used ‘long’ IDs and a few that used 
‘real’ UUIDs.  We took prefixes that had flags never generated by UUID 
libraries (the “reserved for compatibility with Microsoft” ones IIRC) and 
slapped those with a long to make pseudo-UUIDs.  That way everything was a UUID 
and we were guaranteed to never see collisions within our dataset.

We never expected them to give sensible meanings to the various getters e.g. 
timestamp() but we did expect the UUID class to work generally.
We never relied on any particular ordering.


We could provide static methods to return appropriate comparators for version 1 
and version 2 UUIDs--I've actually written them before for other projects.


It would be nice to just have one compareTo that does the right thing based of 
the UUID types being compared.

If it were up to me only the time and DCE UUIDs would be comparable, there's no 
ordering relationship for other versions.

I think it is fine for random UUIDs to be comparable with each other.



The comparators I've considered adding would only allow comparisons within the 
same version.

Yes, although for a general comparator the primary sort key could be the 
version value.

+1 to a “sort first by version then version-specific ordering” — it gives you 
the best of both worlds IMO.

I think the natural ordering for UUIDs must be able to create an ordering over 
all possible UUIDs, no matter the version or even if it is valid or not.  If 
you read UUIDs from an external source you have no way to understand what 
version they are or aren’t.  Imagine a process that loads data from a database 
or text file into a TreeMap - it would be awful if a change in the UUID 
generation scheme on the far side caused the Comparator you used to no longer 
function.



Hi,

Code that relies on UUIDs to have a natural order, say 
chronological, is relying on being given the particular type of UUIDs 
that have the time built-in. When given mixed-type or non-time-based 
UUIDs, such code will break. The purpose of UUID schemes is generating 
globally unique identifiers, not interpreting them. Various types exist 
just because it's practical to generate UUIDs differently in different 
contexts. Programs should not try to extract business information from 
UUIDs (except probably for hunting down virus authors: 
http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a 
good general-purpose compareTo() method should try to discourage such 
usage for example by reversing the bits of the UUID value before 
comparing. This would also make algorithms that order UUIDs for 
facilitating quick access (B-Tree, Red-Black-Tree, ...) happier.


Try the following benchmark (with options: -wi 5 -i 5 -t 1 -f 1):

public class TreeMapInsertionBench {

private static final int N = 10;

@GenerateMicroBenchmark()
@BenchmarkMode(Mode.AverageTime)
public static void testTreeMapSequentialInsert(BlackHole bh) {
MapInteger, Boolean map = new TreeMap();
for (int i = 0; i  N; i++) {
map.put(new Integer(i), Boolean.TRUE);
}
bh.consume(map);
}

@GenerateMicroBenchmark()
@BenchmarkMode(Mode.AverageTime)
public static void testTreeMapReversedBitsInsert(BlackHole bh) {
MapInteger, Boolean map = new TreeMap();
for (int i = 0; i  N; i++) {
map.put(new Integer(Integer.reverse(i)), Boolean.TRUE);
}
bh.consume(map);
}
}

...and the results will be:

Benchmark Mode   Samples Mean   Mean errorUnits
o.s.TreeMapInsertionBench.testTreeMapReversedBitsInsert avgt 
58.2080.100ms/op
o.s.TreeMapInsertionBench.testTreeMapSequentialInsert avgt 
5   10.4981.164ms/op



That's just an opinion.

Regards, Peter



Re: UUID.compareTo broken?

2014-04-11 Thread Paul Sandoz
On Apr 11, 2014, at 8:54 AM, Peter Levart peter.lev...@gmail.com wrote:
 
 Hi,
 
 Code that relies on UUIDs to have a natural order, say chronological, is 
 relying on being given the particular type of UUIDs that have the time 
 built-in. When given mixed-type or non-time-based UUIDs, such code will 
 break. The purpose of UUID schemes is generating globally unique identifiers, 
 not interpreting them. Various types exist just because it's practical to 
 generate UUIDs differently in different contexts.

Good point, the bias is towards producers, but still i can imagine time-based 
UUIDs are quite useful for consumers and i think implicitly such users would 
reasonably expect them to be ordered in some manner based on the time.


 Programs should not try to extract business information from UUIDs (except 
 probably for hunting down virus authors: 
 http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a good 
 general-purpose compareTo() method should try to discourage such usage for 
 example by reversing the bits of the UUID value before comparing. This would 
 also make algorithms that order UUIDs for facilitating quick access (B-Tree, 
 Red-Black-Tree, ...) happier.
 

Interesting, i suppose if those N values are randomly shuffled then added it 
may get close to reversed bit result, i should probably try it :-)

Paul.


Re: UUID.compareTo broken?

2014-04-10 Thread Steven Schlansker

On Apr 9, 2014, at 2:21 AM, Paul Sandoz paul.san...@oracle.com wrote:

 On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote:
 
 That seems a terribly broken usage of UUID for 128 bit numbers or a pair of 
 signed 64 bit numbers :-)
 
 Part of me thinks we should not be supporting such broken usage. Might be 
 worth getting some usage data from grepcode or maven central.
 

I’m guilty of doing this at a point in the past.  We used it to intermix 
multiple sources of data - a few that used ‘long’ IDs and a few that used 
‘real’ UUIDs.  We took prefixes that had flags never generated by UUID 
libraries (the “reserved for compatibility with Microsoft” ones IIRC) and 
slapped those with a long to make pseudo-UUIDs.  That way everything was a UUID 
and we were guaranteed to never see collisions within our dataset.

We never expected them to give sensible meanings to the various getters e.g. 
timestamp() but we did expect the UUID class to work generally.
We never relied on any particular ordering.

 
 We could provide static methods to return appropriate comparators for 
 version 1 and version 2 UUIDs--I've actually written them before for other 
 projects.
 
 
 It would be nice to just have one compareTo that does the right thing based 
 of the UUID types being compared.
 
 If it were up to me only the time and DCE UUIDs would be comparable, there's 
 no ordering relationship for other versions.
 
 I think it is fine for random UUIDs to be comparable with each other.
 
 
 The comparators I've considered adding would only allow comparisons within 
 the same version.
 
 Yes, although for a general comparator the primary sort key could be the 
 version value.

+1 to a “sort first by version then version-specific ordering” — it gives you 
the best of both worlds IMO.

I think the natural ordering for UUIDs must be able to create an ordering over 
all possible UUIDs, no matter the version or even if it is valid or not.  If 
you read UUIDs from an external source you have no way to understand what 
version they are or aren’t.  Imagine a process that loads data from a database 
or text file into a TreeMap - it would be awful if a change in the UUID 
generation scheme on the far side caused the Comparator you used to no longer 
function.



Re: UUID.compareTo broken?

2014-04-09 Thread Paul Sandoz
On Apr 8, 2014, at 9:15 PM, Mike Duigou mike.dui...@oracle.com wrote:
 For the case of incorrect signed comparison is it sticking around because 
 there is code dependent on the current broken behaviour?
 
 Probably even if the dependence is implicit such as expecting a particular 
 iteration order
 
 or do you think it would be possible to change that?
 
 I requested fixing the compareTo long ago (back when I was a mere user of the 
 JDK) and was told that it could not be changed.

Any more details on that?


 
 i have my suspicious that code dependent compareTo may not break if the 
 total ordering changes, since many cases are likely to be for randomly 
 generated UUIDs.
 
 I agree that most code would probably be fine. I worry though that there will 
 be cases that either explicitly or implicitly depend upon current ordering. 
 Surprisingly there a lot of cases where UUIDs are created using the 
 UUID(long, long) constructor with non-random data such as incrementing 
 numbers:
 
 UUID myUUID = new UUID(SOME_CONSTANT, atomicSerial.incrementAndGet());
 

That seems a terribly broken usage of UUID for 128 bit numbers or a pair of 
signed 64 bit numbers :-)

Part of me thinks we should not be supporting such broken usage. Might be worth 
getting some usage data from grepcode or maven central.


 This usage exists, I believe, because we haven't provided direct support for 
 MAC version UUIDs. Unfortunately the UUID(long,long) constructor does not 
 check that the UUID constructed is even valid and usually they are not. This 
 usage, while egregious, is common. These people would probably be impacted by 
 changes in ordering.
 

For correct usage the lsb long will for practical purposes always be -ve, since 
the msb is 1, as per the variant [1].

For time-based (MAC) UUIDs the comparison is totally broken as the time_low 
field bits will cycle often (~ every 3.5 mins by my calculation), so i doubt 
anyone would be using the comparator for such UUIDs to derive any meaningful 
order, for example, try this:

UUID a = UUID.fromString(a938c470-bfc1-11e3-8a33-0800200c9a66);
UUID b = UUID.fromString(14857ca0-bfc2-11e3-8a33-0800200c9a66);
UUID c = UUID.fromString(d7a34910-bfc2-11e3-8a33-0800200c9a66);

System.out.println(a.timestamp()  b.timestamp());
System.out.println(b.timestamp()  c.timestamp());
System.out.println(a.timestamp()  c.timestamp());

System.out.println(a.compareTo(b));
System.out.println(b.compareTo(c));
System.out.println(a.compareTo(c));



 We could provide static methods to return appropriate comparators for 
 version 1 and version 2 UUIDs--I've actually written them before for other 
 projects.
 
 
 It would be nice to just have one compareTo that does the right thing based 
 of the UUID types being compared.
 
 If it were up to me only the time and DCE UUIDs would be comparable, there's 
 no ordering relationship for other versions.

I think it is fine for random UUIDs to be comparable with each other.


 The comparators I've considered adding would only allow comparisons within 
 the same version.
 

Yes, although for a general comparator the primary sort key could be the 
version value.


 I also note that UUID does not currently support version 5, SHA-1, which it 
 should.
 
 I am hoping to do other performance work on UUID within the scope of Java 
 9. Adding additional Comparators would fit right in with that.
 
 
 OK, although it might be nice to bash this on the head sooner? as it might 
 be possible to get into an 8u release depending on what we conclude about 
 backwards compatibility.
 
 Any change to the ordering of the compareTo would have to happen in a major 
 version if at all. I am very reluctant to change this behaviour when 
 providing alternative comparators might just be a better choice.
 

Here is an alternative, deprecate UUID and create a new UUID class that is not 
broken. If we cannot fix the existing class lets just recognize it for what it 
is, a defective implementation for 128 bit numbers.

Paul.

[1] https://tools.ietf.org/html/rfc4122#section-4.1.1


Re: UUID.compareTo broken?

2014-04-08 Thread Paul Sandoz

On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote:

 The issue is that the comparison is done upon the signed most significant and 
 least significant long values.
 
 At minimum UUIDs should be compared as if they were 128-bit unsigned values.
 

Thanks.


 Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time 
 based and DCE), should have a more sophisticated comparison which orders the 
 UUIDs by their creation time.
 
 This is one of those cases where sloppiness/laziness/ignorance in the 
 original implementation version sticks around forever.
 

For the case of incorrect signed comparison is it sticking around because there 
is code dependent on the current broken behaviour? or do you think it would be 
possible to change that? i have my suspicious that code dependent compareTo may 
not break if the total ordering changes, since many cases are likely to be for 
randomly generated UUIDs.


 We could provide static methods to return appropriate comparators for version 
 1 and version 2 UUIDs--I've actually written them before for other projects.
 

It would be nice to just have one compareTo that does the right thing based of 
the UUID types being compared.


 I also note that UUID does not currently support version 5, SHA-1, which it 
 should.
 
 I am hoping to do other performance work on UUID within the scope of Java 9. 
 Adding additional Comparators would fit right in with that.
 

OK, although it might be nice to bash this on the head sooner? as it might be 
possible to get into an 8u release depending on what we conclude about 
backwards compatibility.

Paul.


Re: UUID.compareTo broken?

2014-04-08 Thread Steven Schlansker

On Apr 8, 2014, at 1:03 AM, Paul Sandoz paul.san...@oracle.com wrote:

 
 On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote:
 
 I also note that UUID does not currently support version 5, SHA-1, which it 
 should.
 
 I am hoping to do other performance work on UUID within the scope of Java 9. 
 Adding additional Comparators would fit right in with that.
 
 
 OK, although it might be nice to bash this on the head sooner? as it might be 
 possible to get into an 8u release depending on what we conclude about 
 backwards compatibility.

If random-end-user opinions count for anything, I would strongly +1 for an 8u 
update with this.  We are still porting around routines to bypass UUID’s 
from/toString because they are embarrassingly inefficient, and it would be nice 
to ditch that code before 9 ships :)



Re: UUID.compareTo broken?

2014-04-08 Thread Mike Duigou
I am targeting to have the performance improvements you contributed to UUID 
into 8u40 (around the end of the year). I expect to contribute the work into 
OpenJDK 9 in June-Julyish.

Mike

On Apr 8 2014, at 09:34 , Steven Schlansker stevenschlans...@gmail.com wrote:

 
 On Apr 8, 2014, at 1:03 AM, Paul Sandoz paul.san...@oracle.com wrote:
 
 
 On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote:
 
 I also note that UUID does not currently support version 5, SHA-1, which it 
 should.
 
 I am hoping to do other performance work on UUID within the scope of Java 
 9. Adding additional Comparators would fit right in with that.
 
 
 OK, although it might be nice to bash this on the head sooner? as it might 
 be possible to get into an 8u release depending on what we conclude about 
 backwards compatibility.
 
 If random-end-user opinions count for anything, I would strongly +1 for an 8u 
 update with this.  We are still porting around routines to bypass UUID’s 
 from/toString because they are embarrassingly inefficient, and it would be 
 nice to ditch that code before 9 ships :)
 



Re: UUID.compareTo broken?

2014-04-08 Thread Mike Duigou

On Apr 8 2014, at 01:03 , Paul Sandoz paul.san...@oracle.com wrote:

 
 On Apr 7, 2014, at 7:23 PM, Mike Duigou mike.dui...@oracle.com wrote:
 
 The issue is that the comparison is done upon the signed most significant 
 and least significant long values.
 
 At minimum UUIDs should be compared as if they were 128-bit unsigned values.
 
 
 Thanks.
 
 
 Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time 
 based and DCE), should have a more sophisticated comparison which orders the 
 UUIDs by their creation time.
 
 This is one of those cases where sloppiness/laziness/ignorance in the 
 original implementation version sticks around forever.
 
 
 For the case of incorrect signed comparison is it sticking around because 
 there is code dependent on the current broken behaviour?

Probably even if the dependence is implicit such as expecting a particular 
iteration order

 or do you think it would be possible to change that?

I requested fixing the compareTo long ago (back when I was a mere user of the 
JDK) and was told that it could not be changed.

 i have my suspicious that code dependent compareTo may not break if the total 
 ordering changes, since many cases are likely to be for randomly generated 
 UUIDs.

I agree that most code would probably be fine. I worry though that there will 
be cases that either explicitly or implicitly depend upon current ordering. 
Surprisingly there a lot of cases where UUIDs are created using the UUID(long, 
long) constructor with non-random data such as incrementing numbers:

 UUID myUUID = new UUID(SOME_CONSTANT, atomicSerial.incrementAndGet());

This usage exists, I believe, because we haven't provided direct support for 
MAC version UUIDs. Unfortunately the UUID(long,long) constructor does not check 
that the UUID constructed is even valid and usually they are not. This usage, 
while egregious, is common. These people would probably be impacted by changes 
in ordering.

 We could provide static methods to return appropriate comparators for 
 version 1 and version 2 UUIDs--I've actually written them before for other 
 projects.
 
 
 It would be nice to just have one compareTo that does the right thing based 
 of the UUID types being compared.

If it were up to me only the time and DCE UUIDs would be comparable, there's no 
ordering relationship for other versions. The comparators I've considered 
adding would only allow comparisons within the same version.

 I also note that UUID does not currently support version 5, SHA-1, which it 
 should.
 
 I am hoping to do other performance work on UUID within the scope of Java 9. 
 Adding additional Comparators would fit right in with that.
 
 
 OK, although it might be nice to bash this on the head sooner? as it might be 
 possible to get into an 8u release depending on what we conclude about 
 backwards compatibility.

Any change to the ordering of the compareTo would have to happen in a major 
version if at all. I am very reluctant to change this behaviour when providing 
alternative comparators might just be a better choice.

Mike

Re: UUID.compareTo broken?

2014-04-07 Thread Mike Duigou
The issue is that the comparison is done upon the signed most significant and 
least significant long values.

At minimum UUIDs should be compared as if they were 128-bit unsigned values.

Beyond that, version (which is a type not a version) 1 and 2 UUIDs (time 
based and DCE), should have a more sophisticated comparison which orders the 
UUIDs by their creation time.

This is one of those cases where sloppiness/laziness/ignorance in the original 
implementation version sticks around forever.

We could provide static methods to return appropriate comparators for version 1 
and version 2 UUIDs--I've actually written them before for other projects.

I also note that UUID does not currently support version 5, SHA-1, which it 
should.

I am hoping to do other performance work on UUID within the scope of Java 9. 
Adding additional Comparators would fit right in with that.

Mike

On Apr 7 2014, at 03:49 , Paul Sandoz paul.san...@oracle.com wrote:

 Hi,
 
 https://github.com/cowtowncoder/java-uuid-generator
 
 JDK's java.util.UUID has flawed implementation of compareTo(), which uses 
 naive comparison of 64-bit values. 
 
 Anyone familiar with the JDK UUID implementation care to comment?
 
 Paul.