Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-09-29 Thread Claes Redestad


On 2017-09-28 16:22, Jason Mehrens wrote:

Admittedly this is more of a Claes Redestad faster boot up type change but, the 
use of the AlphaDecimalComparator constructor in Comparator might force the 
bytecode verifier to force load AlphaDecimalComparator just to check the return 
type.

If you make factory methods in AlphaDecimalComparator you can usually craft the 
return type to match and the verify of Comparator will not force load the 
AlphaDecimalComparator.
Otherwise, the more ugly solution is to wrap 'new' with Comparator.class.cast.

Jason


... you called? :-)

You're probably right in the general case, but I think we avoid this 
particular startup issue here since the (HotSpot) bytecode verifier by 
default doesn't check classes defined to the the boot loader.


/Claes



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-09-29 Thread Jason Mehrens
Makes sense on the sub-sequences having to be string.  Looks good.

Jason


From: Ivan Gerasimov <ivan.gerasi...@oracle.com>
Sent: Thursday, September 28, 2017 3:19 PM
To: Jason Mehrens; core-libs-dev
Subject: Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Thank you Jason for valuable input!


On 9/28/17 7:22 AM, Jason Mehrens wrote:
> Ivan,
>
> Am I correct that subtraction of code points is safe from underflow due to 
> the range of Character.MIN_CODE_POINT and Character.MAX_CODE_POINT?  That 
> would explain why you are not using Integer.compare.
Right.  Since codepoints are all positive, it should be safe to subtract
them.

> One alternative to calling CharSequence.subSequence is to use 
> CharBuffer.wrap(CharSequence csq, int start, int end).  The sub-sequences are 
> short lived so that may be faster in the String case.
Yes, it's a good point, given how expensive String.subSequence() is.
The disadvantage of using CharBuffer.wrap() is that it wouldn't be
possible to use the standard String comparators as the custom
alphaComparator.
With the current implementation, it is possible (with additional casts,
of course) because we know that String.subSequence returns a String object.

To be honest, I see no easy way to optimize the String case with the use
of a custom comparator because we need to pass to it portions of the
input as String objects.
One possibility might be to recognize common cases, like using the
standard comparators (for example, String.CASE_INSENSITIVE_ORDER) in
place of the custom alphaComparator, and then execute special variant of
the routine, which doesn't create explicit subsequences, but works on
the entire sequence instead.

At this point, however, it's probably too early to do such optimizations.

> Admittedly this is more of a Claes Redestad faster boot up type change but, 
> the use of the AlphaDecimalComparator constructor in Comparator might force 
> the bytecode verifier to force load AlphaDecimalComparator just to check the 
> return type.
Hm, interesting.
We can introduce another factory method in the Comparators class, so
that AlphaDecimalComparator won't be mentioned in the Comparator at all.

Please see the updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/05/webrev/

With kind regards,
Ivan


> If you make factory methods in AlphaDecimalComparator you can usually craft 
> the return type to match and the verify of Comparator will not force load the 
> AlphaDecimalComparator.
> Otherwise, the more ugly solution is to wrap 'new' with Comparator.class.cast.
>
> Jason
>
>
>
> 
> From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of 
> Ivan Gerasimov <ivan.gerasi...@oracle.com>
> Sent: Monday, September 25, 2017 12:49 PM
> To: core-libs-dev
> Subject: Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator
>
> Hello!
>
> Could you please review at your convenience?
>
> In the latest webrev I took all suggestions into account (unless I
> missed something.)
>
> http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/
>
>
> I think, if the suggested comparator is found useful by the users, then
> it may make sense to create the String-oriented variant, which can be
> implemented through the CharSequence-oriented one as:
>
> class String {
>   ...
>   @SuppressWarnings("unchecked")
>   public static  Comparator
>   comparingAlphaDecimal(Comparator alphaComparator) {
>   return (Comparator) (Comparator)
>   new
> Comparators.AlphaDecimalComparator<>(Objects.requireNonNull(
>   (Comparator) alphaComparator),
> false);
>   }
> }
>
> This will be safe, since the specification guarantees that
> String.subSequence() returns a String.
>
> Then in the application code it would be possible to instantiate the
> comparators as
>
>   String.comparingAlphaDecimal(String::compareTo);
>
>   String.comparingAlphaDecimal(String::compareToIgnoreCase);
>
> or, alternatively,
>   String.comparingAlphaDecimal(Comparator.naturalOrder());
>
> String.comparingAlphaDecimal(String.CASE_INSENSITIVE_ORDER);
>
> But this could be deferred for later, of course.
>
> With kind regards,
> Ivan
>
>
> On 8/27/17 1:38 PM, Ivan Gerasimov wrote:
>> Hello everyone!
>>
>> Here's another iteration of the comparator with suggested improvements.
>>
>> Now, there is the only input argument -- the alpha-comparator for
>> comparing the non-decimal-digit sub-sequences.
>>
>> For the javadoc I used the text suggested by Peter with some
>> modifications, additional example and API/imp

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-09-28 Thread Ivan Gerasimov

Thank you Jason for valuable input!


On 9/28/17 7:22 AM, Jason Mehrens wrote:

Ivan,

Am I correct that subtraction of code points is safe from underflow due to the 
range of Character.MIN_CODE_POINT and Character.MAX_CODE_POINT?  That would 
explain why you are not using Integer.compare.
Right.  Since codepoints are all positive, it should be safe to subtract 
them.



One alternative to calling CharSequence.subSequence is to use 
CharBuffer.wrap(CharSequence csq, int start, int end).  The sub-sequences are 
short lived so that may be faster in the String case.

Yes, it's a good point, given how expensive String.subSequence() is.
The disadvantage of using CharBuffer.wrap() is that it wouldn't be 
possible to use the standard String comparators as the custom 
alphaComparator.
With the current implementation, it is possible (with additional casts, 
of course) because we know that String.subSequence returns a String object.


To be honest, I see no easy way to optimize the String case with the use 
of a custom comparator because we need to pass to it portions of the 
input as String objects.
One possibility might be to recognize common cases, like using the 
standard comparators (for example, String.CASE_INSENSITIVE_ORDER) in 
place of the custom alphaComparator, and then execute special variant of 
the routine, which doesn't create explicit subsequences, but works on 
the entire sequence instead.


At this point, however, it's probably too early to do such optimizations.


Admittedly this is more of a Claes Redestad faster boot up type change but, the 
use of the AlphaDecimalComparator constructor in Comparator might force the 
bytecode verifier to force load AlphaDecimalComparator just to check the return 
type.

Hm, interesting.
We can introduce another factory method in the Comparators class, so 
that AlphaDecimalComparator won't be mentioned in the Comparator at all.


Please see the updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/05/webrev/

With kind regards,
Ivan



If you make factory methods in AlphaDecimalComparator you can usually craft the 
return type to match and the verify of Comparator will not force load the 
AlphaDecimalComparator.
Otherwise, the more ugly solution is to wrap 'new' with Comparator.class.cast.

Jason




From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of Ivan 
Gerasimov <ivan.gerasi...@oracle.com>
Sent: Monday, September 25, 2017 12:49 PM
To: core-libs-dev
Subject: Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Hello!

Could you please review at your convenience?

In the latest webrev I took all suggestions into account (unless I
missed something.)

http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/


I think, if the suggested comparator is found useful by the users, then
it may make sense to create the String-oriented variant, which can be
implemented through the CharSequence-oriented one as:

class String {
  ...
  @SuppressWarnings("unchecked")
  public static  Comparator
  comparingAlphaDecimal(Comparator alphaComparator) {
  return (Comparator) (Comparator)
  new
Comparators.AlphaDecimalComparator<>(Objects.requireNonNull(
  (Comparator) alphaComparator),
false);
  }
}

This will be safe, since the specification guarantees that
String.subSequence() returns a String.

Then in the application code it would be possible to instantiate the
comparators as

  String.comparingAlphaDecimal(String::compareTo);

  String.comparingAlphaDecimal(String::compareToIgnoreCase);

or, alternatively,
  String.comparingAlphaDecimal(Comparator.naturalOrder());

String.comparingAlphaDecimal(String.CASE_INSENSITIVE_ORDER);

But this could be deferred for later, of course.

With kind regards,
Ivan


On 8/27/17 1:38 PM, Ivan Gerasimov wrote:

Hello everyone!

Here's another iteration of the comparator with suggested improvements.

Now, there is the only input argument -- the alpha-comparator for
comparing the non-decimal-digit sub-sequences.

For the javadoc I used the text suggested by Peter with some
modifications, additional example and API/implementation notes.
Overall, the javadoc looks heavier than need to me, so I'd love to
hear comments about how to make it shorter and cleaner.

Also, I adopted the name AlphaDecimal, suggested by Peter.  This name
is one of popular in the list of variants found in the wild. So, there
are higher chances the users can find the routine by its name.

For testing if a code point is a decimal digit, I used
(Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER), which seem
to be more appropriate than Character.isDigit().  (The later is true
for things like a digit in a circle, superscript, etc., which do not
seem to be a part of a decimal number composed of several digits.)

The updated webrev:
http://cr.openjdk.java.net/~igerasim/81345

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-09-28 Thread Jason Mehrens
Ivan,

Am I correct that subtraction of code points is safe from underflow due to the 
range of Character.MIN_CODE_POINT and Character.MAX_CODE_POINT?  That would 
explain why you are not using Integer.compare.

One alternative to calling CharSequence.subSequence is to use 
CharBuffer.wrap(CharSequence csq, int start, int end).  The sub-sequences are 
short lived so that may be faster in the String case.

Admittedly this is more of a Claes Redestad faster boot up type change but, the 
use of the AlphaDecimalComparator constructor in Comparator might force the 
bytecode verifier to force load AlphaDecimalComparator just to check the return 
type.

If you make factory methods in AlphaDecimalComparator you can usually craft the 
return type to match and the verify of Comparator will not force load the 
AlphaDecimalComparator.
Otherwise, the more ugly solution is to wrap 'new' with Comparator.class.cast.

Jason




From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> on behalf of Ivan 
Gerasimov <ivan.gerasi...@oracle.com>
Sent: Monday, September 25, 2017 12:49 PM
To: core-libs-dev
Subject: Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Hello!

Could you please review at your convenience?

In the latest webrev I took all suggestions into account (unless I
missed something.)

http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/


I think, if the suggested comparator is found useful by the users, then
it may make sense to create the String-oriented variant, which can be
implemented through the CharSequence-oriented one as:

class String {
 ...
 @SuppressWarnings("unchecked")
 public static  Comparator
 comparingAlphaDecimal(Comparator alphaComparator) {
 return (Comparator) (Comparator)
 new
Comparators.AlphaDecimalComparator<>(Objects.requireNonNull(
 (Comparator) alphaComparator),
false);
 }
}

This will be safe, since the specification guarantees that
String.subSequence() returns a String.

Then in the application code it would be possible to instantiate the
comparators as

 String.comparingAlphaDecimal(String::compareTo);

 String.comparingAlphaDecimal(String::compareToIgnoreCase);

or, alternatively,
 String.comparingAlphaDecimal(Comparator.naturalOrder());

String.comparingAlphaDecimal(String.CASE_INSENSITIVE_ORDER);

But this could be deferred for later, of course.

With kind regards,
Ivan


On 8/27/17 1:38 PM, Ivan Gerasimov wrote:
> Hello everyone!
>
> Here's another iteration of the comparator with suggested improvements.
>
> Now, there is the only input argument -- the alpha-comparator for
> comparing the non-decimal-digit sub-sequences.
>
> For the javadoc I used the text suggested by Peter with some
> modifications, additional example and API/implementation notes.
> Overall, the javadoc looks heavier than need to me, so I'd love to
> hear comments about how to make it shorter and cleaner.
>
> Also, I adopted the name AlphaDecimal, suggested by Peter.  This name
> is one of popular in the list of variants found in the wild. So, there
> are higher chances the users can find the routine by its name.
>
> For testing if a code point is a decimal digit, I used
> (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER), which seem
> to be more appropriate than Character.isDigit().  (The later is true
> for things like a digit in a circle, superscript, etc., which do not
> seem to be a part of a decimal number composed of several digits.)
>
> The updated webrev:
> http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/
>
> Please review at your convenience.
>
> With kind regards,
> Ivan
>
> On 8/9/17 4:59 PM, Stuart Marks wrote:
>> On 8/1/17 11:56 PM, Ivan Gerasimov wrote:
>>> I've tried to go one step further and created even more abstract
>>> comparator:  It
>>> uses a supplied predicate to decompose the input sequences into
>>> odd/even
>>> subsequences (e.g. alpha/numeric) and then uses two separate
>>> comparator to
>>> compare them. Additionally, a comparator for comparing sequences,
>>> consisting
>>> only of digits is provided. For example, to build a case-insensitive
>>> AlphaDecimal comparator one could use: 1) Character::isDigit -- as
>>> the predicate
>>> for decomposing, 2) String::compareToIgnoreCase -- to compare alpha
>>> (i.e. odd
>>> parts); to work with CharSequences one would need to make it
>>> Comparator.comparing(CharSequence::toString,
>>> String::compareToIgnoreCase), 3)
>>> The special decimal-only comparator, which compares the decimal
>>> representation
>>> of the sequences. Here's the file with all the comparators and a
&g

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-09-25 Thread Ivan Gerasimov

Hello!

Could you please review at your convenience?

In the latest webrev I took all suggestions into account (unless I 
missed something.)


http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/


I think, if the suggested comparator is found useful by the users, then 
it may make sense to create the String-oriented variant, which can be 
implemented through the CharSequence-oriented one as:


class String {
...
@SuppressWarnings("unchecked")
public static  Comparator
comparingAlphaDecimal(Comparator alphaComparator) {
return (Comparator) (Comparator)
new 
Comparators.AlphaDecimalComparator<>(Objects.requireNonNull(
(Comparator) alphaComparator), 
false);

}
}

This will be safe, since the specification guarantees that 
String.subSequence() returns a String.


Then in the application code it would be possible to instantiate the 
comparators as


String.comparingAlphaDecimal(String::compareTo);

String.comparingAlphaDecimal(String::compareToIgnoreCase);

or, alternatively,
String.comparingAlphaDecimal(Comparator.naturalOrder());

String.comparingAlphaDecimal(String.CASE_INSENSITIVE_ORDER);

But this could be deferred for later, of course.

With kind regards,
Ivan


On 8/27/17 1:38 PM, Ivan Gerasimov wrote:

Hello everyone!

Here's another iteration of the comparator with suggested improvements.

Now, there is the only input argument -- the alpha-comparator for 
comparing the non-decimal-digit sub-sequences.


For the javadoc I used the text suggested by Peter with some 
modifications, additional example and API/implementation notes. 
Overall, the javadoc looks heavier than need to me, so I'd love to 
hear comments about how to make it shorter and cleaner.


Also, I adopted the name AlphaDecimal, suggested by Peter.  This name 
is one of popular in the list of variants found in the wild. So, there 
are higher chances the users can find the routine by its name.


For testing if a code point is a decimal digit, I used 
(Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER), which seem 
to be more appropriate than Character.isDigit().  (The later is true 
for things like a digit in a circle, superscript, etc., which do not 
seem to be a part of a decimal number composed of several digits.)


The updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/

Please review at your convenience.

With kind regards,
Ivan

On 8/9/17 4:59 PM, Stuart Marks wrote:

On 8/1/17 11:56 PM, Ivan Gerasimov wrote:
I've tried to go one step further and created even more abstract 
comparator:  It
uses a supplied predicate to decompose the input sequences into 
odd/even
subsequences (e.g. alpha/numeric) and then uses two separate 
comparator to
compare them. Additionally, a comparator for comparing sequences, 
consisting

only of digits is provided. For example, to build a case-insensitive
AlphaDecimal comparator one could use: 1) Character::isDigit -- as 
the predicate
for decomposing, 2) String::compareToIgnoreCase -- to compare alpha 
(i.e. odd

parts); to work with CharSequences one would need to make it
Comparator.comparing(CharSequence::toString, 
String::compareToIgnoreCase), 3)
The special decimal-only comparator, which compares the decimal 
representation
of the sequences. Here's the file with all the comparators and a 
simple test:

http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java


Hi, a couple follow-up thoughts on this.

1) Supplementary characters

The current code uses Character.isDigit(char), which works only for 
char values in the BMP (basic multilingual plane, values <= U+). 
It won't work for supplementary characters. There are several blocks 
of digits in the BMP, but there are several more in the supplementary 
character range.


I don't see any reason not to handle the supplementary characters as 
well, except that it spoils the nice char-by-char technique of 
processing the string. Instead, it'd have to pull in code point 
values, which might be comprised of two surrogate chars. There are a 
variety of methods on Character that help with this. Note that there 
is an overload Character.isDigit(int) which takes any code point 
value, including supplementary characters.


2) Too much generality?

This version includes Predicate for determining whether a 
character is part of the alphabetic or decimal portion of the string. 
I'm thinking this might be overkill. It might be sufficient to 
"hardwire" the partitioning predicate to be Character::isDigit and 
the value mapping function to use Character::digit.


The problem is that adding a predicate opens the door to a lot more 
complexity, while providing dimishing value. First, the predicate 
would have to handle code points (per the above) so it'd need to be 
an IntPredicate. Second, there would also need to be a mapping 
function from the code point value to a numeric value. This might be 
an IntUnaryOperator. This would allow someone to 

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-08-27 Thread Ivan Gerasimov

Hello everyone!

Here's another iteration of the comparator with suggested improvements.

Now, there is the only input argument -- the alpha-comparator for 
comparing the non-decimal-digit sub-sequences.


For the javadoc I used the text suggested by Peter with some 
modifications, additional example and API/implementation notes. Overall, 
the javadoc looks heavier than need to me, so I'd love to hear comments 
about how to make it shorter and cleaner.


Also, I adopted the name AlphaDecimal, suggested by Peter.  This name is 
one of popular in the list of variants found in the wild. So, there are 
higher chances the users can find the routine by its name.


For testing if a code point is a decimal digit, I used 
(Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER), which seem to 
be more appropriate than Character.isDigit().  (The later is true for 
things like a digit in a circle, superscript, etc., which do not seem to 
be a part of a decimal number composed of several digits.)


The updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/

Please review at your convenience.

With kind regards,
Ivan

On 8/9/17 4:59 PM, Stuart Marks wrote:

On 8/1/17 11:56 PM, Ivan Gerasimov wrote:
I've tried to go one step further and created even more abstract 
comparator:  It

uses a supplied predicate to decompose the input sequences into odd/even
subsequences (e.g. alpha/numeric) and then uses two separate 
comparator to
compare them. Additionally, a comparator for comparing sequences, 
consisting

only of digits is provided. For example, to build a case-insensitive
AlphaDecimal comparator one could use: 1) Character::isDigit -- as 
the predicate
for decomposing, 2) String::compareToIgnoreCase -- to compare alpha 
(i.e. odd

parts); to work with CharSequences one would need to make it
Comparator.comparing(CharSequence::toString, 
String::compareToIgnoreCase), 3)
The special decimal-only comparator, which compares the decimal 
representation
of the sequences. Here's the file with all the comparators and a 
simple test:

http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java


Hi, a couple follow-up thoughts on this.

1) Supplementary characters

The current code uses Character.isDigit(char), which works only for 
char values in the BMP (basic multilingual plane, values <= U+). 
It won't work for supplementary characters. There are several blocks 
of digits in the BMP, but there are several more in the supplementary 
character range.


I don't see any reason not to handle the supplementary characters as 
well, except that it spoils the nice char-by-char technique of 
processing the string. Instead, it'd have to pull in code point 
values, which might be comprised of two surrogate chars. There are a 
variety of methods on Character that help with this. Note that there 
is an overload Character.isDigit(int) which takes any code point 
value, including supplementary characters.


2) Too much generality?

This version includes Predicate for determining whether a 
character is part of the alphabetic or decimal portion of the string. 
I'm thinking this might be overkill. It might be sufficient to 
"hardwire" the partitioning predicate to be Character::isDigit and the 
value mapping function to use Character::digit.


The problem is that adding a predicate opens the door to a lot more 
complexity, while providing dimishing value. First, the predicate 
would have to handle code points (per the above) so it'd need to be an 
IntPredicate. Second, there would also need to be a mapping function 
from the code point value to a numeric value. This might be an 
IntUnaryOperator. This would allow someone to sort based on Roman 
numerals, using Character::getNumericValue. (Yes, Roman numerals are 
in Unicode.) Or maybe the mapping function should return any 
Comparable value, not an int. ... See where I'm going here?


Since this kind of sorting is intended to be viewed by people, it's 
probably worth providing full internationalization support 
(supplementary characters, and delegation to sub-comparators, to allow 
locale-specific collating sequences). But I start to question any 
complexity beyond that.


s'marks



--
With kind regards,
Ivan Gerasimov



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-08-09 Thread Stuart Marks

On 8/1/17 11:56 PM, Ivan Gerasimov wrote:

I've tried to go one step further and created even more abstract comparator:  It
uses a supplied predicate to decompose the input sequences into odd/even
subsequences (e.g. alpha/numeric) and then uses two separate comparator to
compare them. Additionally, a comparator for comparing sequences, consisting
only of digits is provided. For example, to build a case-insensitive
AlphaDecimal comparator one could use: 1) Character::isDigit -- as the predicate
for decomposing, 2) String::compareToIgnoreCase -- to compare alpha (i.e. odd
parts); to work with CharSequences one would need to make it
Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase), 3)
The special decimal-only comparator, which compares the decimal representation
of the sequences. Here's the file with all the comparators and a simple test:
http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java


Hi, a couple follow-up thoughts on this.

1) Supplementary characters

The current code uses Character.isDigit(char), which works only for char values 
in the BMP (basic multilingual plane, values <= U+). It won't work for 
supplementary characters. There are several blocks of digits in the BMP, but 
there are several more in the supplementary character range.


I don't see any reason not to handle the supplementary characters as well, 
except that it spoils the nice char-by-char technique of processing the string. 
Instead, it'd have to pull in code point values, which might be comprised of two 
surrogate chars. There are a variety of methods on Character that help with 
this. Note that there is an overload Character.isDigit(int) which takes any code 
point value, including supplementary characters.


2) Too much generality?

This version includes Predicate for determining whether a character 
is part of the alphabetic or decimal portion of the string. I'm thinking this 
might be overkill. It might be sufficient to "hardwire" the partitioning 
predicate to be Character::isDigit and the value mapping function to use 
Character::digit.


The problem is that adding a predicate opens the door to a lot more complexity, 
while providing dimishing value. First, the predicate would have to handle code 
points (per the above) so it'd need to be an IntPredicate. Second, there would 
also need to be a mapping function from the code point value to a numeric value. 
This might be an IntUnaryOperator. This would allow someone to sort based on 
Roman numerals, using Character::getNumericValue. (Yes, Roman numerals are in 
Unicode.) Or maybe the mapping function should return any Comparable value, not 
an int. ... See where I'm going here?


Since this kind of sorting is intended to be viewed by people, it's probably 
worth providing full internationalization support (supplementary characters, and 
delegation to sub-comparators, to allow locale-specific collating sequences). 
But I start to question any complexity beyond that.


s'marks


Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-08-02 Thread Ivan Gerasimov

On 7/28/17 12:16 PM, Jonathan Bluett-Duncan wrote:


Hi Ivan,

It looks like the MyComparator code example which you gave in your 
last email lost its formatting along the way, so I'm finding it 
difficult to read. Would you mind resubmitting it?




Oh, sorry about that.
I've just uploaded another modification of the comparator here:
http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java

With kind regards,
Ivan


Cheers,
Jonathan

On 28 July 2017 at 17:25, Ivan Gerasimov > wrote:


Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:

Hi Ivan,

In the light of what Stuart Marks wrote, then what do you
think about a parameterized comparator (would be sub-optimal,
I know) where one would supply
2 Comparator(s) which would be used to compare Ax and Nx
sub-sequences respectively as described below...

Yes.  Inspired by what Stuart suggested I made a draft of such a
comparator (see below).
It works, but as you've said it's not that efficient (mostly due
to expensive substrings) and a bit harder to use in a simple case.
Now I need to think about how to combine two approaches.

For Nx sub-sequences, one would need a comparator comparing
the reversed sequence lexicographically,

I'm not sure I understand why they need to be reversed.

but for Ax sub-sequences, one could choose from a plethora of
case-(in)sensitive comparator(s) and collators already
available on the platform.

Yes. In the example below I used compareToIgnoreCase to compare
alpha subsequences.

---

class MyComparator implements Comparator
{Comparator alphaCmp;Comparator
numCmp;MyComparator(Comparator alphaCmp,Comparator
numCmp) {this.alphaCmp = alphaCmp;this.numCmp = numCmp;}boolean
skipLeadingZeroes(String s, int len) {for (int i = 0; i < len ;
++i) {if (Character.digit(s.charAt(i), 10) != 0)return
false;}return true;}@Override public int compare(String o1, String
o2) {Supplier s1 = new NumberSlicer(o1);Supplier
s2 = new NumberSlicer(o2);while (true) {// alpha part String ss1 =
s1.get();String ss2 = s2.get();int cmp = alphaCmp.compare(ss1,
ss2);if (cmp != 0) return cmp;if (ss1.length() == 0) return 0;//
numeric part ss1 = s1.get();ss2 = s2.get();int len1 =
ss1.length();int len2 = ss2.length();int dlen = len1 - len2;if
(dlen > 0) {if (!skipLeadingZeroes(ss1, dlen))return 1;ss1 =
ss1.substring(dlen, len1);} else if (dlen < 0) {if
(!skipLeadingZeroes(ss2, -dlen))return -1;ss2 =
ss2.substring(-dlen, len2);}cmp = numCmp.compare(ss1, ss2);if (cmp
!= 0) return cmp;if (dlen != 0) return dlen;}}static class
NumberSlicer implements Supplier {private String
sequence;private boolean expectNumber = false;private int index =
0;NumberSlicer(String s) {sequence = s;}@Override public String
get() {int start = index, end = start, len = sequence.length() -
start;for (; len > 0; ++end, --len) {if
(Character.isDigit(sequence.ch arAt(end)) ^
expectNumber)break;}expectNumber = !expectNumber;return
sequence.substring(start, index = end);}}}Here how it
is invoked with case-insensitive comparator:Arrays.sort(str,new

MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));

simple test results for case insensitive sorting:java 1java 1
javajava 1 JAVAJava 2JAVA 5jaVA 6.1java 10java 10 v 13Java 10 v
013Java 10 v 13java 10 v 113Java 2017Java 2017Java 20017Java
200017Java 217Java 2017Java 2017Java 20017With
kind regards, Ivan




--
With kind regards,
Ivan Gerasimov



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-08-02 Thread Ivan Gerasimov

Thank you Peter for sharing your thoughts and experiments!


On 7/29/17 10:22 AM, Peter Levart wrote:

Hi Ivan,

On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:


Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:

Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think 
about a parameterized comparator (would be sub-optimal, I know) 
where one would supply
2 Comparator(s) which would be used to compare Ax and Nx 
sub-sequences respectively as described below...


Yes.  Inspired by what Stuart suggested I made a draft of such a 
comparator (see below).
It works, but as you've said it's not that efficient (mostly due to 
expensive substrings) and a bit harder to use in a simple case.

Now I need to think about how to combine two approaches.


There is a bug in MyComparator. Comparing "1" with "2" gives 0 
(equal). You must not return "equal" when both alpha prefixes are empty.


Yes, you're right.  Should have checked the length of numerical 
substrings, or just use a separate method for end-of-sequence.




For Nx sub-sequences, one would need a comparator comparing the 
reversed sequence lexicographically,

I'm not sure I understand why they need to be reversed.


Scrap that. I got confused.  :-o

but for Ax sub-sequences, one could choose from a plethora of 
case-(in)sensitive comparator(s) and collators already available on 
the platform.


Yes. In the example below I used compareToIgnoreCase to compare alpha 
subsequences.

...
Arrays.sort(str,new 
MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));
Substrings are by copy. There is another approach. Originally you 
created the NumericComparator to compare CharSequence(s) - not 
String(s) as you did with MyComparator. So why not keeping that? 
Well, it was mostly for illustrative purposes.  Of course the final code 
should deal with CharSequences, not Strings for generality. However, I 
assume that in most cases this comparator will need to work with 
Strings, and it's important to remember that String.subSequence() is as 
expensive as String.substring() [1] [1] 
http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#subSequence-int-int- 

While talking about types, you don't need to have a generic parameter: 
NumericComparator implements 
CharSequence simple: NumericComparator implements 
Comparator is suitable for comparing any subtype of 
CharSequence as use-sites should accept Comparator or 
Comparator. 
But it wouldn't work in a case you need to pass the comparator to a 
function of form  fun(Comparator cmp).
For example: Stream sorted(Comparator comparator); 
You can use Comparator to sort Stream as well as 
Stream or Stream... I tried an approach 
where sub-sequences are CharSequence objects by delegation and use 
sub-comparators that don't convert CharSequence(s) to String(s). 
Here's what this looks like: 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java 
with supporting sub-comparator classes: 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java 
I created a JMH  benchmark comparing your 02/webrev 
(ivan.NumericComparator): 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java 
with a modified variant of your MyComparator that works correctly 
(ivan.MyComparator): 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java 
and my (peter.AlphadecimalComparator). The later two are tested with 
case-sensitive (CS) and case-insensitive (CIS) variants for comparing 
alpha sub-sequences. Here are the results: 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt 
With -prof gc, it can be seen that JIT is able to optimize-away 
allocation of sub-sequence CharSequence(s). When the hot code is 
JIT-ed, at least with provided helper sub-comparators, no allocation 
is taking place (not so with MyComparator which creates substrings): 
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt 


That's a very good news then!
The performance of AlphadecimalComparator is about 50% slower than 
your specialized NumericComparator, but it allows custom 
parametrization. Perhaps the decimal part of variations could be 
hidden behind a boolean or an enum parameter (there is a 
leading-zeroes-greater, leading-zeroes-less, but also a 
leading-zeroes-equal variant that makes sense). I tried to do that by 
making DecimalComparator an enum and restricting the 

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-30 Thread Peter Levart

Hi Ivan,

On 07/29/2017 07:22 PM, Peter Levart wrote:
I tried an approach where sub-sequences are CharSequence objects by 
delegation and use sub-comparators that don't convert CharSequence(s) 
to String(s). Here's what this looks like:


http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java

Here are the results:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt


A simple optimization of above code (manually unrolling the 
alpha/decimal alteration - i.e. getting rid of 'alpha' boolean flag):


before:

boolean alpha = true;
while (true) {
if (ss1.next(alpha)) {
if (ss2.next(alpha)) {
Comparator
comparator = alpha
 ? alphaComparator
 : decimalComparator;
int cmp = comparator.compare(ss1, ss2);
if (cmp != 0) {
return cmp;
}
} else {
return 1;
}
} else {
if (ss2.next(alpha)) {
return -1;
} else {
return 0;
}
}
alpha = !alpha;
}

after:

while (true) {
// alpha part
if (ss1.next(true)) {
if (ss2.next(true)) {
int cmp = alphaComparator.compare(ss1, ss2);
if (cmp != 0) {
return cmp;
}
} else {
return 1;
}
} else {
if (ss2.next(true)) {
return -1;
} else {
return 0;
}
}
// decimal part
if (ss1.next(false)) {
if (ss2.next(false)) {
int cmp = decimalComparator.compare(ss1, ss2);
if (cmp != 0) {
return cmp;
}
} else {
return 1;
}
} else {
if (ss2.next(false)) {
return -1;
} else {
return 0;
}
}
}

...yields a noticeable spead-up:

// reference:

Benchmark (_strings)(comparator)  Mode Cnt  
Score Error  Units
ComparatorBench.teststrings1  ivan.NumericComparator  avgt 10   
1455.789 ±  37.158  ns/op
ComparatorBench.teststrings2  ivan.NumericComparator  avgt   10 
13680.131 ± 338.763  ns/op


// before explicit alpha/decimal alteration unrolling:

Benchmark (_strings) (comparator) Mode  
Cnt  Score Error  Units
ComparatorBench.teststrings1  peter.AlphadecimalComparator.CS avgt   
10   2065.324 ±  50.532  ns/op
ComparatorBench.teststrings2  peter.AlphadecimalComparator.CS avgt   
10  21118.430 ± 350.769  ns/op


// after explicit alpha/decimal alteration unrolling:

Benchmark (_strings) (comparator) Mode  
Cnt  Score  Error  Units
ComparatorBench.teststrings1  peter.AlphadecimalComparator.CS avgt   
10   1692.145 ±2.160  ns/op
ComparatorBench.teststrings2  peter.AlphadecimalComparator.CS avgt   
10  18624.677 ± 1900.779  ns/op



With this mod, AlphadecimalComparator is only about 16 - 35% slower than 
NumericComparator.



Regards, Peter



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-29 Thread Peter Levart

Hi Ivan,

On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:


Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:

Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think about 
a parameterized comparator (would be sub-optimal, I know) where one 
would supply
2 Comparator(s) which would be used to compare Ax and Nx 
sub-sequences respectively as described below...


Yes.  Inspired by what Stuart suggested I made a draft of such a 
comparator (see below).
It works, but as you've said it's not that efficient (mostly due to 
expensive substrings) and a bit harder to use in a simple case.

Now I need to think about how to combine two approaches.


There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal). 
You must not return "equal" when both alpha prefixes are empty.




For Nx sub-sequences, one would need a comparator comparing the 
reversed sequence lexicographically,

I'm not sure I understand why they need to be reversed.


Scrap that. I got confused.  :-o

but for Ax sub-sequences, one could choose from a plethora of 
case-(in)sensitive comparator(s) and collators already available on 
the platform.


Yes. In the example below I used compareToIgnoreCase to compare alpha 
subsequences.

...
Arrays.sort(str,new 
MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));


Substrings are by copy. There is another approach. Originally you 
created the NumericComparator to compare CharSequence(s) - not String(s) 
as you did with MyComparator. So why not keeping that?


While talking about types, you don't need to have a generic parameter:

NumericComparator implements CharSequence

simple:

NumericComparator implements Comparator

is suitable for comparing any subtype of CharSequence as use-sites 
should accept Comparator or ComparatorCharSequence>. For example:


Stream sorted(Comparator comparator);

You can use Comparator to sort Stream as well as 
Stream or Stream...



I tried an approach where sub-sequences are CharSequence objects by 
delegation and use sub-comparators that don't convert CharSequence(s) to 
String(s). Here's what this looks like:


http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java

with supporting sub-comparator classes:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java

I created a JMH  benchmark comparing your 02/webrev 
(ivan.NumericComparator):


http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java

with a modified variant of your MyComparator that works correctly 
(ivan.MyComparator):


http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java

and my (peter.AlphadecimalComparator). The later two are tested with 
case-sensitive (CS) and case-insensitive (CIS) variants for comparing 
alpha sub-sequences.


Here are the results:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt

With -prof gc, it can be seen that JIT is able to optimize-away 
allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed, 
at least with provided helper sub-comparators, no allocation is taking 
place (not so with MyComparator which creates substrings):


http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt

The performance of AlphadecimalComparator is about 50% slower than your 
specialized NumericComparator, but it allows custom parametrization. 
Perhaps the decimal part of variations could be hidden behind a boolean 
or an enum parameter (there is a leading-zeroes-greater, 
leading-zeroes-less, but also a leading-zeroes-equal variant that makes 
sense). I tried to do that by making DecimalComparator an enum and 
restricting the decimalComparato parameter to that type, but maybe 
specializing code around a boolean flag might yield better performance 
and DecimalComparator as Comparator doesn't make much 
sense outside the AlphadecimalComparator.


So what do you think? Also, what do you think about the name?

Regards, Peter



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Jonathan Bluett-Duncan
Hi Ivan,

It looks like the MyComparator code example which you gave in your last
email lost its formatting along the way, so I'm finding it difficult to
read. Would you mind resubmitting it?

Cheers,
Jonathan

On 28 July 2017 at 17:25, Ivan Gerasimov  wrote:

> Hi Peter!
>
> Thank a lot for looking into this!
>
> On 7/28/17 7:32 AM, Peter Levart wrote:
>
>> Hi Ivan,
>>
>> In the light of what Stuart Marks wrote, then what do you think about a
>> parameterized comparator (would be sub-optimal, I know) where one would
>> supply
>> 2 Comparator(s) which would be used to compare Ax and Nx sub-sequences
>> respectively as described below...
>>
>> Yes.  Inspired by what Stuart suggested I made a draft of such a
> comparator (see below).
> It works, but as you've said it's not that efficient (mostly due to
> expensive substrings) and a bit harder to use in a simple case.
> Now I need to think about how to combine two approaches.
>
> For Nx sub-sequences, one would need a comparator comparing the reversed
>> sequence lexicographically,
>>
> I'm not sure I understand why they need to be reversed.
>
>> but for Ax sub-sequences, one could choose from a plethora of
>> case-(in)sensitive comparator(s) and collators already available on the
>> platform.
>>
>> Yes. In the example below I used compareToIgnoreCase to compare alpha
> subsequences.
>
> ---
>
> class MyComparator implements Comparator {Comparator
> alphaCmp;Comparator numCmp;MyComparator(Comparator
> alphaCmp,Comparator numCmp) {this.alphaCmp = alphaCmp;this.numCmp =
> numCmp;}boolean skipLeadingZeroes(String s, int len) {for (int i = 0; i <
> len ; ++i) {if (Character.digit(s.charAt(i), 10) != 0)return false;}return
> true;}@Override public int compare(String o1, String o2)
> {Supplier s1 = new NumberSlicer(o1);Supplier s2 = new
> NumberSlicer(o2);while (true) {// alpha part String ss1 = s1.get();String
> ss2 = s2.get();int cmp = alphaCmp.compare(ss1, ss2);if (cmp != 0) return
> cmp;if (ss1.length() == 0) return 0;// numeric part ss1 = s1.get();ss2 =
> s2.get();int len1 = ss1.length();int len2 = ss2.length();int dlen = len1 -
> len2;if (dlen > 0) {if (!skipLeadingZeroes(ss1, dlen))return 1;ss1 =
> ss1.substring(dlen, len1);} else if (dlen < 0) {if (!skipLeadingZeroes(ss2,
> -dlen))return -1;ss2 = ss2.substring(-dlen, len2);}cmp =
> numCmp.compare(ss1, ss2);if (cmp != 0) return cmp;if (dlen != 0) return
> dlen;}}static class NumberSlicer implements Supplier {private
> String sequence;private boolean expectNumber = false;private int index =
> 0;NumberSlicer(String s) {sequence = s;}@Override public String get()
> {int start = index, end = start, len = sequence.length() - start;for (; len
> > 0; ++end, --len) {if (Character.isDigit(sequence.charAt(end)) ^
> expectNumber)break;}expectNumber = !expectNumber;return
> sequence.substring(start, index = end);}}}Here how it is
> invoked with case-insensitive comparator:Arrays.sort(str,new
> MyComparator(Comparator.comparing(String::toString,String::
> compareToIgnoreCase),Comparator.comparing(String::toString,
> String::compareTo)));
>
> simple test results for case insensitive sorting:java 1java 1 javajava 1
> JAVAJava 2JAVA 5jaVA 6.1java 10java 10 v 13Java 10 v 013Java 10 v
> 13java 10 v 113Java 2017Java 2017Java 20017Java 200017Java 217Java
> 2017Java 2017Java 20017With kind regards, Ivan
>


Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread joe darcy

Hi Ivan,

A few comments.

I don't have a specific suggestion for a different name, but I don't 
think "comparingNumerically" does an adequate job capturing the 
described behavior of the method. Likewise, the summary of the methods' 
behavior in the javadoc should have a more suggestive description of the 
behavior.


Please add comments describing what the "// Fullwidth characters" values 
are in the test.


Interesting test cases would also include substrings for 
Integer.MAX_VALUE, Integer.MAX_VALUE+1, Long.MAX_VALUE, Long.MAX_VALUE + 
1, etc. Test cases of purely numerical and purely alphabetical inputs 
would be warranted too.


While randomizing the array and sorting is a valid test, a more through 
test would do pair-wise comparisons of each array element. While such 
comparisons are quadratic with array length, the arrays here are small.


Cheers,

-Joe


On 7/28/2017 9:03 AM, Ivan Gerasimov wrote:

Hi Roger!


On 7/24/17 7:42 AM, Roger Riggs wrote:

Hi Ivan,

Thanks for bringing this up again.

Some initial comments, not a full review.

Though the enhancement says that no consideration is given to sign 
characters they
may produce puzzling results for strings like "-2000" and "-2001" 
where the strings appear
to be signed numbers but the comparison will be for the "-" prefix 
and the rest unsigned.
Including the word 'unsigned' in the description might help reinforce 
the semantics.


Yes, it's a good point.  I'll add some words to make it certain we 
only recognize unsigned numbers.
Otherwise, it would become confusing when comparing something like 
"2017-07-28" and "2017-07-29".


Also, I don't see test cases for strings like: "abc200123" and 
"abc2123" which share
a prefix part of which is numeric but differ in the number of 
"leading" zeros after the prefix.



Sure.  Good addition to the test.

What about strings that include more than 1 numeric segment: 
abc2003def0001 and abc02003def001

in the leading zero case?

With the first comparator (which treats the numbers with more leading 
zeroes as greater ones), these strings would be sorted as 
"abc2003def0001" < "abc02003def001".

The logic here is as following:
1) skip common prefix "abc",
2) find the numerical parts: "2003" and "02003",
3) after skipping leading zeroes, they are compared to be equal,
4) then, the string with more leading zeroes is said to be greater.

Though the test adds the @key randomness, it would be useful to use 
the test library
mechanisms to report the seed and be able to run the test with a 
seed, if case they fail.

(As might be provided by the test library jdk.test.lib.RandomFactory).


Okay, I'll add this Random to the shuffling to make it reproducible.


Comparator.java:
576: "the common prefix is skipped" is problematic when considering 
leading zeros.

   The common prefix may contain non-zero digits.

Yes, of course.
While scanning the string for the common prefix, we still keep track 
of the numeric part.

Probably "skip" is a wrong word here.

585: it is not clear whether the "different number of leading zeros" 
is applied regardless

   of the common prefix or only starting after the common prefix.

When talking about leading zeroes, then the entire numerical substring 
is meant.

Part of this substring can belong to the common prefix.

550, 586: for many readers, it is easier to read 'for example' than 
"e.g." or "i.e.".



Thanks!  I'll change it accordingly.


562, 598:  editorial: "is to compare" -> "compares"


Thanks!

Comparators.java:

192: @param for param o is missing;  (the param name "o" usually 
refers to an object, not a string).
200:   Can skipLeadingZeros be coded to correctly work if cnt == 0; 
it would be more robust
 SkipLeading zeros works correctly only if pos is given the first 
numeric digit in the subsequence

 so the numStart1 and numStart2 must be first digit in each string.
I don't think that skipLeadingZeros() is very specific that it always 
called for longer string, trying to reduce its size via skipping 
leading zeroes.
It wouldn't make sense to call it with cnt == 0, and so we can avoid 
one initial comparing and save a couple of nanoseconds :)
I added this prerequisite to the javadoc of the method, so hopefully 
it shouldn't cause much confusion.



compare():

Line 223: if strings typically have non-numeric prefixes, it might 
perform slightly better
to detect the end of the common prefix and then scan back to find the 
beginning of the numeric part.

(Avoiding checking every char for isDigit).

On the other hand, we're saving on not calling String.charAt() for the 
second time :)


Line 224: If assigned for every digit, it will hold the last equal 
digit in the common prefix, not the first digit.

It will only be assigned when a non-digit is met.
And since the index was just incremented @ Line 222, numStart1 will be 
set to the index of the first digit inside the common prefix.


For example, if the common prefix is abc123, then the numStart1 will 
be 

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Ivan Gerasimov

Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:

Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think about 
a parameterized comparator (would be sub-optimal, I know) where one 
would supply
2 Comparator(s) which would be used to compare Ax and Nx sub-sequences 
respectively as described below...


Yes.  Inspired by what Stuart suggested I made a draft of such a 
comparator (see below).
It works, but as you've said it's not that efficient (mostly due to 
expensive substrings) and a bit harder to use in a simple case.

Now I need to think about how to combine two approaches.

For Nx sub-sequences, one would need a comparator comparing the 
reversed sequence lexicographically,

I'm not sure I understand why they need to be reversed.
but for Ax sub-sequences, one could choose from a plethora of 
case-(in)sensitive comparator(s) and collators already available on 
the platform.


Yes. In the example below I used compareToIgnoreCase to compare alpha 
subsequences.


---

class MyComparator implements Comparator {Comparator 
alphaCmp;Comparator numCmp;MyComparator(Comparator 
alphaCmp,Comparator numCmp) {this.alphaCmp = 
alphaCmp;this.numCmp = numCmp;}boolean skipLeadingZeroes(String s, int 
len) {for (int i = 0; i < len ; ++i) {if (Character.digit(s.charAt(i), 
10) != 0)return false;}return true;}@Override public int compare(String 
o1, String o2) {Supplier s1 = new 
NumberSlicer(o1);Supplier s2 = new NumberSlicer(o2);while (true) 
{// alpha part String ss1 = s1.get();String ss2 = s2.get();int cmp = 
alphaCmp.compare(ss1, ss2);if (cmp != 0) return cmp;if (ss1.length() == 
0) return 0;// numeric part ss1 = s1.get();ss2 = s2.get();int len1 = 
ss1.length();int len2 = ss2.length();int dlen = len1 - len2;if (dlen > 
0) {if (!skipLeadingZeroes(ss1, dlen))return 1;ss1 = ss1.substring(dlen, 
len1);} else if (dlen < 0) {if (!skipLeadingZeroes(ss2, -dlen))return 
-1;ss2 = ss2.substring(-dlen, len2);}cmp = numCmp.compare(ss1, ss2);if 
(cmp != 0) return cmp;if (dlen != 0) return dlen;}}static class 
NumberSlicer implements Supplier {private String 
sequence;private boolean expectNumber = false;private int index = 
0;NumberSlicer(String s) {sequence = s;}@Override public String get() 
{int start = index, end = start, len = sequence.length() - start;for (; 
len > 0; ++end, --len) {if (Character.isDigit(sequence.charAt(end)) ^ 
expectNumber)break;}expectNumber = !expectNumber;return 
sequence.substring(start, index = end);}}}Here how it is 
invoked with case-insensitive comparator:Arrays.sort(str,new 
MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));


simple test results for case insensitive sorting:java 1java 1 javajava 1 
JAVAJava 2JAVA 5jaVA 6.1java 10java 10 v 13Java 10 v 013Java 10 v 
13java 10 v 113Java 2017Java 2017Java 20017Java 200017Java 
217Java 2017Java 2017Java 20017With kind regards, Ivan


Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Ivan Gerasimov

Hi Roger!


On 7/24/17 7:42 AM, Roger Riggs wrote:

Hi Ivan,

Thanks for bringing this up again.

Some initial comments, not a full review.

Though the enhancement says that no consideration is given to sign 
characters they
may produce puzzling results for strings like "-2000" and "-2001" 
where the strings appear
to be signed numbers but the comparison will be for the "-" prefix and 
the rest unsigned.
Including the word 'unsigned' in the description might help reinforce 
the semantics.


Yes, it's a good point.  I'll add some words to make it certain we only 
recognize unsigned numbers.
Otherwise, it would become confusing when comparing something like 
"2017-07-28" and "2017-07-29".


Also, I don't see test cases for strings like: "abc200123" and 
"abc2123" which share
a prefix part of which is numeric but differ in the number of 
"leading" zeros after the prefix.



Sure.  Good addition to the test.

What about strings that include more than 1 numeric segment: 
abc2003def0001 and abc02003def001

in the leading zero case?

With the first comparator (which treats the numbers with more leading 
zeroes as greater ones), these strings would be sorted as 
"abc2003def0001" < "abc02003def001".

The logic here is as following:
1) skip common prefix "abc",
2) find the numerical parts: "2003" and "02003",
3) after skipping leading zeroes, they are compared to be equal,
4) then, the string with more leading zeroes is said to be greater.

Though the test adds the @key randomness, it would be useful to use 
the test library
mechanisms to report the seed and be able to run the test with a seed, 
if case they fail.

(As might be provided by the test library jdk.test.lib.RandomFactory).


Okay, I'll add this Random to the shuffling to make it reproducible.


Comparator.java:
576: "the common prefix is skipped" is problematic when considering 
leading zeros.

   The common prefix may contain non-zero digits.

Yes, of course.
While scanning the string for the common prefix, we still keep track of 
the numeric part.

Probably "skip" is a wrong word here.

585: it is not clear whether the "different number of leading zeros" 
is applied regardless

   of the common prefix or only starting after the common prefix.

When talking about leading zeroes, then the entire numerical substring 
is meant.

Part of this substring can belong to the common prefix.

550, 586: for many readers, it is easier to read 'for example' than 
"e.g." or "i.e.".



Thanks!  I'll change it accordingly.


562, 598:  editorial: "is to compare" -> "compares"


Thanks!

Comparators.java:

192: @param for param o is missing;  (the param name "o" usually 
refers to an object, not a string).
200:   Can skipLeadingZeros be coded to correctly work if cnt == 0; it 
would be more robust
 SkipLeading zeros works correctly only if pos is given the first 
numeric digit in the subsequence

 so the numStart1 and numStart2 must be first digit in each string.
I don't think that skipLeadingZeros() is very specific that it always 
called for longer string, trying to reduce its size via skipping leading 
zeroes.
It wouldn't make sense to call it with cnt == 0, and so we can avoid one 
initial comparing and save a couple of nanoseconds :)
I added this prerequisite to the javadoc of the method, so hopefully it 
shouldn't cause much confusion.



compare():

Line 223: if strings typically have non-numeric prefixes, it might 
perform slightly better
to detect the end of the common prefix and then scan back to find the 
beginning of the numeric part.

(Avoiding checking every char for isDigit).

On the other hand, we're saving on not calling String.charAt() for the 
second time :)


Line 224: If assigned for every digit, it will hold the last equal 
digit in the common prefix, not the first digit.

It will only be assigned when a non-digit is met.
And since the index was just incremented @ Line 222, numStart1 will be 
set to the index of the first digit inside the common prefix.


For example, if the common prefix is abc123, then the numStart1 will be 
updated to 3 when looking at the char 'c'.  Last three cycles of the 
loop it won't be updated, since all the trailing chars are digits.




239, 240:  chars at o1(index) and o2(index) are already known to be 
digits, can it be avoided checking them twice

by starting at index+1?

Not quite necessarily.  We can be here due to numLen1 > 0, and *only 
one* or *none* of the string remainders start with digits.

In the later case we'll end up @ line 279.

Here's the updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/03/webrev/

Given the further discussion this iteration is probably not the final, 
but it's better to have a checkpoint :)


With kind regards,
Ivan


$.02, Roger


On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which will pay 
attention to the numbers embedded into the strings (should they 
present).


This proposal was initially discussed 

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Peter Levart

Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think about a 
parameterized comparator (would be sub-optimal, I know) where one would 
supply
2 Comparator(s) which would be used to compare Ax and Nx sub-sequences 
respectively as described below...


For Nx sub-sequences, one would need a comparator comparing the reversed 
sequence lexicographically, but for Ax sub-sequences, one could choose 
from a plethora of case-(in)sensitive comparator(s) and collators 
already available on the platform.


Regards, peter

On 07/28/2017 04:12 PM, Peter Levart wrote:

Hi Ivan,

Would I be wrong if I described the logic of these two comparators as 
the following:


The comparator compares two character sequences as though each of them 
would be 1st transformed into a tuple of the form:


(A0, N0, A1, N1, ..., An-1, Nn-1, An)

where:

A0 and An are (possibly empty) sub-sequences consisting of 
non-decimal-digit characters
A1 ... An-1 are non-empty sub-sequences consisting of 
non-decimal-digit characters
N0 ... Nn-1 are non-empty sub-sequences consisting of decimal-digit 
characters


...such that all sub-sequences concatenated together in order as they 
appear in the tuple yield the original character sequence.


Any characher sequence can be uniquely transformed into such tuple. 
For example:


"" -> (A0); where A0 == ""
"ab10" -> (A0, N0, A1); where A0 == "ab", N0 == "10", A1 = ""
"1" -> (A0, N0, A1); where A0 == "", N0 == "1", A1 = ""
...

After transformation, the tuples are compared by their elements (from 
left to right) so that corresponding Ax elements are compared 
lexicographically and Nx elements are compared as decimal integers 
(with two variations considering leading zeroes).


If I am right than perhaps such description would be easier to 
understand.


What do you think?


Regards, Peter

On 07/19/2017 10:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which will pay 
attention to the numbers embedded into the strings (should they 
present).


This proposal was initially discussed back in 2014 and seemed to 
bring some interest from the community:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html 



In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/

Comments, suggestions are very welcome!







Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Peter Levart

Hi Ivan,

Would I be wrong if I described the logic of these two comparators as 
the following:


The comparator compares two character sequences as though each of them 
would be 1st transformed into a tuple of the form:


(A0, N0, A1, N1, ..., An-1, Nn-1, An)

where:

A0 and An are (possibly empty) sub-sequences consisting of 
non-decimal-digit characters
A1 ... An-1 are non-empty sub-sequences consisting of non-decimal-digit 
characters
N0 ... Nn-1 are non-empty sub-sequences consisting of decimal-digit 
characters


...such that all sub-sequences concatenated together in order as they 
appear in the tuple yield the original character sequence.


Any characher sequence can be uniquely transformed into such tuple. For 
example:


"" -> (A0); where A0 == ""
"ab10" -> (A0, N0, A1); where A0 == "ab", N0 == "10", A1 = ""
"1" -> (A0, N0, A1); where A0 == "", N0 == "1", A1 = ""
...

After transformation, the tuples are compared by their elements (from 
left to right) so that corresponding Ax elements are compared 
lexicographically and Nx elements are compared as decimal integers (with 
two variations considering leading zeroes).


If I am right than perhaps such description would be easier to understand.

What do you think?


Regards, Peter

On 07/19/2017 10:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which will pay 
attention to the numbers embedded into the strings (should they present).


This proposal was initially discussed back in 2014 and seemed to bring 
some interest from the community:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html 



In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/

Comments, suggestions are very welcome!





Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-28 Thread Ivan Gerasimov

Hi Jonathan!

On 7/24/17 3:42 AM, Jonathan Bluett-Duncan wrote:

You're welcome Ivan!

I'm just proofreading the new webrev, and a few more things have 
occurred to me:


1. What happens if the comparators are given the elements {"1abc", 
"2abc", "10abc"} to compare? Would they sort the elements as {"1abc", 
"2abc", "10abc"} or as {"1abc", "10abc", "2abc"}?

What about the array {"x1yz", "x2yz", "x10yz"}?
I wonder if these two cases should be added to the test too and given 
as additional examples in the javadocs.


These arrays would be sorted in the way you expect: i.e. {"1abc", 
"2abc", "10abc"} and {"x1yz", "x2yz", "x10yz"}, respectively.
I agree it is worthwhile to choose a good descriptive example for the 
javadoc, so it would make it clear for a reader what to expect from the 
routine.

Probably, something similar to what they have in MSDN:
https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947(v=vs.85).aspx

2. The example arrays which you kindly added to the comparators' 
javadoc have no whitespace at the start but one space between the last 
element and the }, so I think there either should be no space at the 
end or an extra space at the start.



Okay, I'll make it consistent.

3. Very sorry, error on my part: on the javadoc lines which now say 
"then the corresponding numerical values are compared; otherwise", I 
suggested to add a "then" at the start; re-reading it though, I 
personally think it reads better without, but I would understand and 
not press it further if you disagree and think that the "then" is a 
useful addition.



Fixed, thanks!

I'll post the updated webrev later, once incorporate other suggestions.

With kind regards,
Ivan




Best regards,
Jonathan


On 24 Jul 2017 06:21, "Ivan Gerasimov" > wrote:


Thank you Jonathan for looking into that!

I agree with all your suggestions.

Here's the updated webrev with suggested modifications:
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/02/webrev/


With kind regards,
Ivan


On 7/23/17 3:56 AM, Jonathan Bluett-Duncan wrote:

Meant to reply to core-libs-dev as well; doing so now.

Jonathan

On 23 July 2017 at 11:50, Jonathan Bluett-Duncan
> wrote:

Hi Ivan,

I'm not a reviewer per se, but I thought I'd chime in.

There's a few things with Comparator.java which I think could
be improved:

 1. `comparingNumericallyLeadingZerosAhead()` is a confusing
name for me, as the "ahead" has no meaning in my mind;
IMO the name `comparingNumericallyLeadingZerosFirst()`
better implies that "0001" < "001" < "01".
 2. It wasn't clear to me from the javadoc that the
comparators compare strings like "abc9" and "abc10" as
"abc9" < "abc10", so I think they should include more
examples.
 3. There's a typo in the javadocs for both methods:
"represets" --> "represents".
 4. Where the javadocs say "follows the procedure", I think
it'd make more grammatical sense if they said "does the
following".
 5. The lines which say "at the boundary of a subsequence,
consisting of decimal digits, the" would flow better if
they said "at the boundary of a subsequence *consisting
solely of decimal digits*, then the". Note the removal of
the comma between "subsequence" and "consisting".

Hope this helps!

Jonathan

On 23 July 2017 at 05:36, Ivan Gerasimov
> wrote:

Hello!

This is a gentle reminder.

The proposed comparator implementation would be
particularly useful when one will need to compare version
strings.
Some popular file managers also use similar comparing
algorithm, as the results often look more natural to the
human eyes (e.g. File Explorer on Windows, Files on Ubuntu).

Now, as Java 10 is been worked on, to sort the list of
Java names correctly, this kind of comparator is needed:

Look: a list { ... "Java 8", "Java 9", "Java 10" }
definitely looks nicer than { "Java 1", "Java 10", "Java
2", ... }  :-)

Would you please help review the proposal?

With kind regards,
Ivan



On 7/19/17 1:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator,
which will pay attention to the numbers embedded into
the strings (should they present).

This proposal was initially discussed back in 2014
   

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-27 Thread Stuart Marks

Hi Ivan,

I think this is an interesting avenue to explore adding to the platform. The 
idea of sorting this way is pretty subtle and it seems to come up frequently, so 
it seems valuable. There are some issues that warrant further discussion, 
though. Briefly:


1. Should this be in the JDK?
2. What do other platforms do?
3. Does it have the right semantics?

Discussion follows.


--


1. Should this be in the JDK?

I think a case for it can be made. It does appear in other platforms (see below) 
and there are also several third party implementations available in a variety of 
environments. So people do have a need for this feature. It's also complicated 
enough to have generated lots of discussions and articles on the topic. The 
questions are whether this can be specified sufficiently clearly, and whether it 
provides value for the use cases for which it's intended. It's not obvious 
whether this is true, but I believe a case can and should be made.



2. What do other platforms do?

It was a bit difficult to find information about this, since it doesn't seem to 
have a well established name. Words like "natural", "logical", "alphanum", and 
"mixed" tend to be used. I eventually found these:


Windows XP StrCmpLogicalW [1]:

Compares two Unicode strings. Digits in the strings are considered as
numerical content rather than text. This test is not case-sensitive.

Windows 7 CompareStringEx SORT_DIGITSASNUMBERS [2]

Treat digits as numbers during sorting, for example, sort "2" before "10".

(Note: this API takes a locale parameter.)

Macintosh Mac NSString localizedStandardCompare [3]

This method should be used whenever file names or other strings are
presented in lists and tables where Finder-like sorting is appropriate.
The exact sorting behavior of this method is different under different
locales and may be changed in future releases. This method uses the
current locale.

(Note: I observe that the Mac Finder sorting is case insensitive.)

Swift String.localizedStandardCompare [4]

Compares strings as sorted by the Finder.

There are also third party, open source implementations available for a variety 
of platforms. These aren't too hard to find; this Coding Horror article [5] has 
a discussion of the issues and links to several implementations. Of particular 
note is the short Python implementation embedded in the article.


There is also the Node package javascript-natural-sort [6] which is one of 
several (of course) similar packages on NPM. This one seems popular, with more 
than 200,000 downloads in the past month.


Finally, there is mention of "numericOrdering" in this Unicode TR [7] but it 
seems fairly non-specific, and I don't know how it applies. The point here is 
that the Unicode community is aware of this kind of ordering, and various 
libraries that implement Unicode collation, such as ICU [8], might have 
implementations that can provide guidance.



3. Does it have the right semantics?

I think you can see from the above survey that there is no standard, and 
different implementations are all over the map, and most if not all are 
completely ill-specified. But what is useful about the survey is that it shows 
what people are actually using, and that there are things that many of them have 
in common. Two items jump out at me:


 - case-insensitive comparison (sometimes optional)
 - locale-specific collation

The obvious (but simplistic) thing to do is to provide variations of this API 
that can use String.CASE_INSENSITIVE_ORDER. Note however that its doc 
specifically states that it provides "unsatisfactory ordering for certain 
locales" and directs the reader to the Collator class, which does take locale 
into account.


Now, I'm sensitive about making this more complicated than necessary. But the 
point of "logical" comparator is to provide something that makes sense to humans 
looking at the result, which implies that locale-specific collation needs to be 
applied, as well as case insensitivity (which itself is locale-specific). So I 
think consideration of those is indeed necessary.


I don't know what the API should look like. The java.text.Collator class 
implements Comparator. This suggests the possibility of an API that allows a 
"downstream" comparator to be specified, to which ordering of certain 
subsequences can be delegated.


s'marks



[1] 
https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947(v=vs.85).aspx

[2] 
https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761(v=vs.85).aspx

[3] 
https://developer.apple.com/documentation/foundation/nsstring/1409742-localizedstandardcompare?language=objc


[4] 
https://developer.apple.com/documentation/swift/string/1408384-localizedstandardcompare


[5] https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

[6] https://www.npmjs.com/package/javascript-natural-sort

[7] http://unicode.org/reports/tr35/tr35-collation.html#Setting_Options

[8] 

Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-24 Thread Roger Riggs

Hi Ivan,

Thanks for bringing this up again.

Some initial comments, not a full review.

Though the enhancement says that no consideration is given to sign 
characters they
may produce puzzling results for strings like "-2000" and "-2001" where 
the strings appear
to be signed numbers but the comparison will be for the "-" prefix and 
the rest unsigned.
Including the word 'unsigned' in the description might help reinforce 
the semantics.


Also, I don't see test cases for strings like:  "abc200123" and 
"abc2123" which share
a prefix part of which is numeric but differ in the number of "leading" 
zeros after the prefix.


What about strings that include more than 1 numeric segment: 
abc2003def0001 and abc02003def001

in the leading zero case?

Though the test adds the @key randomness, it would be useful to use the 
test library
mechanisms to report the seed and be able to run the test with a seed, 
if case they fail.

(As might be provided by the test library jdk.test.lib.RandomFactory).


Comparator.java:
576: "the common prefix is skipped" is problematic when considering 
leading zeros.

   The common prefix may contain non-zero digits.
585: it is not clear whether the "different number of leading zeros" is 
applied regardless

   of the common prefix or only starting after the common prefix.

550, 586: for many readers, it is easier to read 'for example' than 
"e.g." or "i.e.".


562, 598:  editorial: "is to compare" -> "compares"

Comparators.java:

192: @param for param o is missing;  (the param name "o" usually refers 
to an object, not a string).
200:   Can skipLeadingZeros be coded to correctly work if cnt == 0; it 
would be more robust
 SkipLeading zeros works correctly only if pos is given the first 
numeric digit in the subsequence

 so the numStart1 and numStart2 must be first digit in each string.

compare():

Line 223: if strings typically have non-numeric prefixes, it might 
perform slightly better
to detect the end of the common prefix and then scan back to find the 
beginning of the numeric part.

(Avoiding checking every char for isDigit).

Line 224: If assigned for every digit, it will hold the last equal digit 
in the common prefix, not the first digit.


239, 240:  chars at o1(index) and o2(index) are already known to be 
digits, can it be avoided checking them twice

by starting at index+1?

$.02, Roger


On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which will pay 
attention to the numbers embedded into the strings (should they present).


This proposal was initially discussed back in 2014 and seemed to bring 
some interest from the community:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html 



In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/

Comments, suggestions are very welcome!





Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-24 Thread Jonathan Bluett-Duncan
You're welcome Ivan!

I'm just proofreading the new webrev, and a few more things have occurred
to me:

1. What happens if the comparators are given the elements {"1abc", "2abc",
"10abc"} to compare? Would they sort the elements as {"1abc", "2abc",
"10abc"} or as {"1abc", "10abc", "2abc"}?
What about the array {"x1yz", "x2yz", "x10yz"}?
I wonder if these two cases should be added to the test too and given as
additional examples in the javadocs.

2. The example arrays which you kindly added to the comparators' javadoc
have no whitespace at the start but one space between the last element and
the }, so I think there either should be no space at the end or an extra
space at the start.

3. Very sorry, error on my part: on the javadoc lines which now say "then the
corresponding numerical values are compared; otherwise", I suggested to add
a "then" at the start; re-reading it though, I personally think it reads
better without, but I would understand and not press it further if you
disagree and think that the "then" is a useful addition.

Best regards,
Jonathan


On 24 Jul 2017 06:21, "Ivan Gerasimov"  wrote:

Thank you Jonathan for looking into that!

I agree with all your suggestions.

Here's the updated webrev with suggested modifications:
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/02/webrev/

With kind regards,
Ivan


On 7/23/17 3:56 AM, Jonathan Bluett-Duncan wrote:

Meant to reply to core-libs-dev as well; doing so now.

Jonathan

On 23 July 2017 at 11:50, Jonathan Bluett-Duncan 
wrote:

> Hi Ivan,
>
> I'm not a reviewer per se, but I thought I'd chime in.
>
> There's a few things with Comparator.java which I think could be improved:
>
>1. `comparingNumericallyLeadingZerosAhead()` is a confusing name for
>me, as the "ahead" has no meaning in my mind; IMO the name
>`comparingNumericallyLeadingZerosFirst()` better implies that "0001" <
>"001" < "01".
>2. It wasn't clear to me from the javadoc that the comparators compare
>strings like "abc9" and "abc10" as "abc9" < "abc10", so I think they should
>include more examples.
>3. There's a typo in the javadocs for both methods: "represets" -->
>"represents".
>4. Where the javadocs say "follows the procedure", I think it'd make
>more grammatical sense if they said "does the following".
>5. The lines which say "at the boundary of a subsequence, consisting
>of decimal digits, the" would flow better if they said "at the boundary of
>a subsequence *consisting solely of decimal digits*, then the". Note
>the removal of the comma between "subsequence" and "consisting".
>
> Hope this helps!
>
> Jonathan
>
> On 23 July 2017 at 05:36, Ivan Gerasimov 
> wrote:
>
>> Hello!
>>
>> This is a gentle reminder.
>>
>> The proposed comparator implementation would be particularly useful when
>> one will need to compare version strings.
>> Some popular file managers also use similar comparing algorithm, as the
>> results often look more natural to the human eyes (e.g. File Explorer on
>> Windows, Files on Ubuntu).
>>
>> Now, as Java 10 is been worked on, to sort the list of Java names
>> correctly, this kind of comparator is needed:
>>
>> Look: a list { ... "Java 8", "Java 9", "Java 10" } definitely looks nicer
>> than { "Java 1", "Java 10", "Java 2", ... }  :-)
>>
>> Would you please help review the proposal?
>>
>> With kind regards,
>> Ivan
>>
>>
>>
>> On 7/19/17 1:41 AM, Ivan Gerasimov wrote:
>>
>>> Hello!
>>>
>>> It is a proposal to provide a String comparator, which will pay
>>> attention to the numbers embedded into the strings (should they present).
>>>
>>> This proposal was initially discussed back in 2014 and seemed to bring
>>> some interest from the community:
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-De
>>> cember/030343.html
>>>
>>> In the latest webrev two methods are added to the public API:
>>> j.u.Comparator.comparingNumerically() and
>>> j.u.Comparator.comparingNumericallyLeadingZerosAhead().
>>>
>>> The regression test is extended to exercise this new comparator.
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/
>>>
>>> Comments, suggestions are very welcome!
>>>
>>>
>> --
>> With kind regards,
>> Ivan Gerasimov
>>
>>
>

-- 
With kind regards,
Ivan Gerasimov


Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-23 Thread Ivan Gerasimov

Thank you Jonathan for looking into that!

I agree with all your suggestions.

Here's the updated webrev with suggested modifications:
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/02/webrev/

With kind regards,
Ivan

On 7/23/17 3:56 AM, Jonathan Bluett-Duncan wrote:

Meant to reply to core-libs-dev as well; doing so now.

Jonathan

On 23 July 2017 at 11:50, Jonathan Bluett-Duncan 
> wrote:


Hi Ivan,

I'm not a reviewer per se, but I thought I'd chime in.

There's a few things with Comparator.java which I think could be
improved:

 1. `comparingNumericallyLeadingZerosAhead()` is a confusing name
for me, as the "ahead" has no meaning in my mind; IMO the name
`comparingNumericallyLeadingZerosFirst()` better implies that
"0001" < "001" < "01".
 2. It wasn't clear to me from the javadoc that the comparators
compare strings like "abc9" and "abc10" as "abc9" < "abc10",
so I think they should include more examples.
 3. There's a typo in the javadocs for both methods: "represets"
--> "represents".
 4. Where the javadocs say "follows the procedure", I think it'd
make more grammatical sense if they said "does the following".
 5. The lines which say "at the boundary of a subsequence,
consisting of decimal digits, the" would flow better if they
said "at the boundary of a subsequence *consisting solely of
decimal digits*, then the". Note the removal of the comma
between "subsequence" and "consisting".

Hope this helps!

Jonathan

On 23 July 2017 at 05:36, Ivan Gerasimov
> wrote:

Hello!

This is a gentle reminder.

The proposed comparator implementation would be particularly
useful when one will need to compare version strings.
Some popular file managers also use similar comparing
algorithm, as the results often look more natural to the human
eyes (e.g. File Explorer on Windows, Files on Ubuntu).

Now, as Java 10 is been worked on, to sort the list of Java
names correctly, this kind of comparator is needed:

Look: a list { ... "Java 8", "Java 9", "Java 10" } definitely
looks nicer than { "Java 1", "Java 10", "Java 2", ... }  :-)

Would you please help review the proposal?

With kind regards,
Ivan



On 7/19/17 1:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which
will pay attention to the numbers embedded into the
strings (should they present).

This proposal was initially discussed back in 2014 and
seemed to bring some interest from the community:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html




In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new
comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512

WEBREV:
http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/


Comments, suggestions are very welcome!


-- 
With kind regards,

Ivan Gerasimov





--
With kind regards,
Ivan Gerasimov



Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-23 Thread Jonathan Bluett-Duncan
Meant to reply to core-libs-dev as well; doing so now.

Jonathan

On 23 July 2017 at 11:50, Jonathan Bluett-Duncan 
wrote:

> Hi Ivan,
>
> I'm not a reviewer per se, but I thought I'd chime in.
>
> There's a few things with Comparator.java which I think could be improved:
>
>1. `comparingNumericallyLeadingZerosAhead()` is a confusing name for
>me, as the "ahead" has no meaning in my mind; IMO the name `
>comparingNumericallyLeadingZerosFirst()` better implies that "0001" <
>"001" < "01".
>2. It wasn't clear to me from the javadoc that the comparators compare
>strings like "abc9" and "abc10" as "abc9" < "abc10", so I think they should
>include more examples.
>3. There's a typo in the javadocs for both methods: "represets" -->
>"represents".
>4. Where the javadocs say "follows the procedure", I think it'd make
>more grammatical sense if they said "does the following".
>5. The lines which say "at the boundary of a subsequence, consisting
>of decimal digits, the" would flow better if they said "at the boundary of
>a subsequence *consisting solely of decimal digits*, then the". Note
>the removal of the comma between "subsequence" and "consisting".
>
> Hope this helps!
>
> Jonathan
>
> On 23 July 2017 at 05:36, Ivan Gerasimov 
> wrote:
>
>> Hello!
>>
>> This is a gentle reminder.
>>
>> The proposed comparator implementation would be particularly useful when
>> one will need to compare version strings.
>> Some popular file managers also use similar comparing algorithm, as the
>> results often look more natural to the human eyes (e.g. File Explorer on
>> Windows, Files on Ubuntu).
>>
>> Now, as Java 10 is been worked on, to sort the list of Java names
>> correctly, this kind of comparator is needed:
>>
>> Look: a list { ... "Java 8", "Java 9", "Java 10" } definitely looks nicer
>> than { "Java 1", "Java 10", "Java 2", ... }  :-)
>>
>> Would you please help review the proposal?
>>
>> With kind regards,
>> Ivan
>>
>>
>>
>> On 7/19/17 1:41 AM, Ivan Gerasimov wrote:
>>
>>> Hello!
>>>
>>> It is a proposal to provide a String comparator, which will pay
>>> attention to the numbers embedded into the strings (should they present).
>>>
>>> This proposal was initially discussed back in 2014 and seemed to bring
>>> some interest from the community:
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-De
>>> cember/030343.html
>>>
>>> In the latest webrev two methods are added to the public API:
>>> j.u.Comparator.comparingNumerically() and
>>> j.u.Comparator.comparingNumericallyLeadingZerosAhead().
>>>
>>> The regression test is extended to exercise this new comparator.
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/
>>>
>>> Comments, suggestions are very welcome!
>>>
>>>
>> --
>> With kind regards,
>> Ivan Gerasimov
>>
>>
>


Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

2017-07-22 Thread Ivan Gerasimov

Hello!

This is a gentle reminder.

The proposed comparator implementation would be particularly useful when 
one will need to compare version strings.
Some popular file managers also use similar comparing algorithm, as the 
results often look more natural to the human eyes (e.g. File Explorer on 
Windows, Files on Ubuntu).


Now, as Java 10 is been worked on, to sort the list of Java names 
correctly, this kind of comparator is needed:


Look: a list { ... "Java 8", "Java 9", "Java 10" } definitely looks 
nicer than { "Java 1", "Java 10", "Java 2", ... }  :-)


Would you please help review the proposal?

With kind regards,
Ivan


On 7/19/17 1:41 AM, Ivan Gerasimov wrote:

Hello!

It is a proposal to provide a String comparator, which will pay 
attention to the numbers embedded into the strings (should they present).


This proposal was initially discussed back in 2014 and seemed to bring 
some interest from the community:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html 



In the latest webrev two methods are added to the public API:
j.u.Comparator.comparingNumerically() and
j.u.Comparator.comparingNumericallyLeadingZerosAhead().

The regression test is extended to exercise this new comparator.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/

Comments, suggestions are very welcome!



--
With kind regards,
Ivan Gerasimov