Hi Ivan,

On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:

Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:
Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think about a parameterized comparator (would be sub-optimal, I know) where one would supply 2 Comparator(s) which would be used to compare Ax and Nx sub-sequences respectively as described below...

Yes. Inspired by what Stuart suggested I made a draft of such a comparator (see below). It works, but as you've said it's not that efficient (mostly due to expensive substrings) and a bit harder to use in a simple case.
Now I need to think about how to combine two approaches.

There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal). You must not return "equal" when both alpha prefixes are empty.


For Nx sub-sequences, one would need a comparator comparing the reversed sequence lexicographically,
I'm not sure I understand why they need to be reversed.

Scrap that. I got confused.  :-o

but for Ax sub-sequences, one could choose from a plethora of case-(in)sensitive comparator(s) and collators already available on the platform.

Yes. In the example below I used compareToIgnoreCase to compare alpha subsequences.
...
Arrays.sort(str,new MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));

Substrings are by copy. There is another approach. Originally you created the NumericComparator to compare CharSequence(s) - not String(s) as you did with MyComparator. So why not keeping that?

While talking about types, you don't need to have a generic parameter:

    NumericComparator<T extends CharSequence> implements CharSequence<T>

simple:

    NumericComparator implements Comparator<CharSequence>

is suitable for comparing any subtype of CharSequence as use-sites should accept Comparator<? super String> or Comparator<? super CharSequence>. For example:

    Stream<T> sorted(Comparator<? super T> comparator);

You can use Comparator<CharSequence> to sort Stream<String> as well as Stream<StringBuilder> or Stream<CharSequence>...


I tried an approach where sub-sequences are CharSequence objects by delegation and use sub-comparators that don't convert CharSequence(s) to String(s). Here's what this looks like:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java

with supporting sub-comparator classes:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java

I created a JMH benchmark comparing your 02/webrev (ivan.NumericComparator):

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java

with a modified variant of your MyComparator that works correctly (ivan.MyComparator):

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java

and my (peter.AlphadecimalComparator). The later two are tested with case-sensitive (CS) and case-insensitive (CIS) variants for comparing alpha sub-sequences.

Here are the results:

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt

With -prof gc, it can be seen that JIT is able to optimize-away allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed, at least with provided helper sub-comparators, no allocation is taking place (not so with MyComparator which creates substrings):

http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt

The performance of AlphadecimalComparator is about 50% slower than your specialized NumericComparator, but it allows custom parametrization. Perhaps the decimal part of variations could be hidden behind a boolean or an enum parameter (there is a leading-zeroes-greater, leading-zeroes-less, but also a leading-zeroes-equal variant that makes sense). I tried to do that by making DecimalComparator an enum and restricting the decimalComparato parameter to that type, but maybe specializing code around a boolean flag might yield better performance and DecimalComparator as Comparator<CharSequence> doesn't make much sense outside the AlphadecimalComparator.

So what do you think? Also, what do you think about the name?

Regards, Peter

Reply via email to