Re[2]: Benchmarking of new optimized Dual-Pivot Quicksort

2019-07-26 Thread Vladimir Yaroslavskiy
Hi Laurent,

Many thanks for feedback!

I run tests on two computers under Windows (8 processors) and Linux (88 
processors).

The details of the first computer is: Window 10 (64-bit), Intel Core i7 8650U 
1.90 GHz
and output of "uname -a" command for the second computer is:
Linux rho 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 
x86_64 x86_64 GNU/Linux

The JDK on Windows is OpenJDK 64-Bit Server VM, build 14-ea+6-171,
on Linux - jdk13 latest version.

The setting for JVM running is (one option -Xbatch only):
java -Xbatch -classpath %CLASSES% Main

Also I uploaded the results from Linux computer for sequential sorting of int / 
long arrays.
details are there  
https://github.com/iaroslavski/sorting/tree/master/benchmarking/results
The files -s- and -p- are related to the first 
(Windows) computer,
the files -s88- and -p88- are related to the second 
(Linux) computer.

The summary is here:

[int] sequential sorting, gain:
Windows: size 1M: 0.31; size 30K: 0.28; size: 150: 0.15
      Linux: size 1M: 0.31; size 30K: 0.27; size: 150: 0.07

[long] sequential sorting, gain:
Windows: size 1M: 0.32; size 30K: 0.23; size 150: 0.12
      Linux: size 1M: 0.30; size 30K: 0.23; size 150: 0.05 You can see that 
sorting shows the same results on large and medium sizes for both types,
and it works slower on small size on Linux, but shows positive gains.

Thank you,
Vladimir

>Пятница, 26 июля 2019, 10:43 +03:00 от Laurent Bourgès 
>:
>
>Hi Vladimir,
>
>First Thank you for these impressive results: 15% to 70% overall gains is 
>amazing and will make a big difference, Congratulations !
>
>Could you publish the different test environment ?
>- HW: cpu (lscpu output)
>- OS version
>- JDK version + JVM settings 
>
>Ideally you or I could write a shell script to collect setup and write a 
>simple log file.
>
>PS: I suppose you tested DPQS only on intel cpus, could somebody test on ppc 
>or arm cpus as well ?
>
>Cheers,
>Laurent
>Le jeu. 25 juil. 2019 à 16:04, Vladimir Yaroslavskiy < vlv.spb...@mail.ru > a 
>écrit :
>>Hi all,
>>With the help of Laurent Bourges I run benchmarking of new optimized 
>>Dual-Pivot Quicksort
>>and summarized results. The tests were run for all types (int / long / ... / 
>>double), for both types:
>>sequential and parallel sorting, for small, medium and large sizes. The 
>>comparison was done
>>on several data types, such as: random, period, sorted, equal, stagger, 
>>shuffle.
>>Here is the summary of total gains. The gain is defined as (jdk_time - 
>>dpqs_time) / jdk_time,
>>where jdk_time is sorting time by the current jdk14 and dpqs_time is sorting 
>>time by new
>>optimized Dual-Pivot Quicksort. To get stable and reproducible results, min 
>>time of many
>>invocations is taken.
>>You can find all detailed results there
>>https://github.com/iaroslavski/sorting/tree/master/benchmarking/results
>>
>>Sources of benchmarking tests are there
>>https://github.com/iaroslavski/sorting/tree/master/benchmarking/sources
>>
>>You can see not good performance for float / double types (parallel sorting).
>>This behavior can be explained by existing bug in jdk, when NaNs and -0.0s
>>are not processed properly. New sorting has total losses for small float / 
>>double
>>arrays, few percents only. For all other cases new optimized sorting is 
>>winner.
>>
>>
>>[int]
>>sequential, Length = 150, Gain: 0.15
>>sequential, Length = 3, Gain: 0.28
>>sequential, Length = 100, Gain: 0.31
>>parallel 8, Length = 150, Gain: 0.14
>>parallel 8, Length = 3, Gain: 0.15
>>parallel 8, Length = 100, Gain: 0.14
>>[long]
>>sequential, Length = 150, Gain: 0.12
>>sequential, Length = 3, Gain: 0.23
>>sequential, Length = 100, Gain: 0.32
>>parallel 8, Length = 150, Gain: 0.11
>>parallel 8, Length = 3, Gain: 0.16
>>parallel 8, Length = 100, Gain: 0.24
>>parallel 88 processors, Length = 100, Gain: 0.05
>>[byte]
>>sequential, Length = 150, Gain: 0.06
>>sequential, Length = 3, Gain: 0.36
>>sequential, Length = 100, Gain: 0.37
>>parallel 8, Length = 150, Gain: 0.13
>>parallel 8, Length = 3, Gain: 0.73
>>parallel 8, Length = 100, Gain: 0.65
>>[char]
>>sequential, Length = 150, Gain: 0.10
>>sequential, Length = 3, Gain: 0.00
>>sequential, Length = 100, Gain: 0.17
>>parallel 8, Length = 150, Gain: 0.10
>>parallel 8, Length = 3, Gain: 0.33
>>parallel 8, Length = 100, Gain: 0.62
>>[short]
>>sequential, Length = 

Re: Benchmarking of new optimized Dual-Pivot Quicksort

2019-07-26 Thread Laurent Bourgès
Hi Vladimir,

First Thank you for these impressive results: 15% to 70% overall gains is
amazing and will make a big difference, Congratulations !

Could you publish the different test environment ?
- HW: cpu (lscpu output)
- OS version
- JDK version + JVM settings

Ideally you or I could write a shell script to collect setup and write a
simple log file.

PS: I suppose you tested DPQS only on intel cpus, could somebody test on
ppc or arm cpus as well ?

Cheers,
Laurent

Le jeu. 25 juil. 2019 à 16:04, Vladimir Yaroslavskiy  a
écrit :

> Hi all,
>
> With the help of Laurent Bourges I run benchmarking of new optimized
> Dual-Pivot Quicksort
> and summarized results. The tests were run for all types (int / long / ...
> / double), for both types:
> sequential and parallel sorting, for small, medium and large sizes. The
> comparison was done
> on several data types, such as: random, period, sorted, equal, stagger,
> shuffle.
>
> Here is the summary of total gains. The gain is defined as (jdk_time -
> dpqs_time) / jdk_time,
> where jdk_time is sorting time by the current jdk14 and dpqs_time is
> sorting time by new
> optimized Dual-Pivot Quicksort. To get stable and reproducible results,
> min time of many
> invocations is taken.
>
> You can find all detailed results there
> https://github.com/iaroslavski/sorting/tree/master/benchmarking/results
>
> Sources of benchmarking tests are there
> https://github.com/iaroslavski/sorting/tree/master/benchmarking/sources
>
> <https://github.com/iaroslavski/sorting/tree/master/benchmarking/sources>You
> can see not good performance for float / double types (parallel sorting).
> This behavior can be explained by existing bug in jdk, when NaNs and -0.0s
> are not processed properly. New sorting has total losses for small float /
> double
> arrays, few percents only. For all other cases new optimized sorting is
> winner.
>
>
> 
> [int]
>
> sequential, Length = 150, Gain: 0.15
> sequential, Length = 3, Gain: 0.28
> sequential, Length = 100, Gain: 0.31
> parallel 8, Length = 150, Gain: 0.14
> parallel 8, Length = 3, Gain: 0.15
> parallel 8, Length = 100, Gain: 0.14
>
> [long]
> sequential, Length = 150, Gain: 0.12
> sequential, Length = 3, Gain: 0.23
> sequential, Length = 100, Gain: 0.32
> parallel 8, Length = 150, Gain: 0.11
> parallel 8, Length = 3, Gain: 0.16
> parallel 8, Length = 100, Gain: 0.24
> parallel 88 processors, Length = 100, Gain: 0.05
>
> [byte]
> sequential, Length = 150, Gain: 0.06
> sequential, Length = 3, Gain: 0.36
> sequential, Length = 100, Gain: 0.37
> parallel 8, Length = 150, Gain: 0.13
> parallel 8, Length = 3, Gain: 0.73
> parallel 8, Length = 100, Gain: 0.65
>
> [char]
> sequential, Length = 150, Gain: 0.10
> sequential, Length = 3, Gain: 0.00
> sequential, Length = 100, Gain: 0.17
> parallel 8, Length = 150, Gain: 0.10
> parallel 8, Length = 3, Gain: 0.33
> parallel 8, Length = 100, Gain: 0.62
>
> [short]
> sequential, Length = 150, Gain: 0.14
> sequential, Length = 3, Gain: 0.10
> sequential, Length = 100, Gain: 0.18
> parallel 8, Length = 150, Gain: 0.13
> parallel 8, Length = 3, Gain: 0.41
> parallel 8, Length = 100, Gain: 0.65
>
> [float]
> sequential, Length = 150, Gain: -0.02
> sequential, Length = 3, Gain: 0.21
> sequential, Length = 100, Gain: 0.24
> parallel 8, Length = 150, Gain: -0.05
> parallel 8, Length = 3, Gain: -0.04
> parallel 8, Length = 100, Gain: -0.12
>
> [double]
> sequential, Length = 150, Gain: -0.03
> sequential, Length = 3, Gain: 0.19
> sequential, Length = 100, Gain: 0.23
> parallel 8, Length = 150, Gain: -0.05
> parallel 8, Length = 3, Gain: 0.05
> parallel 8, Length = 100, Gain: 0.15
>
> Thank you,
> Vladimir
>


Benchmarking of new optimized Dual-Pivot Quicksort

2019-07-25 Thread Vladimir Yaroslavskiy

Hi all,
With the help of Laurent Bourges I run benchmarking of new optimized Dual-Pivot 
Quicksort
and summarized results. The tests were run for all types (int / long / ... / 
double), for both types:
sequential and parallel sorting, for small, medium and large sizes. The 
comparison was done
on several data types, such as: random, period, sorted, equal, stagger, shuffle.
Here is the summary of total gains. The gain is defined as (jdk_time - 
dpqs_time) / jdk_time,
where jdk_time is sorting time by the current jdk14 and dpqs_time is sorting 
time by new
optimized Dual-Pivot Quicksort. To get stable and reproducible results, min 
time of many
invocations is taken.
You can find all detailed results there
https://github.com/iaroslavski/sorting/tree/master/benchmarking/results  

Sources of benchmarking tests are there
https://github.com/iaroslavski/sorting/tree/master/benchmarking/sources

You can see not good performance for float / double types (parallel sorting).
This behavior can be explained by existing bug in jdk, when NaNs and -0.0s
are not processed properly. New sorting has total losses for small float / 
double
arrays, few percents only. For all other cases new optimized sorting is winner.


[int]
sequential, Length = 150, Gain: 0.15
sequential, Length = 3, Gain: 0.28
sequential, Length = 100, Gain: 0.31
parallel 8, Length = 150, Gain: 0.14
parallel 8, Length = 3, Gain: 0.15
parallel 8, Length = 100, Gain: 0.14
[long]
sequential, Length = 150, Gain: 0.12
sequential, Length = 3, Gain: 0.23
sequential, Length = 100, Gain: 0.32
parallel 8, Length = 150, Gain: 0.11
parallel 8, Length = 3, Gain: 0.16
parallel 8, Length = 100, Gain: 0.24
parallel 88 processors, Length = 100, Gain: 0.05
[byte]
sequential, Length = 150, Gain: 0.06
sequential, Length = 3, Gain: 0.36
sequential, Length = 100, Gain: 0.37
parallel 8, Length = 150, Gain: 0.13
parallel 8, Length = 3, Gain: 0.73
parallel 8, Length = 100, Gain: 0.65
[char]
sequential, Length = 150, Gain: 0.10
sequential, Length = 3, Gain: 0.00
sequential, Length = 100, Gain: 0.17
parallel 8, Length = 150, Gain: 0.10
parallel 8, Length = 3, Gain: 0.33
parallel 8, Length = 100, Gain: 0.62
[short]
sequential, Length = 150, Gain: 0.14
sequential, Length = 3, Gain: 0.10
sequential, Length = 100, Gain: 0.18
parallel 8, Length = 150, Gain: 0.13
parallel 8, Length = 3, Gain: 0.41
parallel 8, Length = 100, Gain: 0.65
[float]
sequential, Length = 150, Gain: -0.02
sequential, Length = 3, Gain: 0.21
sequential, Length = 100, Gain: 0.24
parallel 8, Length = 150, Gain: -0.05
parallel 8, Length = 3, Gain: -0.04
parallel 8, Length = 100, Gain: -0.12
[double]
sequential, Length = 150, Gain: -0.03
sequential, Length = 3, Gain: 0.19
sequential, Length = 100, Gain: 0.23
parallel 8, Length = 150, Gain: -0.05
parallel 8, Length = 3, Gain: 0.05
parallel 8, Length = 100, Gain: 0.15
Thank you,
Vladimir