GideonPotok commented on code in PR #46917:
URL: https://github.com/apache/spark/pull/46917#discussion_r1638766832
##########
sql/core/benchmarks/CollationBenchmark-results.txt:
##########
Review Comment:
@uros-db the benchmark results are now updated (JDK 17 only at the moment).
Relative to each other, the collations look good. Overall, this PR is about as
performant as the other approach: per-row times in this run are roughly 1% to 12% higher.
@uros-db @dbatomic Which approach should we go with?
This PR:
```
OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure
AMD EPYC 7763 64-Core Processor
collation e2e benchmarks - mode - 10000 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
mode df column with collation - UTF8_BINARY_LCASE             58             69           7         0.2        5757.5       1.0X
mode df column with collation - UNICODE                       52             58           5         0.2        5233.2       1.1X
mode df column with collation - UTF8_BINARY                   45             50           5         0.2        4462.9       1.3X
mode df column with collation - UNICODE_CI                    46             50           5         0.2        4570.9       1.3X
```
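For anyone reading the `Relative` column: a minimal sketch of how it appears to be derived, assuming it is the first row's per-row time divided by each row's per-row time, rounded to one decimal. That assumption matches the numbers above, but it is inferred from the output rather than taken from the benchmark harness. The function name `relative_column` is made up for illustration.

```python
# Per Row(ns) values copied from the "This PR" table above.
per_row_ns = {
    "UTF8_BINARY_LCASE": 5757.5,
    "UNICODE": 5233.2,
    "UTF8_BINARY": 4462.9,
    "UNICODE_CI": 4570.9,
}


def relative_column(per_row, baseline):
    """Format each case's speed relative to the baseline case (hypothetical helper).

    Assumption: Relative = baseline per-row time / case per-row time.
    """
    base = per_row[baseline]
    return {name: f"{base / ns:.1f}X" for name, ns in per_row.items()}


relative = relative_column(per_row_ns, "UTF8_BINARY_LCASE")
for name, rel in relative.items():
    print(f"{name:<18} {rel}")
```

So a value like `1.3X` means that case ran about 1.3 times as fast as `UTF8_BINARY_LCASE`, the first row.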
# The original approach ([GroupMapReduce](https://github.com/apache/spark/pull/46597)):
```
OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure
AMD EPYC 7763 64-Core Processor
collation e2e benchmarks - mode - 10000 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
mode df column with collation - UTF8_BINARY_LCASE             56             68           7         0.2        5571.2       1.0X
mode df column with collation - UNICODE                       47             52           5         0.2        4659.6       1.2X
mode df column with collation - UTF8_BINARY                   44             48           3         0.2        4423.5       1.3X
mode df column with collation - UNICODE_CI                    43             47           4         0.2        4316.9       1.3X
```
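To put a number on "about as performant", here is a small sketch that compares the `Per Row(ns)` values from the two tables above. The values are copied from the results as posted; the helper name `percent_slower` is made up for illustration. These are single runs with a reported standard deviation of 3 to 7 ms, so small differences may well be noise.

```python
# Per Row(ns) values copied from the two result tables above.
this_pr = {
    "UTF8_BINARY_LCASE": 5757.5,
    "UNICODE": 5233.2,
    "UTF8_BINARY": 4462.9,
    "UNICODE_CI": 4570.9,
}
original = {
    "UTF8_BINARY_LCASE": 5571.2,
    "UNICODE": 4659.6,
    "UTF8_BINARY": 4423.5,
    "UNICODE_CI": 4316.9,
}


def percent_slower(candidate, baseline):
    """Percent change in per-row time of candidate vs. baseline (positive = slower)."""
    return {
        name: (candidate[name] - baseline[name]) / baseline[name] * 100
        for name in baseline
    }


diff = percent_slower(this_pr, original)
for name, pct in diff.items():
    print(f"{name:<18} {pct:+.1f}%")
```

In this run the gap is smallest for `UTF8_BINARY` and largest for `UNICODE`.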