GideonPotok commented on PR #45453:
URL: https://github.com/apache/spark/pull/45453#issuecomment-2002555272
@dbatomic @stefanbuk-db This PR is ready for your initial review.
The benchmark is queued to run in GitHub Actions (GHA); I will upload the
results to this branch once it finishes.
Here are some local results:
```
[info] 13:51:04.324 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: filter df column with collation
[info] Running case: filter df column with collation - UNICODE_CI
[info] Stopped after 7 iterations, 2237 ms
[info] Running case: filter df column with collation - UNICODE
[info] Stopped after 26 iterations, 2040 ms
[info] Running case: filter df column with collation - UTF8_BINARY_LCASE
[info] Stopped after 9 iterations, 2148 ms
[info] Running case: filter df column with collation - UTF8_BINARY
[info] Stopped after 30 iterations, 2017 ms
[info] OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Mac OS X 14.4
[info] Apple M3 Max
[info] filter df column with collation:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] filter df column with collation - UNICODE_CI                    303            320          14         0.0   303345750.0       1.0X
[info] filter df column with collation - UNICODE                        67             78           7         0.0    67441125.0       4.5X
[info] filter df column with collation - UTF8_BINARY_LCASE             196            239          36         0.0   196200250.0       1.5X
[info] filter df column with collation - UTF8_BINARY                    61             67           4         0.0    61342750.0       4.9X
[info] Running benchmark: filter collation types
[info] Running case: filter - UTF8_BINARY
[info] Stopped after 349209 iterations, 2000 ms
[info] Running case: hashFunction - UTF8_BINARY
[info] Stopped after 262778 iterations, 2000 ms
[info] Running case: filter - UTF8_BINARY_LCASE
[info] Stopped after 36348 iterations, 2000 ms
[info] Running case: hashFunction - UTF8_BINARY_LCASE
[info] Stopped after 67744 iterations, 2000 ms
[info] Running case: filter - UNICODE
[info] Stopped after 276488 iterations, 2000 ms
[info] Running case: hashFunction - UNICODE
[info] Stopped after 13285 iterations, 2000 ms
[info] Running case: filter - UNICODE_CI
[info] Stopped after 40592 iterations, 2000 ms
[info] Running case: hashFunction - UNICODE_CI
[info] Stopped after 13140 iterations, 2000 ms
[info] OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Mac OS X 14.4
[info] Apple M3 Max
[info] filter collation types:            Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------------------
[info] filter - UTF8_BINARY                           0              0           0       255.4           3.9       1.0X
[info] hashFunction - UTF8_BINARY                     0              0           0       160.0           6.3       0.6X
[info] filter - UTF8_BINARY_LCASE                     0              0           0        53.1          18.8       0.2X
[info] hashFunction - UTF8_BINARY_LCASE               0              0           0        52.2          19.2       0.2X
[info] filter - UNICODE                               0              0           0       184.6           5.4       0.7X
[info] hashFunction - UNICODE                         0              0           0        12.9          77.3       0.1X
[info] filter - UNICODE_CI                            0              0           0        44.4          22.5       0.2X
[info] hashFunction - UNICODE_CI                      0              0           0        13.8          72.2       0.1X
```
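For anyone reading the `Relative` column: the figures above are consistent with it being the first case's time divided by each case's time, so a higher value means faster than the baseline row. The snippet below is only a rough sketch under that assumption, not the benchmark harness itself; `relative_speedups` is a hypothetical helper name, and the inputs are the `Per Row(ns)` values copied from the first table.

```python
# Per-row times (ns) copied from the "filter df column with collation" table.
per_row_ns = {
    "UNICODE_CI": 303345750.0,
    "UNICODE": 67441125.0,
    "UTF8_BINARY_LCASE": 196200250.0,
    "UTF8_BINARY": 61342750.0,
}


def relative_speedups(times, baseline):
    """Hypothetical helper: baseline time / case time, rounded to one decimal.

    Assumes "Relative" is measured against the first (baseline) case.
    """
    base = times[baseline]
    return {name: round(base / t, 1) for name, t in times.items()}


speedups = relative_speedups(per_row_ns, "UNICODE_CI")
for name, ratio in speedups.items():
    print(f"{name}: {ratio}X")
```

Read this way, UTF8_BINARY and UNICODE filter roughly 4.5 to 5 times faster than UNICODE_CI in this local run, while UTF8_BINARY_LCASE is about 1.5 times faster.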
Let me know what the next steps are. Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.