Re: [PR] [WIP] [PROOF OF CONCEPT] [SPARK] [SQL] Collation Mode [spark]

via GitHub Thu, 13 Jun 2024 12:19:06 -0700


GideonPotok commented on code in PR #46917:
URL: https://github.com/apache/spark/pull/46917#discussion_r1638766832



##########
sql/core/benchmarks/CollationBenchmark-results.txt:
##########


Review Comment:
   @uros-db the benchmark results are now updated (JDK 17 only at the moment).
Relative to each other, the numbers look good: this PR is about as performant
as the other approach.
   
   @uros-db @dbatomic  Which approach should we go with? 
   
   This PR: 
   ```
   OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure
   AMD EPYC 7763 64-Core Processor
   collation e2e benchmarks - mode - 10000 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------
   mode df column with collation - UTF8_BINARY_LCASE             58             69           7          0.2        5757.5       1.0X
   mode df column with collation - UNICODE                       52             58           5          0.2        5233.2       1.1X
   mode df column with collation - UTF8_BINARY                   45             50           5          0.2        4462.9       1.3X
   mode df column with collation - UNICODE_CI                    46             50           5          0.2        4570.9       1.3X
   ```
    
   The original approach (GroupMapReduce):
   ```
   OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure
   AMD EPYC 7763 64-Core Processor
   collation e2e benchmarks - mode - 10000 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------
   mode df column with collation - UTF8_BINARY_LCASE             56             68           7          0.2        5571.2       1.0X
   mode df column with collation - UNICODE                       47             52           5          0.2        4659.6       1.2X
   mode df column with collation - UTF8_BINARY                   44             48           3          0.2        4423.5       1.3X
   mode df column with collation - UNICODE_CI                    43             47           4          0.2        4316.9       1.3X
   ```
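
For a quick side-by-side of the two tables, here is a minimal Python sketch. It is not part of the PR or of Spark's benchmark harness, and the dictionary and function names are made up for illustration. It computes how much slower this PR's per-row time is relative to the GroupMapReduce run, using only the `Per Row(ns)` values reported in the two result blocks.

```python
# Per Row(ns) values copied from the two benchmark tables in this comment.
# The names below are illustrative only; they do not exist in Spark.
THIS_PR_NS = {
    "UTF8_BINARY_LCASE": 5757.5,
    "UNICODE": 5233.2,
    "UTF8_BINARY": 4462.9,
    "UNICODE_CI": 4570.9,
}

GROUP_MAP_REDUCE_NS = {
    "UTF8_BINARY_LCASE": 5571.2,
    "UNICODE": 4659.6,
    "UTF8_BINARY": 4423.5,
    "UNICODE_CI": 4316.9,
}


def slowdown_pct(candidate_ns, baseline_ns):
    """Return, per collation, the percent by which candidate is slower than baseline."""
    return {
        name: (candidate_ns[name] / baseline_ns[name] - 1.0) * 100.0
        for name in baseline_ns
    }


if __name__ == "__main__":
    for name, pct in slowdown_pct(THIS_PR_NS, GROUP_MAP_REDUCE_NS).items():
        print(f"{name:<20} {pct:+.1f}%")
```

Since the reported standard deviations are 3 to 7 ms on runs of roughly 45 to 70 ms, single-digit percentage gaps from one run are likely within noise and should be read with that in mind.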
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


