LuciferYang commented on PR #37609:
URL: https://github.com/apache/spark/pull/37609#issuecomment-1245286689

   ```
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 150:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     1407           1408           1          0.1       14071.2       1.0X
   Use zip + collection.breakOut                       1327           1328           2          0.1       13270.1       1.1X
   Use Manual builder                                  1282           1282           0          0.1       12815.3       1.1X
   Use Manual map                                      1734           1735           1          0.1       17339.8       0.8X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 200:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     1769           1769           1          0.1       17688.7       1.0X
   Use zip + collection.breakOut                       1595           1598           5          0.1       15949.0       1.1X
   Use Manual builder                                  1544           1546           3          0.1       15440.0       1.1X
   Use Manual map                                      2416           2418           2          0.0       24161.1       0.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 300:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     2701           2705           6          0.0       27007.0       1.0X
   Use zip + collection.breakOut                       2472           2475           4          0.0       24719.1       1.1X
   Use Manual builder                                  2379           2384           8          0.0       23787.5       1.1X
   Use Manual map                                      3803           3807           5          0.0       38031.9       0.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 400:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     3757           3758           2          0.0       37565.1       1.0X
   Use zip + collection.breakOut                       3446           3447           3          0.0       34455.1       1.1X
   Use Manual builder                                  3314           3318           5          0.0       33139.8       1.1X
   Use Manual map                                      5283           5287           5          0.0       52832.3       0.7X
   ```
   
   Added results for input sizes 150, 200, 300, and 400.
   
   @cloud-fan, from the benchmark results:
   
   - If the input size is < 500, `zip + collection.breakOut` and a manual `while` loop that builds the map with a map builder perform similarly; both are roughly 10% faster than `zip(...).toMap`.
   
   - If the input size is >= 500, there is no significant performance gap.
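
For readers who have not seen the benchmark source, the compared approaches can be sketched roughly as follows. This is an illustrative sketch, not the benchmark code from the PR: the object name `ZipToMapSketch`, the method names, and the `String`/`Int` element types are assumptions, and the "Manual map" variant in particular is a guess at what that case does. `collection.breakOut` exists only in Scala 2.12 and earlier, so it appears as a comment to keep the sketch compilable on 2.13 as well.

```scala
object ZipToMapSketch {
  // Baseline: zip first builds an intermediate array of tuples,
  // then toMap copies those tuples into an immutable Map.
  def zipToMap(keys: Array[String], values: Array[Int]): Map[String, Int] =
    keys.zip(values).toMap

  // Scala 2.12 only (breakOut was removed in 2.13); builds the Map directly
  // from zip without the intermediate tuple collection:
  //   val m: Map[String, Int] = keys.zip(values)(collection.breakOut)

  // Manual builder: a while loop feeding a Map builder, no intermediate collection.
  def manualBuilder(keys: Array[String], values: Array[Int]): Map[String, Int] = {
    val builder = Map.newBuilder[String, Int]
    val n = math.min(keys.length, values.length) // same truncation as zip
    var i = 0
    while (i < n) {
      builder += ((keys(i), values(i)))
      i += 1
    }
    builder.result()
  }

  // Manual map (assumed shape): repeatedly updating an immutable Map,
  // which creates a new Map instance on every step.
  def manualMap(keys: Array[String], values: Array[Int]): Map[String, Int] = {
    var m = Map.empty[String, Int]
    val n = math.min(keys.length, values.length)
    var i = 0
    while (i < n) {
      m = m.updated(keys(i), values(i))
      i += 1
    }
    m
  }
}
```

All three methods should return equal maps for the same input; they differ only in how many intermediate objects they allocate along the way, which is what the timings above compare.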



