LuciferYang commented on PR #37609:
URL: https://github.com/apache/spark/pull/37609#issuecomment-1244867478

   > The idea is still valid: we can write a while loop manually to build the 
map, instead of `zip(...).toMap`, if this code path is proven to be performance 
critical.
   
   @cloud-fan Do you mean something like the following?
   
   ```scala
     // Variant 1: accumulate the pairs in an immutable Map builder.
     private def zipToMapUseMapBuilder[A, B, K, V](keys: Seq[A], values: Seq[B]): Map[K, V] = {
       import scala.collection.immutable
       val builder = immutable.Map.newBuilder[K, V]
       val keyIter = keys.iterator
       val valueIter = values.iterator
       // Stop at the shorter input, as `zip` does.
       while (keyIter.hasNext && valueIter.hasNext) {
         builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
       }
       builder.result()
     }
   
     // Variant 2: grow an immutable Map held in a `var`, one entry per iteration.
     private def zipToMapUseMap[A, B, K, V](keys: Seq[A], values: Seq[B]): Map[K, V] = {
       var elems: Map[K, V] = Map.empty[K, V]
       val keyIter = keys.iterator
       val valueIter = values.iterator
       while (keyIter.hasNext && valueIter.hasNext) {
         elems += (keyIter.next().asInstanceOf[K] -> valueIter.next().asInstanceOf[V])
       }
       elems
     }
   ```
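   
   For comparison, the casts could be dropped if the key and value types are already known at the call site. This is only a minimal sketch under that assumption; `ZipToMapSketch` and `zipToMap` are hypothetical names and are not part of the PR.
   
   ```scala
   object ZipToMapSketch {
     // Hypothetical cast-free variant: K and V are taken directly from the inputs.
     def zipToMap[K, V](keys: Seq[K], values: Seq[V]): Map[K, V] = {
       val builder = Map.newBuilder[K, V]
       val keyIter = keys.iterator
       val valueIter = values.iterator
       // Stop at the shorter input, as `zip` does.
       while (keyIter.hasNext && valueIter.hasNext) {
         builder += ((keyIter.next(), valueIter.next()))
       }
       builder.result()
     }
   }
   ```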
    
   
   I wrote a microbenchmark to compare `data.zip(data).toMap`, 
`data.zip(data)(collection.breakOut)`, and the two methods above. The results are as follows:
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 1:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                      22             22           1          4.6         217.6       1.0X
   Use zip + collection.breakOut                         8              9           1         11.9          84.4       2.6X
   Use Manual builder                                    3              3           0         32.1          31.2       7.0X
   Use Manual map                                        3              3           0         36.5          27.4       7.9X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 5:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     100            100           1          1.0         998.8       1.0X
   Use zip + collection.breakOut                        11             11           1          9.1         110.5       9.0X
   Use Manual builder                                   76             76           1          1.3         755.6       1.3X
   Use Manual map                                       47             47           1          2.1         468.1       2.1X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 10:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     123            123           1          0.8        1226.3       1.0X
   Use zip + collection.breakOut                        16             16           1          6.2         160.9       7.6X
   Use Manual builder                                   95             95           1          1.1         947.2       1.3X
   Use Manual map                                       92             94           1          1.1         922.5       1.3X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 20:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     162            162           1          0.6        1615.7       1.0X
   Use zip + collection.breakOut                        26             27           1          3.8         261.3       6.2X
   Use Manual builder                                  132            133           1          0.8        1321.4       1.2X
   Use Manual map                                      185            186           1          0.5        1846.3       0.9X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 50:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     604            606           2          0.2        6042.7       1.0X
   Use zip + collection.breakOut                        76             77           2          1.3         759.9       8.0X
   Use Manual builder                                  534            537           2          0.2        5335.6       1.1X
   Use Manual map                                      510            513           2          0.2        5102.1       1.2X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 100:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     1087           1087           0          0.1       10865.5       1.0X
   Use zip + collection.breakOut                        134            135           1          0.7        1336.2       8.1X
   Use Manual builder                                  1000           1002           3          0.1        9996.8       1.1X
   Use Manual map                                      1081           1083           2          0.1       10813.0       1.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 500:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     4536           4544          10          0.0       45364.9       1.0X
   Use zip + collection.breakOut                        778            784           5          0.1        7783.8       5.8X
   Use Manual builder                                  4347           4347           0          0.0       43470.2       1.0X
   Use Manual map                                      6775           6785          15          0.0       67745.2       0.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 1000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   --------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     11813          11822          13          0.0      118125.2       1.0X
   Use zip + collection.breakOut                        1590           1601          15          0.1       15898.3       7.4X
   Use Manual builder                                  11431          11450          27          0.0      114312.3       1.0X
   Use Manual map                                      14801          14812          16          0.0      148005.2       0.8X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 5000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   --------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     64917          65007         127          0.0      649172.0       1.0X
   Use zip + collection.breakOut                        8127           8130           5          0.0       81265.8       8.0X
   Use Manual builder                                  63836          63959         174          0.0      638356.4       1.0X
   Use Manual map                                      88139          88308         239          0.0      881392.4       0.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 10000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     130985         131331         489          0.0     1309847.7       1.0X
   Use zip + collection.breakOut                        16133          16142          13          0.0      161325.8       8.1X
   Use Manual builder                                  136655         136916         369          0.0     1366553.8       1.0X
   Use Manual map                                      190525         190794         380          0.0     1905252.4       0.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test zip to map with collectionSize = 20000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   Use zip + toMap                                     306207         306628         595          0.0     3062071.9       1.0X
   Use zip + collection.breakOut                        32482          32498          23          0.0      324818.3       9.4X
   Use Manual builder                                  336547         337705        1637          0.0     3365473.0       0.9X
   Use Manual map                                      410734         411271         758          0.0     4107344.5       0.7X
   ```
   
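   From the results, the manual while loop only wins clearly at `collectionSize = 1`. 
For larger collections it is roughly on par with `zip(...).toMap` or slower (down to 0.7X), while 
`zip` + `collection.breakOut` is 5.8X to 9.4X faster. Is there a problem with my test code?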
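   
   One check that might help rule out a test-code problem is confirming that a variant builds the same map as `zip(...).toMap` before its timings are compared. The following is only a sketch, assuming Scala 2.12 (where `collection.breakOut` still exists) and `Int` data built from a range; the actual benchmark data is not shown here, and `ZipVariantsCheck`, `viaToMap`, `viaBreakOut` and `agrees` are hypothetical names.
   
   ```scala
   import scala.collection.breakOut // Scala 2.12 only; removed in 2.13
   
   object ZipVariantsCheck {
     // Baseline: build the intermediate Seq of pairs, then convert it to a Map.
     def viaToMap(keys: Seq[Int], values: Seq[Int]): Map[Int, Int] =
       keys.zip(values).toMap
   
     // breakOut lets `zip` build the Map directly from the expected result type.
     def viaBreakOut(keys: Seq[Int], values: Seq[Int]): Map[Int, Int] =
       keys.zip(values)(breakOut)
   
     // True when `candidate` returns the same map as the baseline for every size.
     def agrees(candidate: (Seq[Int], Seq[Int]) => Map[Int, Int], sizes: Seq[Int]): Boolean =
       sizes.forall { n =>
         val data: Seq[Int] = 0 until n // assumed input shape, not the real benchmark data
         candidate(data, data) == viaToMap(data, data)
       }
   }
   ```
   
   For example, `ZipVariantsCheck.agrees(ZipVariantsCheck.viaBreakOut, Seq(1, 5, 10, 100))` should hold, and the manual variants could be passed in the same way.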


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

