wangyum commented on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-840985630


@dongjoon-hyun I don't think this performance issue is caused by this change. This PR only changes the `In` predicate, and this benchmark is also slow without the change:
   
```
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Select 0 string row (value IS NULL):      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                10623          10994         272          1.5         675.4       1.0X
Parquet Vectorized (Pushdown)                       627            657          24         25.1          39.9      16.9X
Native ORC Vectorized                              7490           7653         203          2.1         476.2       1.4X
Native ORC Vectorized (Pushdown)                    553            606          34         28.4          35.2      19.2X
```
   https://github.com/wangyum/spark/runs/2580852093
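For anyone comparing rows: the `Relative` column appears to be the first row's best time divided by each row's best time. The sketch below assumes that definition and recomputes the column from the `Best Time(ms)` values in the table above. The names `best_ms` and `relative` are hypothetical helpers made up for illustration, not part of Spark's benchmark code.

```python
# Best Time(ms) values copied from the benchmark table above.
best_ms = {
    "Parquet Vectorized": 10623,
    "Parquet Vectorized (Pushdown)": 627,
    "Native ORC Vectorized": 7490,
    "Native ORC Vectorized (Pushdown)": 553,
}


def relative(best, baseline):
    """Hypothetical helper: speedup of each case over `baseline`,
    formatted like the Relative column (assumed to be baseline / case)."""
    base = best[baseline]
    return {name: f"{base / ms:.1f}X" for name, ms in best.items()}


for name, speedup in relative(best_ms, "Parquet Vectorized").items():
    print(f"{name:<35} {speedup}")
```

The computed values (1.0X, 16.9X, 1.4X, 19.2X) match the `Relative` column as reported.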

