wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-840985630
@dongjoon-hyun I think this performance issue is not caused by this change. This PR only changes the `In` predicate. It is also slow without this change:

```
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Select 0 string row (value IS NULL):      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                10623          10994         272          1.5         675.4       1.0X
Parquet Vectorized (Pushdown)                       627            657          24         25.1          39.9      16.9X
Native ORC Vectorized                              7490           7653         203          2.1         476.2       1.4X
Native ORC Vectorized (Pushdown)                    553            606          34         28.4          35.2      19.2X
```

https://github.com/wangyum/spark/runs/2580852093
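
As a quick sanity check on the numbers above, the `Relative` column can be reproduced from the `Best Time(ms)` column. This is a minimal sketch that assumes `Relative` is the best time of the first row (the baseline) divided by the best time of each row; the names `best_time_ms` and `relative_speedups` are illustrative and not part of Spark's benchmark harness.

```python
# Best Time (ms) values copied from the benchmark output above.
best_time_ms = {
    "Parquet Vectorized": 10623,
    "Parquet Vectorized (Pushdown)": 627,
    "Native ORC Vectorized": 7490,
    "Native ORC Vectorized (Pushdown)": 553,
}


def relative_speedups(times, baseline):
    """Assumed formula: baseline best time / case best time, to one decimal."""
    base = times[baseline]
    return {name: round(base / t, 1) for name, t in times.items()}


speedups = relative_speedups(best_time_ms, "Parquet Vectorized")
for name, ratio in speedups.items():
    print(f"{name}: {ratio}X")
```

Under that assumption the computed ratios match the reported `Relative` values for all four cases.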
