dongjoon-hyun edited a comment on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-840817042
> No. Github action runs on different machines, there is a performance difference between them.
No, @wangyum. I mean the **ratio** between ORC and Parquet within the same machine run. Previously, ORC and Parquet showed similar performance, but after this PR Parquet has become slower than ORC. For example:
```
                                      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
- Parquet Vectorized                          10512         10572         58        1.5        668.4      1.0X
- Parquet Vectorized (Pushdown)                 596           621         19       26.4         37.9     17.6X
- Native ORC Vectorized                        8555          8723         97        1.8        543.9      1.2X
- Native ORC Vectorized (Pushdown)              592           609         11       26.6         37.7     17.8X
+ Parquet Vectorized                           9788         10231        259        1.6        622.3      1.0X
+ Parquet Vectorized (Pushdown)                 493           536         29       31.9         31.3     19.9X
+ Native ORC Vectorized                        6487          6575        137        2.4        412.4      1.5X
+ Native ORC Vectorized (Pushdown)              436           447         14       36.1         27.7     22.4X
```
Although the difference is small, this generated result shows a slowdown of Parquet relative to ORC. That was my question.
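To make the ratio argument concrete, here is a small sketch that divides Parquet's best time by ORC's best time within each run, so that machine-to-machine differences cancel out. The numbers are copied from the benchmark output above ("-" lines as `before`, "+" lines as `after`); the helper name `parquet_to_orc_ratio` is hypothetical, and using only the best time of a single run per side ignores run-to-run noise.

```python
# Best times (ms) copied from the benchmark output in this comment.
# "before" = lines prefixed with "-", "after" = lines prefixed with "+".
best_ms = {
    "before": {"parquet": 10512, "orc": 8555, "parquet_pushdown": 596, "orc_pushdown": 592},
    "after": {"parquet": 9788, "orc": 6487, "parquet_pushdown": 493, "orc_pushdown": 436},
}


def parquet_to_orc_ratio(run, pushdown=False):
    """Parquet best time / ORC best time within one run (> 1.0 means Parquet is slower)."""
    suffix = "_pushdown" if pushdown else ""
    return best_ms[run]["parquet" + suffix] / best_ms[run]["orc" + suffix]


for pushdown in (False, True):
    label = "pushdown" if pushdown else "no pushdown"
    before = parquet_to_orc_ratio("before", pushdown)
    after = parquet_to_orc_ratio("after", pushdown)
    print(f"{label}: before {before:.2f}x, after {after:.2f}x")
```

With these numbers the ratio moves from about 1.23x to 1.51x without pushdown and from about 1.01x to 1.13x with pushdown, which matches the change in the `Relative` column (1.2X to 1.5X for ORC).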