stczwd commented on pull request #35256:
URL: https://github.com/apache/spark/pull/35256#issuecomment-1019645995
Thanks for your reply @c21.
Sorry, my last test didn't adjust the limit amount, so the limit didn't
actually take effect.
This time I ran `limitBenchMark` in
`ParquetNestedSchemaPruningBenchmark` again, but set the batch capacity to 10240
and the limit to 12500. Now we can see the performance improved by 1.3x.
```
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1025-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Top-level column with out limit                      83            101          21        12.1          82.7       1.0X
Nested column with out limit                         77             95          11        12.9          77.4       1.1X
Nested column in array with out limit               105            121          19         9.5         105.1       0.8X
Top-level column with limit                          61             69           5        16.3          61.3       1.3X
Nested column with limit                             66             73           7        15.2          65.8       1.3X
Nested column in array with limit                   101            113          20         9.9         101.2       0.8X
```
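To illustrate why the limit amount matters relative to the batch capacity, here is a minimal sketch (not Spark's actual implementation, just the arithmetic): with a vectorized reader that materializes full batches, a limit of 12500 against a batch capacity of 10240 means the scan can stop after the second batch rather than reading the whole table. The function names below are hypothetical, for illustration only.

```python
# Illustrative sketch of how a row limit interacts with vectorized batch
# reads. Assumes the reader stops on batch boundaries once the limit is
# covered; these helpers are hypothetical, not Spark APIs.

def batches_needed(limit: int, batch_capacity: int) -> int:
    """Number of full-capacity batches needed to cover `limit` rows."""
    return -(-limit // batch_capacity)  # ceiling division

def rows_scanned(limit: int, batch_capacity: int) -> int:
    """Rows actually materialized when the scan stops on a batch boundary."""
    return batches_needed(limit, batch_capacity) * batch_capacity

# With the settings from the benchmark above:
print(batches_needed(12500, 10240))  # 2 batches
print(rows_scanned(12500, 10240))    # 20480 rows instead of the full table
```

This also shows why the earlier test saw no improvement: if the limit is not smaller than the row count of the scanned data, every batch still has to be read.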
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]