stczwd commented on pull request #35256:
URL: https://github.com/apache/spark/pull/35256#issuecomment-1019645995
Thanks for your reply @c21.
Sorry, my last test didn't adjust the limit amount, so the limit didn't
actually take effect.
This time I ran `limitBenchMark` in
`ParquetNestedSchemaPruningBenchmark` again, but set the batch capacity to 10240
and the limit to 12500. Now we can see the performance improved by 1.3x.
```
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1025-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Top-level column with out limit                      83            101          21        12.1          82.7       1.0X
Nested column with out limit                         77             95          11        12.9          77.4       1.1X
Nested column in array with out limit               105            121          19         9.5         105.1       0.8X
Top-level column with limit                          61             69           5        16.3          61.3       1.3X
Nested column with limit                             66             73           7        15.2          65.8       1.3X
Nested column in array with limit                   101            113          20         9.9         101.2       0.8X
```
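To illustrate why the limit amount matters relative to the batch capacity, here is a minimal sketch (not Spark's actual implementation, just the arithmetic): with a vectorized reader that materializes full batches, a limit of 12500 against a batch capacity of 10240 means the scan can stop after the second batch rather than reading the whole table. The function names below are hypothetical, for illustration only.

```python
# Illustrative sketch of how a row limit interacts with vectorized batch
# reads. Assumes the reader stops on batch boundaries once the limit is
# covered; these helpers are hypothetical, not Spark APIs.

def batches_needed(limit: int, batch_capacity: int) -> int:
    """Number of full-capacity batches needed to cover `limit` rows."""
    return -(-limit // batch_capacity)  # ceiling division

def rows_scanned(limit: int, batch_capacity: int) -> int:
    """Rows actually materialized when the scan stops on a batch boundary."""
    return batches_needed(limit, batch_capacity) * batch_capacity

# With the settings from the benchmark above:
print(batches_needed(12500, 10240))  # 2 batches
print(rows_scanned(12500, 10240))    # 20480 rows instead of the full table
```

This also shows why the earlier test saw no improvement: if the limit is not smaller than the row count of the scanned data, every batch still has to be read.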
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]