GitHub user CodingCat commented on the issue:

    https://github.com/apache/spark/pull/19810
  
    Hi @cloud-fan, this PR is not only for the case where the data size is
larger than the memory size. Even when all data is in memory, I observed a
10-40% speedup, because the implementation here
    
    (1) reads less data
    
    (2) starts fewer tasks
    
    You can think of this PR as implementing the functionality of Parquet's
footer for the in-memory table.
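
To make the footer analogy concrete, here is a minimal, self-contained sketch in plain Python. It is not the code in this PR and does not use any Spark API. It assumes the footer-like metadata is a per-batch min/max pair, similar to the column statistics Parquet keeps in its footer. `CachedBatch`, `build_batches`, and `scan_greater_than` are hypothetical names made up for illustration, and each batch loosely stands in for a unit of work that would otherwise need to be read and scheduled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedBatch:
    """Hypothetical cached batch carrying footer-like min/max statistics."""
    rows: List[int]
    min_value: int
    max_value: int


def build_batches(values: List[int], batch_size: int) -> List[CachedBatch]:
    """Split values into fixed-size batches and record min/max for each."""
    batches = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        batches.append(CachedBatch(chunk, min(chunk), max(chunk)))
    return batches


def scan_greater_than(batches: List[CachedBatch],
                      threshold: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of batches actually read."""
    matches: List[int] = []
    scanned = 0
    for batch in batches:
        # The statistics alone show that no row can match: skip the batch
        # without touching its rows.
        if batch.max_value <= threshold:
            continue
        scanned += 1
        matches.extend(v for v in batch.rows if v > threshold)
    return matches, scanned


batches = build_batches(list(range(100)), batch_size=10)
rows, scanned = scan_greater_than(batches, threshold=84)
print(len(rows), scanned)
```

In this toy run only the batches whose maximum exceeds the threshold are read. The others are ruled out from metadata alone, which is the sense in which such a scheme reads less data and needs fewer units of work.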
    


