viirya edited a comment on pull request #34642:
URL: https://github.com/apache/spark/pull/34642#issuecomment-977016485


   > I'm trying to understand the motivation. Is it because in-memory table can 
output rows efficiently? Parquet scan can also output rows but we try our best 
to output columnar batches.
   
   For a Parquet scan, asking for columnar batches actually changes its 
behavior considerably compared with the row-based approach, because it runs the 
vectorized Parquet reader. I think this is why we try our best to output 
columnar batches from Parquet or ORC scans: the vectorized reader usually 
performs much better, which can offset the cost of a later columnar-to-row 
transition, if there is one.
   
   An in-memory table scan does not actually perform a physical disk scan; the 
data is already serialized in memory. The motivation is that, during some local 
experiments for other work, I found the columnar-to-row transition to be costly, 
and the columnar output did not seem to provide any benefit.
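
To make the trade-off concrete, here is a toy sketch in plain Python of what a columnar-to-row transition has to do. It is not Spark code: `Batch`, `columnar_to_rows`, and `sum_column` are hypothetical names invented for illustration, and a dict of lists stands in for a real columnar batch. The point is only that the transition does work per row and per column, so it pays off only when the operator producing the batches gains something from the columnar form.

```python
from typing import Dict, Iterator, List, Tuple

# Toy columnar batch: one Python list per column (NOT Spark's ColumnarBatch).
Batch = Dict[str, List[object]]


def columnar_to_rows(batch: Batch) -> Iterator[Tuple[object, ...]]:
    """Materialize one tuple per row by reading every column at each index."""
    columns = list(batch.values())
    num_rows = len(columns[0]) if columns else 0
    for i in range(num_rows):
        # Per-row, per-column work: this is the cost the transition adds.
        yield tuple(col[i] for col in columns)


def sum_column(batch: Batch, name: str) -> int:
    """A columnar consumer: reads a single column and builds no row objects."""
    return sum(batch[name])


batch = {"id": [1, 2, 3], "score": [10, 20, 30]}
rows = list(columnar_to_rows(batch))
```

In this toy model, a consumer like `sum_column` benefits from the columnar layout, while a row-based consumer must first pay for `columnar_to_rows`. If nothing upstream benefits from producing batches, that conversion is pure overhead, which is the concern raised above for the in-memory table scan.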


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


