viirya edited a comment on pull request #34642: URL: https://github.com/apache/spark/pull/34642#issuecomment-977016485
> I'm trying to understand the motivation. Is it because an in-memory table can output rows efficiently? A Parquet scan can also output rows, but we try our best to output columnar batches.

For a Parquet scan, asking it to output columnar batches actually makes it behave quite differently from the row-based approach, because it runs the vectorized Parquet reader. I think this is why we try our best to produce columnar batches for Parquet or ORC scans: the vectorized reader usually has much better performance, which can counteract the cost of any later columnar-to-row transition. An in-memory table, by contrast, is not doing a physical disk scan; the data is already serialized in memory. The motivation is that during some local experiments on other work I found the columnar-to-row transition costly, and the columnar output seems meaningless there.
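As a minimal sketch of the transition being discussed (not code from this PR), one can cache a DataFrame and inspect the physical plan: on recent Spark 3.x versions, a cached scan that outputs columnar batches shows an `InMemoryTableScan` followed by a `ColumnarToRow` node, which is the transition whose cost motivates this change. Operator names may vary by Spark version.

```scala
import org.apache.spark.sql.SparkSession

object InMemoryScanPlan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("in-memory-scan-plan")
      .getOrCreate()
    import spark.implicits._

    // Cache a small DataFrame so subsequent scans hit the in-memory table scan.
    val df = spark.range(0, 1000000L).toDF("id").cache()
    df.count() // materialize the cache

    // Print the physical plan; when the in-memory scan outputs columnar
    // batches, a ColumnarToRow transition appears above it.
    df.filter($"id" > 10).explain()

    spark.stop()
  }
}
```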