huaxingao commented on PR #52578:
URL: https://github.com/apache/spark/pull/52578#issuecomment-3398211655

   Thanks @viirya for the PR! Appreciate you putting together an alternative. A 
couple of thoughts before we move forward:
   
   1. Variant + column pruning are not special-cased semantics
   
   - Variant is just another column in the projected schema from Spark’s 
perspective. When a source supports column pruning, Spark should prune all 
columns it doesn’t need, including Variant, exactly like it does for other 
types. I don't think we need to introduce a new interface to control if Variant 
should be pruned.
   
   - If a particular DSv2 implementation does not support column pruning, then 
pushing Variant access down into that source is not meaningful; the primary 
benefit of Variant pushdown is to reduce the read surface via pruning. 
   
   2. On testing built-in Parquet vs. InMemoryTable
   
   - The InMemoryTable tests were designed to validate the planner and API 
plumbing, to test that Spark forms the right pruned schema and pushdown 
contracts and the source receives them. It should be sufficient for testing.
   
   - It's nice to add a smoke test with the built-in DSv2 Parquet to increase 
end-to-end confidence and catch regressions in common deployments. But I don't 
think it should be used as a judgement that a PR is not good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to