viirya commented on PR #52578: URL: https://github.com/apache/spark/pull/52578#issuecomment-3398615399
> Thanks @viirya for the reply! If I understand correctly, the core concern is: “If Spark pushes a variant scan into a DSv2 source that doesn’t understand it, we’ll see unexpected errors.” I agree — that’s exactly the failure mode we should avoid.
>
> Would it be acceptable if I add a planner-side guard so this only happens when the source explicitly opts in?

Thanks @huaxingao for the discussion and understanding. I think we need an explicit DSv2 API to establish the contract between Spark and data source implementations for this variant pushdown feature. That is what this PR proposes to do.

> On tests: I understand my InMemoryTable test might have issues. Can we fix it to exercise the planner contract correctly, or do you consider a built-in DSv2 Parquet test required for DSv2 change?

That is also what this PR proposes to do: it adds dedicated DSv2 Variant pushdown tests for both row-based and vectorized readers, with good test coverage.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
