rdblue commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation
URL: https://github.com/apache/spark/pull/23836#issuecomment-465280946

@jose-torres, I'm not saying that the default should be v1 forever. The right way to move over is to develop v1 and v2 in parallel and switch over when we can validate that the behavior is the same. Right now, v2 can't run CTAS plans, so we clearly can't switch. But when v2 has all of the necessary logical plans, we can start running the existing behavior tests on v2 to see what changes remain, like changing validation for path-based tables.

Continuing to use SaveMode actually inhibits the move to v2. If write paths use SaveMode, then they can pass behavior tests and appear to work when they actually don't.

Also, let me clarify my comment on using v1. I think we need to keep v1 around until the process of moving to v2 is complete, because there are code paths that we know can't be changed to v2 without altering behavior. For example, we've agreed to standardize behavior on what file sources do. Users will have to choose between existing behavior and using v2 for other sources.

I'm not confident that all v1 behaviors will be available in v2. In v1, a CTAS plan can be validated against an existing table. In some cases, that CTAS should fail because the table exists (SQL), and in some cases, the plan that is created should be AppendData instead of CTAS (DataFrameWriter). Does the validation for AppendData work exactly the same way as validating a CTAS that is actually an append? My guess is that it doesn't, and that we might not want it to.

I think the final solution is to introduce a new write API that always uses v2 and makes it obvious what plan will be used. I've proposed such an API in the logical plans SPIP. Moving users to that API and eventually deprecating the DataFrameWriter API will take care of migrating the last few cases (which should be minor) from v1 to v2.
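
To make the CTAS versus AppendData point concrete, here is a rough toy model in Python. It is not Spark code: the function names, the exception, and the plan-name strings are hypothetical, and only the two cases described above are modeled (SQL CTAS failing when the table exists, and a DataFrameWriter append turning into AppendData). The other SaveMode values are deliberately left out.

```python
from enum import Enum


class SaveMode(Enum):
    # Only the two modes needed for this illustration.
    APPEND = "append"
    ERROR_IF_EXISTS = "errorifexists"


class TableAlreadyExists(Exception):
    """Hypothetical error for a create that hits an existing table."""


def plan_for_sql_ctas(table_exists: bool) -> str:
    # SQL path: the statement names its plan, so an existing table is an error.
    if table_exists:
        raise TableAlreadyExists("CTAS cannot create a table that exists")
    return "CreateTableAsSelect"


def plan_for_writer(mode: SaveMode, table_exists: bool) -> str:
    # DataFrameWriter path (toy version): the plan depends on the mode
    # AND on catalog state, so the caller cannot tell which plan runs.
    if not table_exists:
        return "CreateTableAsSelect"
    if mode is SaveMode.APPEND:
        return "AppendData"
    raise TableAlreadyExists("table exists and mode is ErrorIfExists")


if __name__ == "__main__":
    for exists in (False, True):
        print(exists, plan_for_writer(SaveMode.APPEND, exists))
```

In this sketch the same writer call yields two different plans depending on whether the table exists, which is the ambiguity an API with one explicit plan per call would avoid.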
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
