[ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363127#comment-16363127 ]
Apache Spark commented on SPARK-23418:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/20603

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23418
>                 URL: https://issues.apache.org/jira/browse/SPARK-23418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>
> DataSourceV2 currently does not reject user-specified schemas when a source
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I
> can't think of a situation where it is a good idea to pass a schema that is
> ignored.
> Here's an example of how this will be confusing: think of a job that supplies
> a schema identical to the table's schema and runs fine, so it goes into
> production. What happens when the table's schema changes? If someone adds a
> column to the table, then the job will start failing and report that the
> source doesn't support user-supplied schemas, even though it had previously
> worked just fine with a user-supplied schema. In addition, the change to the
> table is actually compatible with the old job because the new column will be
> removed by a projection.
> To fix this situation, it may be tempting to use the user-supplied schema as
> an initial projection. But that doesn't make sense because we don't need two
> projection mechanisms. If we used this as a second way to project, it would
> be confusing that you can't actually leave out columns (at least for CSV) and
> it would be odd that using this path you can coerce types, which should
> usually be done by Spark.
> I think it is best not to allow a user-supplied schema when it isn't
> supported by a source.
> {quote}
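
The scenario in the quoted discussion can be illustrated with a small toy model. This is only a sketch and not Spark code: `Source`, `SourceWithSchema`, `lenient_resolve`, `strict_resolve`, and `UnsupportedSchemaError` are hypothetical names invented for illustration, and a schema is modeled as a plain list of (column, type) pairs rather than a Spark StructType. `lenient_resolve` mimics the confusing behavior described above (a user schema is tolerated only while it happens to match the source's schema), while `strict_resolve` mimics the rule the issue asks for (always reject a user schema when the source does not support one).

```python
from typing import List, Optional, Tuple

# (column name, type name) pairs; a stand-in for a real schema type.
Schema = List[Tuple[str, str]]


class UnsupportedSchemaError(Exception):
    """Raised when a source cannot accept a user-supplied schema."""


class Source:
    """Hypothetical source that only reports its own schema."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema


class SourceWithSchema(Source):
    """Hypothetical source that can read with a caller-provided schema."""


def lenient_resolve(source: Source, user_schema: Optional[Schema]) -> Schema:
    """Tolerate a user schema on a plain Source only while it matches exactly."""
    if user_schema is None:
        return source.schema
    if isinstance(source, SourceWithSchema):
        return user_schema
    if user_schema != source.schema:
        raise UnsupportedSchemaError("source does not support user-supplied schemas")
    return source.schema


def strict_resolve(source: Source, user_schema: Optional[Schema]) -> Schema:
    """Reject any user schema unless the source explicitly supports one."""
    if user_schema is None:
        return source.schema
    if isinstance(source, SourceWithSchema):
        return user_schema
    raise UnsupportedSchemaError("source does not support user-supplied schemas")
```

Under the lenient rule, a job that passes a schema identical to the table's schema succeeds at first and starts failing as soon as a column is added to the table. Under the strict rule, the same job fails on its first run, before it can reach production, and a job that passes no schema keeps working across the table change.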