[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

     [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-23418:
-----------------------------------

    Assignee: Ryan Blue

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23418
>                 URL: https://issues.apache.org/jira/browse/SPARK-23418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>             Fix For: 2.4.0
>
>
> DataSourceV2 currently does not reject user-specified schemas when a source
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I
> can't think of a situation where it is a good idea to pass a schema that is
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies
> a schema identical to the table's schema and runs fine, so it goes into
> production. What happens when the table's schema changes? If someone adds a
> column to the table, then the job will start failing and report that the
> source doesn't support user-supplied schemas, even though it had previously
> worked just fine with a user-supplied schema. In addition, the change to the
> table is actually compatible with the old job because the new column will be
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as
> an initial projection. But that doesn't make sense because we don't need two
> projection mechanisms. If we used this as a second way to project, it would
> be confusing that you can't actually leave out columns (at least for CSV) and
> it would be odd that using this path you can coerce types, which should
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't
> supported by a source.
> {quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

     [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23418:
------------------------------------

    Assignee: Apache Spark

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23418
>                 URL: https://issues.apache.org/jira/browse/SPARK-23418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Apache Spark
>            Priority: Major
>
> DataSourceV2 currently does not reject user-specified schemas when a source
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I
> can't think of a situation where it is a good idea to pass a schema that is
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies
> a schema identical to the table's schema and runs fine, so it goes into
> production. What happens when the table's schema changes? If someone adds a
> column to the table, then the job will start failing and report that the
> source doesn't support user-supplied schemas, even though it had previously
> worked just fine with a user-supplied schema. In addition, the change to the
> table is actually compatible with the old job because the new column will be
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as
> an initial projection. But that doesn't make sense because we don't need two
> projection mechanisms. If we used this as a second way to project, it would
> be confusing that you can't actually leave out columns (at least for CSV) and
> it would be odd that using this path you can coerce types, which should
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't
> supported by a source.
> {quote}

[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

     [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23418:
------------------------------------

    Assignee:     (was: Apache Spark)

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23418
>                 URL: https://issues.apache.org/jira/browse/SPARK-23418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>
> DataSourceV2 currently does not reject user-specified schemas when a source
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I
> can't think of a situation where it is a good idea to pass a schema that is
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies
> a schema identical to the table's schema and runs fine, so it goes into
> production. What happens when the table's schema changes? If someone adds a
> column to the table, then the job will start failing and report that the
> source doesn't support user-supplied schemas, even though it had previously
> worked just fine with a user-supplied schema. In addition, the change to the
> table is actually compatible with the old job because the new column will be
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as
> an initial projection. But that doesn't make sense because we don't need two
> projection mechanisms. If we used this as a second way to project, it would
> be confusing that you can't actually leave out columns (at least for CSV) and
> it would be odd that using this path you can coerce types, which should
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't
> supported by a source.
> {quote}

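
For readers following the thread, the rule proposed in the issue description (reject a user-supplied schema up front unless the source implements ReadSupportWithSchema) might be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Spark change: the two interfaces are simplified, hypothetical stand-ins for the real DataSourceV2 interfaces, a schema is modeled as a plain list of column names, and resolveSchema is an invented helper name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustration only: none of these types are the real Spark classes.
final class SchemaCheckSketch {

    // Hypothetical stand-in: a source that always reports its own schema.
    interface ReadSupport {
        List<String> readSchema();
    }

    // Hypothetical stand-in: a source that can read with a caller-supplied schema.
    interface ReadSupportWithSchema {
        List<String> readSchema(List<String> userSchema);
    }

    // Hypothetical helper: decide which schema a read will use, rejecting a
    // user-supplied schema whenever the source cannot accept one.
    static List<String> resolveSchema(Object source, Optional<List<String>> userSchema) {
        if (userSchema.isPresent()) {
            if (source instanceof ReadSupportWithSchema) {
                return ((ReadSupportWithSchema) source).readSchema(userSchema.get());
            }
            // Fail regardless of whether the supplied schema happens to match
            // the table's, so behavior does not change when the table changes.
            throw new IllegalArgumentException(
                "source does not support user-supplied schemas");
        }
        if (source instanceof ReadSupport) {
            return ((ReadSupport) source).readSchema();
        }
        throw new IllegalArgumentException(
            "source cannot be read without a user-supplied schema");
    }

    public static void main(String[] args) {
        // A table-like source with a fixed schema of its own.
        ReadSupport table = () -> Arrays.asList("id", "name");

        // No user schema: the source's own schema is used.
        System.out.println(resolveSchema(table, Optional.empty()));

        // An identical user schema is still rejected, from the very first run.
        try {
            resolveSchema(table, Optional.of(Arrays.asList("id", "name")));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of rejecting even a matching schema is the one made in the quoted discussion: a job should not pass in testing and then start failing only after someone adds a column to the table.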