[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

2018-02-21 Thread Wenchen Fan (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-23418:
---

Assignee: Ryan Blue

> DataSourceV2 should not allow userSpecifiedSchema without 
> ReadSupportWithSchema
> ---
>
> Key: SPARK-23418
> URL: https://issues.apache.org/jira/browse/SPARK-23418
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major
> Fix For: 2.4.0
>
>
> DataSourceV2 currently does not reject user-specified schemas when a source 
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's 
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I 
> can't think of a situation where it is a good idea to pass a schema that is 
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies 
> a schema identical to the table's schema and runs fine, so it goes into 
> production. What happens when the table's schema changes? If someone adds a 
> column to the table, then the job will start failing and report that the 
> source doesn't support user-supplied schemas, even though it had previously 
> worked just fine with a user-supplied schema. In addition, the change to the 
> table is actually compatible with the old job because the new column will be 
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as 
> an initial projection. But that doesn't make sense because we don't need two 
> projection mechanisms. If we used this as a second way to project, it would 
> be confusing that you can't actually leave out columns (at least for CSV) and 
> it would be odd that using this path you can coerce types, which should 
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't 
> supported by a source.
> {quote}
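
The behavior requested above can be illustrated with a small, Spark-free sketch. It is not the actual Spark implementation: the two classes only borrow the names ReadSupport and ReadSupportWithSchema from the issue, and `SchemaError`, `create_reader`, and the string return values are hypothetical names made up for this illustration. The point is only that a user-specified schema is rejected up front, rather than silently ignored, when the source cannot accept one.

```python
from typing import Optional


class ReadSupport:
    """Hypothetical stand-in for a source that always infers its own schema."""

    def create_reader(self, options: dict) -> str:
        return "reader with inferred schema"


class ReadSupportWithSchema:
    """Hypothetical stand-in for a source that accepts a user-specified schema."""

    def create_reader(self, schema: list, options: dict) -> str:
        return f"reader with user schema {schema}"


class SchemaError(Exception):
    """Raised when the schema argument does not match what the source supports."""


def create_reader(source, user_schema: Optional[list], options: dict) -> str:
    """Pick the reader, rejecting a schema the source would otherwise ignore."""
    if user_schema is not None:
        if isinstance(source, ReadSupportWithSchema):
            return source.create_reader(user_schema, options)
        # Fail fast, regardless of whether the schema happens to match today.
        raise SchemaError(
            f"{type(source).__name__} does not support user-specified schemas"
        )
    if isinstance(source, ReadSupport):
        return source.create_reader(options)
    raise SchemaError(f"{type(source).__name__} requires a user-specified schema")
```

With this check, the job in the quoted example fails on its first run instead of failing later when the table gains a column.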



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

2018-02-13 Thread Apache Spark (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23418:


Assignee: Apache Spark

> DataSourceV2 should not allow userSpecifiedSchema without 
> ReadSupportWithSchema
> ---
>
> Key: SPARK-23418
> URL: https://issues.apache.org/jira/browse/SPARK-23418
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Ryan Blue
> Assignee: Apache Spark
> Priority: Major

[jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema

2018-02-13 Thread Apache Spark (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23418:


Assignee: (was: Apache Spark)

> DataSourceV2 should not allow userSpecifiedSchema without 
> ReadSupportWithSchema
> ---
>
> Key: SPARK-23418
> URL: https://issues.apache.org/jira/browse/SPARK-23418
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Ryan Blue
> Priority: Major