[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row

2020-02-09 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033298#comment-17033298
 ] 

Maxim Gekk commented on SPARK-30687:


This feature will come with Spark 3.0 
https://github.com/apache/spark/commit/11e5f1bcd49eec8ab4225d6e68a051b5c6a21cb2 
. It wasn't backported to 2.4.x

> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row
> -
>
> Key: SPARK-30687
> URL: https://issues.apache.org/jira/browse/SPARK-30687
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bao Nguyen
>Priority: Major
>
> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row instead of setting the value at that cell to be null.
>  
> {code:java}
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
> val schema = 
> ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
> //here's the content of the file test.data
> //1~test~mac1~2
> //1.0~testdatarow2~mac2~non-numeric
> //2~test1~mac1~3
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
> ds.show();
> //the content of data frame. second row is all null. 
> //  ++-++-+
> //  | num| test| mac|value|
> //  ++-++-+
> //  | 1.0| test|mac1|  2.0|
> //  |null| null|null| null|
> //  | 2.0|test1|mac1|  3.0|
> //  ++-++-+
> //should be
> // ++--++-+ 
> // | num| test | mac|value| 
> // ++--++-+ 
> // | 1.0| test |mac1| 2.0 | 
> // |1.0 |testdatarow2  |mac2| null| 
> // | 2.0|test1 |mac1| 3.0 | 
> // ++--++-+{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row

2020-02-04 Thread pavithra ramachandran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029720#comment-17029720
 ] 

pavithra ramachandran commented on SPARK-30687:
---

yes. Issue is present 2.4.x also.

> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row
> -
>
> Key: SPARK-30687
> URL: https://issues.apache.org/jira/browse/SPARK-30687
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bao Nguyen
>Priority: Major
>
> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row instead of setting the value at that cell to be null.
>  
> {code:java}
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
> val schema = 
> ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
> //here's the content of the file test.data
> //1~test~mac1~2
> //1.0~testdatarow2~mac2~non-numeric
> //2~test1~mac1~3
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
> ds.show();
> //the content of data frame. second row is all null. 
> //  ++-++-+
> //  | num| test| mac|value|
> //  ++-++-+
> //  | 1.0| test|mac1|  2.0|
> //  |null| null|null| null|
> //  | 2.0|test1|mac1|  3.0|
> //  ++-++-+
> //should be
> // ++--++-+ 
> // | num| test | mac|value| 
> // ++--++-+ 
> // | 1.0| test |mac1| 2.0 | 
> // |1.0 |testdatarow2  |mac2| null| 
> // | 2.0|test1 |mac1| 3.0 | 
> // ++--++-+{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row

2020-02-03 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029572#comment-17029572
 ] 

Hyukjin Kwon commented on SPARK-30687:
--

[~bnguye1010], Spark 2.3.x is EOL. Can you test and see if the issue exists in 
2.4.x?

> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row
> -
>
> Key: SPARK-30687
> URL: https://issues.apache.org/jira/browse/SPARK-30687
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bao Nguyen
>Priority: Major
>
> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row instead of setting the value at that cell to be null.
>  
> {code:java}
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
> val schema = 
> ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
> //here's the content of the file test.data
> //1~test~mac1~2
> //1.0~testdatarow2~mac2~non-numeric
> //2~test1~mac1~3
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
> ds.show();
> //the content of data frame. second row is all null. 
> //  ++-++-+
> //  | num| test| mac|value|
> //  ++-++-+
> //  | 1.0| test|mac1|  2.0|
> //  |null| null|null| null|
> //  | 2.0|test1|mac1|  3.0|
> //  ++-++-+
> //should be
> // ++--++-+ 
> // | num| test | mac|value| 
> // ++--++-+ 
> // | 1.0| test |mac1| 2.0 | 
> // |1.0 |testdatarow2  |mac2| null| 
> // | 2.0|test1 |mac1| 3.0 | 
> // ++--++-+{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row

2020-01-31 Thread pavithra ramachandran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027333#comment-17027333
 ] 

pavithra ramachandran commented on SPARK-30687:
---

l would like to work on this issue.

> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row
> -
>
> Key: SPARK-30687
> URL: https://issues.apache.org/jira/browse/SPARK-30687
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bao Nguyen
>Priority: Major
>
> When reading from a file with pre-defined schema and encountering a single 
> value that is not the same type as that of its column , Spark nullifies the 
> entire row instead of setting the value at that cell to be null.
>  
> {code:java}
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
> val schema = 
> ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
> //here's the content of the file test.data
> //1~test~mac1~2
> //1.0~testdatarow2~mac2~non-numeric
> //2~test1~mac1~3
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
> ds.show();
> //the content of data frame. second row is all null. 
> //  ++-++-+
> //  | num| test| mac|value|
> //  ++-++-+
> //  | 1.0| test|mac1|  2.0|
> //  |null| null|null| null|
> //  | 2.0|test1|mac1|  3.0|
> //  ++-++-+
> //should be
> // ++--++-+ 
> // | num| test | mac|value| 
> // ++--++-+ 
> // | 1.0| test |mac1| 2.0 | 
> // |1.0 |testdatarow2  |mac2| null| 
> // | 2.0|test1 |mac1| 3.0 | 
> // ++--++-+{code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org