[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row
[ https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033298#comment-17033298 ]

Maxim Gekk commented on SPARK-30687:
------------------------------------

This feature will come with Spark 3.0 via this commit:
https://github.com/apache/spark/commit/11e5f1bcd49eec8ab4225d6e68a051b5c6a21cb2
It was not backported to 2.4.x.

> When reading from a file with pre-defined schema and encountering a single
> value that is not the same type as that of its column , Spark nullifies the
> entire row
> ---------------------------------------------------------------------
>
>                 Key: SPARK-30687
>                 URL: https://issues.apache.org/jira/browse/SPARK-30687
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bao Nguyen
>            Priority: Major
>
> When reading from a file with a pre-defined schema and encountering a single
> value whose type does not match that of its column, Spark nullifies the
> entire row instead of setting only the value in that cell to null.
>
> {code:scala}
> import org.apache.spark.sql.catalyst.ScalaReflection
> import org.apache.spark.sql.types.StructType
>
> // Assumes spark-shell, where `spark` is the predefined SparkSession.
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
>
> // Derive the read schema from the case class.
> val schema =
>   ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
>
> // Content of the file test.data:
> // 1~test~mac1~2
> // 1.0~testdatarow2~mac2~non-numeric
> // 2~test1~mac1~3
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
> ds.show()
>
> // Actual content of the DataFrame: the second row is entirely null.
> // +----+-----+----+-----+
> // | num| test| mac|value|
> // +----+-----+----+-----+
> // | 1.0| test|mac1|  2.0|
> // |null| null|null| null|
> // | 2.0|test1|mac1|  3.0|
> // +----+-----+----+-----+
>
> // Expected content: only the non-numeric cell is null.
> // +---+------------+----+-----+
> // |num|        test| mac|value|
> // +---+------------+----+-----+
> // |1.0|        test|mac1|  2.0|
> // |1.0|testdatarow2|mac2| null|
> // |2.0|       test1|mac1|  3.0|
> // +---+------------+----+-----+
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
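The difference between the behavior reported in SPARK-30687 and the behavior the reporter expects can be illustrated without Spark. Below is a minimal plain-Scala sketch that contrasts two policies. It is not Spark's CSV parser.

* Row-level policy: any unparseable typed cell nulls the whole row, as reported for 2.3.x/2.4.x.
* Cell-level policy: only the unparseable cell becomes null.

The following are assumptions for illustration only:

* The names `Parsed`, `CellLevelParsing`, `parseCellLevel` and `parseRowLevel` are hypothetical.
* `Option[Double]` stands in for a nullable double column.
* Every line is assumed to have exactly four `~`-separated fields.

```scala
import scala.util.Try

// Hypothetical row type mirroring TestModel from the report.
// Option[Double] stands in for a nullable double column (None = null).
final case class Parsed(num: Option[Double], test: String, mac: String, value: Option[Double])

// Hypothetical helper object; not part of Spark.
object CellLevelParsing {
  // Parse one numeric cell; a malformed value becomes None instead of failing.
  private def parseDouble(cell: String): Option[Double] =
    Try(cell.trim.toDouble).toOption

  // Cell-level policy (the expected behavior): only the bad cell is null.
  // Assumes exactly four fields per line; shorter lines would throw.
  def parseCellLevel(line: String, delimiter: Char = '~'): Parsed = {
    val fields = line.split(delimiter)
    Parsed(parseDouble(fields(0)), fields(1), fields(2), parseDouble(fields(3)))
  }

  // Row-level policy (the reported behavior): if any typed cell fails,
  // the whole row is discarded, modelled here as None (every column null).
  def parseRowLevel(line: String, delimiter: Char = '~'): Option[Parsed] = {
    val parsed = parseCellLevel(line, delimiter)
    if (parsed.num.isDefined && parsed.value.isDefined) Some(parsed) else None
  }
}
```

For the second sample line, `1.0~testdatarow2~mac2~non-numeric`, the two policies give different results:

* `parseRowLevel` returns `None`, mirroring the all-null row.
* `parseCellLevel` keeps `num = Some(1.0)`, `test = "testdatarow2"` and `mac = "mac2"`, and sets only `value` to `None`.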
[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row
[ https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029720#comment-17029720 ]

pavithra ramachandran commented on SPARK-30687:
-----------------------------------------------

Yes, the issue is also present in 2.4.x.
[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row
[ https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029572#comment-17029572 ]

Hyukjin Kwon commented on SPARK-30687:
--------------------------------------

[~bnguye1010], Spark 2.3.x is EOL. Can you test and see if the issue exists in 2.4.x?
[jira] [Commented] (SPARK-30687) When reading from a file with pre-defined schema and encountering a single value that is not the same type as that of its column , Spark nullifies the entire row
[ https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027333#comment-17027333 ]

pavithra ramachandran commented on SPARK-30687:
-----------------------------------------------

I would like to work on this issue.