[ https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23448.
----------------------------------
    Resolution: Fixed
 Fix Version/s: 2.3.1

Issue resolved by pull request 20666
[https://github.com/apache/spark/pull/20666]

> Dataframe returns wrong result when column don't respect datatype
> -----------------------------------------------------------------
>
>                 Key: SPARK-23448
>                 URL: https://issues.apache.org/jira/browse/SPARK-23448
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: Local
>            Reporter: Ahmed ZAROUI
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.3.1
>
>
> I have the following JSON file that contains some noisy data (a String instead
> of an Array):
>
> {code:java}
> {"attr1":"val1","attr2":"[\"val2\"]"}
> {"attr1":"val1","attr2":["val2"]}
> {code}
> I need to specify the schema programmatically, like this:
>
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types._
>
> implicit val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .config("spark.ui.enabled", false)
>   .config("spark.sql.caseSensitive", "True")
>   .getOrCreate()
> import spark.implicits._
>
> val input = "noisy.json" // hypothetical path to the JSON file shown above
> val schema = StructType(
>   Seq(StructField("attr1", StringType, true),
>       StructField("attr2", ArrayType(StringType, true), true)))
> spark.read.schema(schema).json(input).collect().foreach(println)
> {code}
> The result given by this code is:
> {code:java}
> [null,null]
> [val1,WrappedArray(val2)]
> {code}
> Instead of putting null only in the corrupted column, all columns of the first
> message are null.
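One common way to see which input line was malformed, rather than only getting an all-null row back, is to add a nullable string column for the corrupt record to the user-specified schema. The sketch below is not the change from the linked pull request. It assumes Spark 2.2 or later (for the `DataFrameReader.json(Dataset[String])` overload) and holds the two lines from the report in memory instead of reading the original file. `_corrupt_record` is Spark's default corrupt-record column name (`spark.sql.columnNameOfCorruptRecord`); it is set explicitly here only for clarity. Which of the other fields come back null for the malformed line may differ between Spark versions, so the sketch relies only on the corrupt-record column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.ui.enabled", false)
  .getOrCreate()
import spark.implicits._

// The two JSON lines from the report, held in memory instead of a file.
val input = Seq(
  """{"attr1":"val1","attr2":"[\"val2\"]"}""",
  """{"attr1":"val1","attr2":["val2"]}"""
).toDS()

// Same schema as in the report, plus a nullable string column that
// PERMISSIVE mode fills with the raw text of any record it cannot convert.
val schema = StructType(Seq(
  StructField("attr1", StringType, true),
  StructField("attr2", ArrayType(StringType, true), true),
  StructField("_corrupt_record", StringType, true)))

val rows = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE") // the default mode, stated explicitly
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(input)
  .collect()

// The first line should carry its raw JSON text in _corrupt_record;
// the well-formed second line should leave that column null.
rows.foreach(println)

spark.stop()
```

Passing `.option("mode", "FAILFAST")` instead makes the read fail on the first malformed line, and `.option("mode", "DROPMALFORMED")` skips such lines entirely.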