[ https://issues.apache.org/jira/browse/SPARK-31772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-31772.
----------------------------------
    Resolution: Not A Problem

It works by design. The last JSON record does not match the schema type, so it fails to parse. You can use string types and cast manually later.

> Json schema reading is not consistent between int and string types
> ------------------------------------------------------------------
>
>                 Key: SPARK-31772
>                 URL: https://issues.apache.org/jira/browse/SPARK-31772
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4
>            Reporter: yaniv oren
>            Priority: Major
>
> When reading a JSON file using a schema, an int value is converted to a string
> if the field is a string, but a string value is not converted to an int if the
> field is an int.
> Sample Code:
> read_schema = StructType([StructField("a", IntegerType()),
>                           StructField("b", StringType())])
> df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
> df.show()
>
> json temp_test
> {"a": 1,"b": "b1"}
> {"a": 2,"b": "b2"}
> {"a": 3,"b": 3}
> {"a": "4","b": 4}
>
> actual:
> |   a|   b|
> +----+----+
> |   1|  b1|
> |   2|  b2|
> |   3|   3|
> |null|null|
> +----+----+
>
> expected:
> The third line should be nulled like the fourth line, since b is an int in the
> data while the schema declares it as a string.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
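
The suggested workaround (read the fields as strings, cast manually later) can be sketched as below. This is plain Python using only the standard library, not Spark code, and it only models the idea; the helper names `as_text`, `to_int`, and `read_then_cast` are hypothetical. In PySpark the analogous step would presumably be to declare "a" as StringType() in the schema and then cast the column to int after the read.

```python
import json

# The four records from the report, one JSON object per line.
RAW = """{"a": 1,"b": "b1"}
{"a": 2,"b": "b2"}
{"a": 3,"b": 3}
{"a": "4","b": 4}"""


def as_text(value):
    """Keep a scalar as text, modelling a string-typed field that accepts any scalar."""
    if value is None:
        return None
    if isinstance(value, str):
        return value
    return json.dumps(value)  # e.g. the number 3 becomes the text "3"


def to_int(text):
    """Manual cast applied after reading; anything that is not an integer becomes None."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def read_then_cast(raw):
    """Read both fields as text first, then cast only field "a" to int."""
    rows = []
    for line in raw.splitlines():
        record = json.loads(line)
        a_text = as_text(record.get("a"))
        b_text = as_text(record.get("b"))
        rows.append((to_int(a_text), b_text))
    return rows


for row in read_then_cast(RAW):
    print(row)
```

With this approach the fourth record is no longer lost: its "a" value arrives as text and is cast afterwards, while a value that cannot be cast becomes None for that field only.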