Dilip Biswal created SPARK-29806:
------------------------------------

Summary: Using multiline option for a JSON file which is not multiline results in silent truncation of data.
Key: SPARK-29806
URL: https://issues.apache.org/jira/browse/SPARK-29806
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.4
Reporter: Dilip Biswal
Content of the input JSON file:
{code:java}
{"name":"John", "id":"100"}
{"name":"Marry","id":"200"}
{code}
The above is a valid JSON file in which every record is on a single line. However, reading this file with the multiLine option in FAILFAST mode truncates the data without raising any error. Only the first record is returned:
{code:java}
scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+----+
|id |name|
+---+----+
|100|John|
+---+----+
{code}
Without the multiLine option, the same read returns both records:
{code:java}
scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+-----+
|id |name |
+---+-----+
|100|John |
|200|Marry|
+---+-----+
{code}
I think Spark should return an error in this case, especially in FAILFAST mode. This can be a common user error, and we should not silently truncate data.
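To illustrate the kind of check this issue asks for, here is a minimal sketch in plain Scala (not Spark's implementation) that counts the top-level JSON objects or arrays in a file's text. It assumes, consistent with the output above and with the Spark documentation's description of multiLine as parsing one record per file, that a file containing more than one top-level value is where the silent truncation happens. The helper name `countTopLevelJsonValues` is hypothetical, and it only tracks braces and brackets (top-level bare strings or numbers are not counted).

```scala
// Hypothetical helper, not part of Spark: counts top-level JSON objects/arrays
// in a string by tracking nesting depth, ignoring braces inside string literals.
def countTopLevelJsonValues(text: String): Int = {
  var depth = 0
  var inString = false
  var escaped = false
  var count = 0
  for (c <- text) {
    if (inString) {
      // Inside a string literal: only watch for escapes and the closing quote.
      if (escaped) escaped = false
      else if (c == '\\') escaped = true
      else if (c == '"') inString = false
    } else c match {
      case '"' => inString = true
      case '{' | '[' =>
        if (depth == 0) count += 1 // a new top-level value starts here
        depth += 1
      case '}' | ']' => depth -= 1
      case _ => ()
    }
  }
  count
}

// The input file from this issue: two top-level objects, one per line.
val input = """{"name":"John", "id":"100"}
{"name":"Marry","id":"200"}"""

val n = countTopLevelJsonValues(input)
if (n > 1) println(s"$n top-level JSON values found; a one-record-per-file read would see only the first")
```

A check along these lines is what would let FAILFAST mode report an error instead of returning only the first row. A JSON array such as `[{"a":1},{"b":2}]` counts as a single top-level value, which is the multi-record layout multiLine mode is meant to handle.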