[jira] [Created] (SPARK-29806) Using multiline option for a JSON file which is not multiline results in silent truncation of data.

Dilip Biswal (Jira) Fri, 08 Nov 2019 17:21:08 -0800

Dilip Biswal created SPARK-29806:
------------------------------------

             Summary: Using multiline option for a JSON file which is not multiline results in silent truncation of data.
                 Key: SPARK-29806
                 URL: https://issues.apache.org/jira/browse/SPARK-29806
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Dilip Biswal



Content of the input JSON file:
{code:java}
{"name":"John", "id":"100"}
{"name":"Marry","id":"200"}
{code}
The above is a valid JSON Lines file (one JSON object per line), which is the format Spark's JSON data source reads by default. However, reading this file with the multiLine option in FAILFAST mode truncates the data without any error: only the first record is returned.
{code:java}
scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+----+
|id |name|
+---+----+
|100|John|
+---+----+

scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+-----+
|id |name |
+---+-----+
|100|John |
|200|Marry|
+---+-----+
{code}

I think Spark should raise an error in this case, especially in FAILFAST mode. This is likely to be a common user mistake, and Spark should not truncate data silently.
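To make the difference between the two read modes concrete, here is a simplified, self-contained Scala model. It is only a sketch under stated assumptions: the object `MultiLineCheck` and its methods are hypothetical and are not part of Spark, the scanner only tracks object/array nesting and string literals, and top-level arrays are not expanded into rows. With `strict = false` it reproduces the truncation shown in the output above; with `strict = true` it raises an error when non-whitespace content follows the first JSON value, which is one possible shape for the check proposed here.

```scala
// Hypothetical, simplified model of the two JSON read modes.
// NOT Spark's implementation: it only finds top-level object/array
// boundaries, honouring string literals and backslash escapes.
object MultiLineCheck {

  /** Index just past the first top-level object or array starting at
    * `from`, or -1 if the value is missing or unterminated. */
  def endOfFirstValue(text: String, from: Int = 0): Int = {
    var depth = 0
    var inString = false
    var escaped = false
    var started = false
    var i = from
    while (i < text.length) {
      val c = text.charAt(i)
      if (inString) {
        // Inside a string literal: braces do not count, handle escapes.
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' => depth += 1; started = true
        case '}' | ']' =>
          depth -= 1
          if (started && depth == 0) return i + 1
        case _ =>
      }
      i += 1
    }
    -1
  }

  /** JSON Lines (the default mode): every non-blank line is one record. */
  def readJsonLines(text: String): Seq[String] =
    text.split("\n").toSeq.map(_.trim).filter(_.nonEmpty)

  /** multiLine mode: the whole file is treated as one document.
    * strict = false models the reported behaviour (content after the
    * first value is silently ignored); strict = true models the proposed
    * behaviour (trailing content is an error). */
  def readMultiLine(text: String, strict: Boolean): Seq[String] = {
    val start = text.indexWhere(!_.isWhitespace)
    if (start < 0) return Seq.empty
    val end = endOfFirstValue(text, start)
    if (end < 0) throw new IllegalArgumentException("Malformed JSON document")
    val trailing = text.substring(end).trim
    if (strict && trailing.nonEmpty)
      throw new IllegalArgumentException(
        s"Unexpected content after the first JSON value: ${trailing.take(30)}")
    Seq(text.substring(start, end))
  }
}
```

Applied to the two-line input from this report, `readJsonLines` yields two records, `readMultiLine(input, strict = false)` yields only the John record, and `readMultiLine(input, strict = true)` throws.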



