Mukul Murthy created SPARK-28043:
------------------------------------

             Summary: Reading json with duplicate columns drops the first column value
                 Key: SPARK-28043
                 URL: https://issues.apache.org/jira/browse/SPARK-28043
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Mukul Murthy


When reading a JSON blob with duplicate field names, Spark appears to ignore
the value of the first one. JSON recommends unique names but does not require
them; since JSON and Spark SQL both allow duplicate field names, we should fix
the bug where the first column value is dropped.
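
As a point of comparison outside Spark, here is a minimal sketch using only
Python's standard json module (not Spark's parser) showing that a repeated
name can either be collapsed or preserved, depending on how the parser is
asked to build the object. The variable names are illustrative only.

```python
import json

raw = '{"a": "blah", "a": "blah2"}'

# Default behaviour: the result is a dict, so only the last value
# for a repeated name is kept.
as_dict = json.loads(raw)

# object_pairs_hook receives every (name, value) pair in document order,
# so both values for "a" survive.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)   # {'a': 'blah2'}
print(as_pairs)  # [('a', 'blah'), ('a', 'blah2')]
```

Neither result matches what Spark returns in the repro, where the first
column is null rather than absent or populated.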

Repro (Python, 2.4):

>>> jsonRDD = spark.sparkContext.parallelize(["{ \"a\": \"blah\", \"a\": \"blah2\"}"])
>>> df = spark.read.json(jsonRDD)
>>> df.show()
+----+-----+
|   a|    a|
+----+-----+
|null|blah2|
+----+-----+

 

The expected response would be:

+----+-----+
|   a|    a|
+----+-----+
|blah|blah2|
+----+-----+
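
Until the behaviour is fixed, one possible workaround (a sketch only, not
something proposed in this issue) is to rewrite each JSON string so repeated
names become unique before Spark sees them. The helpers rename_duplicate_keys
and dedupe_json below are hypothetical names, and the "_2", "_3" suffix scheme
is an assumption; note that this changes the column names, so it does not
produce the expected output shown above.

```python
import json
from collections import Counter


def rename_duplicate_keys(pairs):
    """Hypothetical helper: suffix repeated names so no value is lost."""
    seen = Counter()
    out = {}
    for name, value in pairs:
        seen[name] += 1
        # The first occurrence keeps its name; later ones become a_2, a_3, ...
        # (a real implementation would also guard against an existing "a_2").
        key = name if seen[name] == 1 else f"{name}_{seen[name]}"
        out[key] = value
    return out


def dedupe_json(raw):
    """Re-serialize a JSON string with every repeated name made unique."""
    return json.dumps(json.loads(raw, object_pairs_hook=rename_duplicate_keys))


print(dedupe_json('{ "a": "blah", "a": "blah2"}'))
# {"a": "blah", "a_2": "blah2"}
```

With the repro above, the strings could be rewritten before reading, for
example spark.read.json(jsonRDD.map(dedupe_json)), which should yield columns
a and a_2 instead of two columns named a.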


