[SPARK-SQL] Reading JSON column as a DataFrame and keeping partitioning information

Daniel Mateus Pires Fri, 20 Jul 2018 09:56:52 -0700

I've been trying to figure this one out for some time now. I have JSON 
documents representing Products that arrive (physically) partitioned by Brand, 
and I would like to create a DataFrame from the JSON while also keeping the 
partitioning information (Brand).


```
case class Product(brand: String, value: String)
val df = spark.createDataFrame(Seq(Product("something", """{"a": "b", "c": "d"}""")))
df.write.partitionBy("brand").mode("overwrite").json("/tmp/products5/")
val df2 = spark.read.json("/tmp/products5/")

df2.show
/*
+--------------------+---------+
|               value|    brand|
+--------------------+---------+
|{"a": "b", "c": "d"}|something|
+--------------------+---------+
*/


// This is simple and effective, but it drops the brand column!
// (.as[String] needs import spark.implicits._, which spark-shell provides automatically.)
spark.read.json(df2.select("value").as[String]).show
/*
+---+---+
|  a|  c|
+---+---+
|  b|  d|
+---+---+
*/
```

Ideally, I'd like something similar to spark.read.json that keeps the 
partitioning values and merges them with the rest of the DataFrame.

The end result I would like:
```
/*
+---+---+---------+
|  a|  c|    brand|
+---+---+---------+
|  b|  d|something|
+---+---+---------+
*/
```
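
One possible way to get this shape, sketched under the assumption of Spark 2.2 or later (for spark.read.json on a Dataset[String] and for from_json with a StructType): infer the schema from the JSON strings once, then parse the value column in place with from_json so that brand stays on the same row. The names products, schema and result are illustrative only, and the DataFrame is built in memory here instead of being read back from /tmp/products5/.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

// Local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-column-with-partition")
  .getOrCreate()
import spark.implicits._

// Stand-in for spark.read.json("/tmp/products5/"): a JSON string column
// plus the partition column.
val products = Seq(("something", """{"a": "b", "c": "d"}"""))
  .toDF("brand", "value")

// Infer the schema of the JSON strings (this is an extra pass over the data).
val schema = spark.read.json(products.select("value").as[String]).schema

// Parse each JSON string into a struct, then flatten it next to brand.
val result = products
  .withColumn("parsed", from_json(col("value"), schema))
  .select(col("parsed.*"), col("brand"))

result.show()
```

Inferring the schema costs one additional scan of the value column; if the schema is known up front, it could be passed to from_json directly and the inference step skipped.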

Best regards,
Daniel Mateus Pires