I've been trying to figure this one out for some time now. I have JSON
documents representing Products that arrive (physically) partitioned by Brand,
and I would like to create a DataFrame from the JSON while also keeping the
partitioning information (Brand):
```
case class Product(brand: String, value: String)
val df = spark.createDataFrame(Seq(Product("something", """{"a": "b", "c": "d"}""")))
df.write.partitionBy("brand").mode("overwrite").json("/tmp/products5/")
val df2 = spark.read.json("/tmp/products5/")
df2.show
/*
+--------------------+---------+
|               value|    brand|
+--------------------+---------+
|{"a": "b", "c": "d"}|something|
+--------------------+---------+
*/
// This is simple and effective but it gets rid of the brand!
spark.read.json(df2.select("value").as[String]).show
/*
+---+---+
|  a|  c|
+---+---+
|  b|  d|
+---+---+
*/
```
Ideally I'd like something similar to spark.read.json that would keep the
partitioning values and merge them with the rest of the DataFrame.
The end result I would like:
```
/*
+---+---+---------+
|  a|  c|    brand|
+---+---+---------+
|  b|  d|something|
+---+---+---------+
*/
```
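For reference, here is a minimal sketch of one way that shape might be obtained. It is not a built-in equivalent of spark.read.json. It assumes Spark 2.2 or later, that every JSON string shares one schema that can be inferred, and that the extra pass over the data for inference is acceptable. The names KeepPartitionColumn and parseValueKeepingBrand are hypothetical. The idea is to infer the schema from the value column, parse each string into a struct with from_json, and then flatten that struct alongside brand.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

// Hypothetical helper object; the names are for illustration only.
object KeepPartitionColumn {

  // Parses the JSON strings in "value" and keeps the "brand" partition column.
  def parseValueKeepingBrand(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // provides the Encoder[String] needed by .as[String]

    // Infer the JSON schema from the string column (one extra pass over the data).
    val jsonSchema = spark.read.json(df.select("value").as[String]).schema

    // Turn each JSON string into a struct, then expand its fields next to brand.
    df.withColumn("parsed", from_json(col("value"), jsonSchema))
      .select(col("parsed.*"), col("brand"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for demonstration
      .appName("keep-partition-column")
      .getOrCreate()
    import spark.implicits._

    // Same shape as df2 above: a JSON string column plus the partition column.
    val df2 = Seq(("""{"a": "b", "c": "d"}""", "something")).toDF("value", "brand")

    parseValueKeepingBrand(spark, df2).show()
    spark.stop()
  }
}
```

In spark-shell the body of parseValueKeepingBrand can be applied directly to df2. If the schema is known up front, passing an explicit StructType to from_json would avoid the inference pass.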
Best regards,
Daniel Mateus Pires