Hello,
Let's say I have a very simple DataFrame, as below.
+---+----+
| id|datA|
+---+----+
|  1|  a1|
|  2|  a2|
|  3|  a3|
+---+----+
Let's say I have a requirement to write this to a bizarre JSON structure.
For example:
{
  "id": 1,
  "stuff": {
    "datA": "a1"
  }
}
How can I achieve this with PySpark? I have only seen the following:
- writing the DataFrame as-is (doesn't meet requirement)
- using a UDF (seems frowned upon)
What I have tried is doing this within a `foreach`. I have had some
success with it, but I have also run into problems with other
requirements (serializing other things).
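To make the target shape concrete, the per-row reshaping I am attempting
inside the `foreach` amounts to something like the sketch below (plain
Python for illustration; the helper name `to_nested` is mine, and in the
real job this would run against each `Row` and write the result out
rather than just building a dict):

```python
import json

def to_nested(row):
    # Hypothetical helper: reshape a flat {"id", "datA"} record
    # into the nested {"id", "stuff": {"datA"}} structure shown above.
    return {"id": row["id"], "stuff": {"datA": row["datA"]}}

record = to_nested({"id": 1, "datA": "a1"})
print(json.dumps(record))  # → {"id": 1, "stuff": {"datA": "a1"}}
```

Doing this per row inside `foreach` is where the serialization problems
I mentioned show up.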
Any advice? Please and thank you,
Marco.