Hi,

You could rearrange the DataFrame so that writing the DataFrame as-is produces your structure:
from pyspark.sql.functions import struct

df = spark.createDataFrame([(1, "a1"), (2, "a2"), (3, "a3")], "id int, datA string")

+---+----+
| id|datA|
+---+----+
|  1|  a1|
|  2|  a2|
|  3|  a3|
+---+----+

df2 = df.select(df.id, struct(df.datA).alias("stuff"))

root
 |-- id: integer (nullable = true)
 |-- stuff: struct (nullable = false)
 |    |-- datA: string (nullable = true)

+---+-----+
| id|stuff|
+---+-----+
|  1| {a1}|
|  2| {a2}|
|  3| {a3}|
+---+-----+

df2.write.json("data.json")

{"id":1,"stuff":{"datA":"a1"}}
{"id":2,"stuff":{"datA":"a2"}}
{"id":3,"stuff":{"datA":"a3"}}

Looks pretty much like what you described.

Enrico

On 04.05.23 at 06:37, Marco Costantini wrote:
Hello,

Let's say I have a very simple DataFrame, as below.

+---+----+
| id|datA|
+---+----+
|  1|  a1|
|  2|  a2|
|  3|  a3|
+---+----+

Let's say I have a requirement to write this to a bizarre JSON structure. For example:

{
  "id": 1,
  "stuff": {
    "datA": "a1"
  }
}

How can I achieve this with PySpark? I have only seen the following:

- writing the DataFrame as-is (doesn't meet the requirement)
- using a UDF (seems frowned upon)

What I have tried is to do this within a `foreach`. I have had some success, but also some problems with other requirements (serializing other things). Any advice?

Please and thank you,
Marco.