Hi Enrico,
What a great answer. Thank you. Seems like I need to get comfortable with
the 'struct' and then I will be golden. Thank you again, friend.

Marco.

On Thu, May 4, 2023 at 3:00 AM Enrico Minack <enrico-min...@gmx.de> wrote:

> Hi,
>
> You could rearrange the DataFrame so that writing the DataFrame as-is
> produces your structure:
>
> df = spark.createDataFrame([(1, "a1"), (2, "a2"), (3, "a3")], "id int, datA string")
> +---+----+
> | id|datA|
> +---+----+
> |  1|  a1|
> |  2|  a2|
> |  3|  a3|
> +---+----+
>
> from pyspark.sql.functions import struct
> df2 = df.select(df.id, struct(df.datA).alias("stuff"))
> root
>   |-- id: integer (nullable = true)
>   |-- stuff: struct (nullable = false)
>   |    |-- datA: string (nullable = true)
> +---+-----+
> | id|stuff|
> +---+-----+
> |  1| {a1}|
> |  2| {a2}|
> |  3| {a3}|
> +---+-----+
>
> df2.write.json("data.json")
> {"id":1,"stuff":{"datA":"a1"}}
> {"id":2,"stuff":{"datA":"a2"}}
> {"id":3,"stuff":{"datA":"a3"}}
>
> Looks pretty much like what you described.
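>
> Putting the pieces together, here is a minimal end-to-end sketch of the
> above (assuming an active SparkSession already bound to the name "spark";
> the printSchema/show calls are what produce the output pasted above):
>
> from pyspark.sql.functions import struct
>
> # flat input: one row per id with a single datA column
> df = spark.createDataFrame([(1, "a1"), (2, "a2"), (3, "a3")], "id int, datA string")
> df.show()
>
> # wrap datA in a struct column named "stuff" so the JSON writer nests it
> df2 = df.select(df.id, struct(df.datA).alias("stuff"))
> df2.printSchema()
> df2.show()
>
> # each row becomes one JSON object, e.g. {"id":1,"stuff":{"datA":"a1"}}
> df2.write.json("data.json")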
>
> Enrico
>
>
> On 04.05.23 at 06:37, Marco Costantini wrote:
> > Hello,
> >
> > Let's say I have a very simple DataFrame, as below.
> >
> > +---+----+
> > | id|datA|
> > +---+----+
> > |  1|  a1|
> > |  2|  a2|
> > |  3|  a3|
> > +---+----+
> >
> > Let's say I have a requirement to write this to a bizarre JSON
> > structure. For example:
> >
> > {
> >   "id": 1,
> >   "stuff": {
> >     "datA": "a1"
> >   }
> > }
> >
> > How can I achieve this with PySpark? I have only seen the following:
> > - writing the DataFrame as-is (doesn't meet requirement)
> > - using a UDF (seems frowned upon)
> >
> > What I have tried is to do this within a `foreach`. I have had some
> > success, but also some problems with other requirements (serializing
> > other things).
> >
> > Any advice? Please and thank you,
> > Marco.
>
>
>
