As of Spark 3.0, the only way to do it is to rebuild the whole struct:
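from pyspark.sql import functions as f

# Spark reads numeric -> timestamp casts as seconds, so divide the
# epoch-millisecond values by 1000 before casting.
df = df.withColumn('timingPeriod',
    f.struct((f.col('timingPeriod.start') / 1000).cast('timestamp').alias('start'),
             (f.col('timingPeriod.end') / 1000).cast('timestamp').alias('end')))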
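
A note on units: when Spark casts a numeric column to timestamp it treats the value as seconds since the epoch, so epoch-millisecond values like the ones in the sample have to be divided by 1000 first; cast as-is, they would land tens of thousands of years in the future. A quick plain-Python check of that arithmetic on the sample's startDateTime and eventDateTime values (epoch_millis_to_utc is a hypothetical helper for illustration, not a Spark API):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis_to_utc(millis):
    """Hypothetical helper: epoch milliseconds -> aware UTC datetime (None stays None)."""
    if millis is None:
        return None
    # Integer milliseconds via timedelta, so no float rounding creeps in.
    return EPOCH + timedelta(milliseconds=millis)

print(epoch_millis_to_utc(1611859271516).isoformat())  # 2021-01-28T18:41:11.516000+00:00
print(epoch_millis_to_utc(1611859272122).isoformat())  # 2021-01-28T18:41:12.122000+00:00
print(epoch_millis_to_utc(None))                       # None (like endDateTime)
```

Those instants (28 Jan 2021, UTC) are consistent with the date of the question, which is a good sign the values really are milliseconds.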

There's a new method coming in 3.1 on the Column class called withField,
which was designed for exactly this purpose. I backported it to my personal
3.0 build because of how useful it is. It works something like this:
df = df.withColumn('timingPeriod',
    f.col('timingPeriod')
    .withField('start', (f.col('timingPeriod.start') / 1000).cast('timestamp'))
    .withField('end', (f.col('timingPeriod.end') / 1000).cast('timestamp')))

And it works on multiple levels of nesting, which is nice.
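
For the hundreds-of-entities case, one generic option that needs nothing beyond 3.0 is to walk the schema and rebuild only the structs that contain fields to convert. The sketch below is plain Python that only builds Spark SQL strings for df.selectExpr, so it can be checked without a cluster. Assumptions, all hypothetical: the helper names, the schema being the dict returned by df.schema.jsonValue() (or the string from df.schema.json()), and the naming rule that every long field whose name ends in "DateTime" holds epoch milliseconds. Arrays and maps of structs are left alone; they would need something like the SQL transform() function.

```python
import json

def _ident(name):
    # Backtick-quote an identifier for Spark SQL, escaping embedded backticks.
    return "`" + name.replace("`", "``") + "`"

def _literal(name):
    # Single-quoted Spark SQL string literal (backslash escaping).
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"

def _convert(expr, data_type, path, should_convert):
    """Return (sql_expression, changed) for one value, recursing into structs."""
    if isinstance(data_type, dict) and data_type.get("type") == "struct":
        parts, changed = [], False
        for field in data_type["fields"]:
            name = field["name"]
            child, child_changed = _convert(
                f"{expr}.{_ident(name)}", field["type"], f"{path}.{name}", should_convert)
            parts.append(f"{_literal(name)}, {child}")
            changed = changed or child_changed
        if not changed:
            return expr, False  # nothing to convert below: keep the struct as-is
        rebuilt = f"named_struct({', '.join(parts)})"
        # Keep a null struct null rather than turning it into a struct of nulls.
        return f"CASE WHEN {expr} IS NULL THEN NULL ELSE {rebuilt} END", True
    if data_type == "long" and should_convert(path):
        # Numeric -> timestamp casts are read as seconds, so scale millis first.
        return f"CAST({expr} / 1000 AS TIMESTAMP)", True
    return expr, False  # arrays, maps and other types pass through unchanged

def timestamp_select_exprs(schema, should_convert):
    """Build selectExpr() strings; should_convert gets dotted paths like 'a.b'."""
    if isinstance(schema, str):
        schema = json.loads(schema)  # also accept df.schema.json()
    exprs = []
    for field in schema["fields"]:
        col = _ident(field["name"])
        sql, changed = _convert(col, field["type"], field["name"], should_convert)
        exprs.append(f"{sql} AS {col}" if changed else col)
    return exprs

# Schema mirroring the sample object (nullable/metadata keys trimmed).
sample_schema = {"type": "struct", "fields": [
    {"name": "id", "type": {"type": "struct", "fields": [
        {"name": "value", "type": "string"},
        {"name": "type", "type": "string"},
        {"name": "system", "type": "string"}]}},
    {"name": "status", "type": "string"},
    {"name": "timingPeriod", "type": {"type": "struct", "fields": [
        {"name": "startDateTime", "type": "long"},
        {"name": "endDateTime", "type": "long"}]}},
    {"name": "eventDateTime", "type": "long"},
    {"name": "isPrimary", "type": "boolean"},
]}

# Hypothetical convention: long fields named *DateTime are epoch millis.
for e in timestamp_select_exprs(sample_schema, lambda p: p.endswith("DateTime")):
    print(e)
```

On a real DataFrame this would be applied as df.selectExpr(*timestamp_select_exprs(df.schema.jsonValue(), lambda p: p.endswith("DateTime"))), which keeps the column order and leaves untouched columns such as id as plain references. An explicit set of dotted paths works just as well as the predicate.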

On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <felixkizhakkelj...@gmail.com> wrote:

> Hello All,
>
> I am using PySpark Structured Streaming and I am getting timestamp fields
> as plain longs (milliseconds), so I have to convert these fields into a
> timestamp type.
>
> A sample JSON object:
>
> {
>   "id":{
>       "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>       "type": "UUID",
>       "system": "Test"
>     },
>   "status": "Active",
>   "timingPeriod": {
>     "startDateTime": 1611859271516,
>     "endDateTime": null
>   },
>   "eventDateTime": 1611859272122,
>   "isPrimary": true
> }
>
> Here I want to convert "eventDateTime", "startDateTime" and
> "endDateTime" to timestamp types.
>
> So I have done the following:
>
> def transform_date_col(date_col):
>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>
> df.withColumn(
>     "eventDateTime", transform_date_col("eventDateTime").cast("timestamp")
> ).withColumn(
>     "timingPeriod.start", transform_date_col("timingPeriod.start").cast("timestamp")
> ).withColumn(
>     "timingPeriod.end", transform_date_col("timingPeriod.end").cast("timestamp")
> )
>
> The timingPeriod fields are not a struct anymore; instead they become two
> separate top-level columns named "timingPeriod.start" and "timingPeriod.end".
>
> How can I get them back as a struct, as before?
> Is there a generic way to modify one or more properties of nested
> structs?
>
> I have hundreds of entities where long fields need to be converted to
> timestamps, so a generic implementation would help my data ingestion
> pipeline a lot.
>
> Regards,
> Felix K Jose
>
>

-- 
Adam Binford
