As of 3.0, the only way to do it is something that recreates the whole struct:

df.withColumn('timingPeriod', f.struct(
    f.col('timingPeriod.start').cast('timestamp').alias('start'),
    f.col('timingPeriod.end').cast('timestamp').alias('end')))
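The Spark snippet above can't be run without a Spark session, so here is a minimal plain-Python sketch of the same idea: rebuild the nested record as a whole while converting epoch-millisecond fields to timestamps, null-safe like the isNotNull check in the question. The helper names (to_timestamp, convert_timing_period) are hypothetical, not part of any library.

```python
from datetime import datetime, timezone

def to_timestamp(millis):
    """Null-safe conversion of epoch milliseconds to a UTC datetime."""
    if millis is None:
        return None
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

def convert_timing_period(record):
    """Recreate the whole nested 'timingPeriod' dict with converted
    fields, mirroring how f.struct(...) rebuilds the entire struct."""
    period = record["timingPeriod"]
    return {
        **record,
        "eventDateTime": to_timestamp(record.get("eventDateTime")),
        "timingPeriod": {
            "startDateTime": to_timestamp(period.get("startDateTime")),
            "endDateTime": to_timestamp(period.get("endDateTime")),
        },
    }

# Sample row using the millisecond values from the question's JSON.
row = {
    "timingPeriod": {"startDateTime": 1611859271516, "endDateTime": None},
    "eventDateTime": 1611859272122,
}
converted = convert_timing_period(row)
print(converted["timingPeriod"]["startDateTime"])
```

The key point, in Spark as here, is that you replace the parent struct in one expression rather than assigning to a dotted column name.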
There's a new method coming in 3.1 on the Column class called withField, which was designed for this purpose. I backported it to my personal 3.0 build because of how useful it is. It works something like:

df.withColumn('timingPeriod', f.col('timingPeriod')
    .withField('start', f.col('timingPeriod.start').cast('timestamp'))
    .withField('end', f.col('timingPeriod.end').cast('timestamp')))

And it works on multiple levels of nesting, which is nice.

On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
felixkizhakkelj...@gmail.com> wrote:

> Hello All,
>
> I am using PySpark Structured Streaming and I am getting timestamp fields
> as plain longs (milliseconds), so I have to convert these fields into a
> timestamp type.
>
> A sample JSON object:
>
> {
>   "id": {
>     "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>     "type": "UUID",
>     "system": "Test"
>   },
>   "status": "Active",
>   "timingPeriod": {
>     "startDateTime": 1611859271516,
>     "endDateTime": null
>   },
>   "eventDateTime": 1611859272122,
>   "isPrimary": true
> }
>
> Here I want to convert "eventDateTime", "startDateTime", and
> "endDateTime" to timestamp types.
>
> So I have done the following:
>
> def transform_date_col(date_col):
>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>
> df.withColumn(
>     "eventDateTime",
>     transform_date_col("eventDateTime").cast("timestamp")).withColumn(
>     "timingPeriod.start",
>     transform_date_col("timingPeriod.start").cast("timestamp")).withColumn(
>     "timingPeriod.end",
>     transform_date_col("timingPeriod.end").cast("timestamp"))
>
> The timingPeriod fields are no longer a struct; instead they become two
> separate top-level columns named "timingPeriod.start" and
> "timingPeriod.end".
>
> How can I get them back as a struct, as before?
> Is there a generic way I can modify single or multiple properties of
> nested structs?
>
> I have hundreds of entities where longs need to be converted to
> timestamps, so a generic implementation would help my data ingestion
> pipeline a lot.
>
> Regards,
> Felix K Jose

-- 
Adam Binford