Thank you so much for the quick response and great help. @jeff, I will use the library if the 3.1 release is delayed. Thank you so much.
On Fri, Jan 29, 2021 at 1:23 PM Jeff Evans <jeffrey.wayne.ev...@gmail.com> wrote:

> If you need to do this in 2.x, this library does the trick:
> https://github.com/fqaiser94/mse
>
> On Fri, Jan 29, 2021 at 12:15 PM Adam Binford <adam...@gmail.com> wrote:
>
>> I think they're voting on the next release candidate starting sometime
>> next week. So hopefully, barring any other major hurdles, within the
>> next few weeks.
>>
>> On Fri, Jan 29, 2021, 1:01 PM Felix Kizhakkel Jose <
>> felixkizhakkelj...@gmail.com> wrote:
>>
>>> Wow, that's really great to know. Thank you so much, Adam. Do you know
>>> when the 3.1 release is scheduled?
>>>
>>> Regards,
>>> Felix K Jose
>>>
>>> On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:
>>>
>>>> As of 3.0, the only way to do it is something that will recreate the
>>>> whole struct:
>>>>
>>>> df.withColumn(
>>>>     'timingPeriod',
>>>>     f.struct(
>>>>         f.col('timingPeriod.start').cast('timestamp').alias('start'),
>>>>         f.col('timingPeriod.end').cast('timestamp').alias('end')))
>>>>
>>>> There's a new method coming in 3.1 on the Column class called withField,
>>>> which was designed for this purpose. I backported it to my personal 3.0
>>>> build because of how useful it is. It works something like:
>>>>
>>>> df.withColumn(
>>>>     'timingPeriod',
>>>>     f.col('timingPeriod')
>>>>         .withField('start', f.col('timingPeriod.start').cast('timestamp'))
>>>>         .withField('end', f.col('timingPeriod.end').cast('timestamp')))
>>>>
>>>> And it works on multiple levels of nesting, which is nice.
>>>>
>>>> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
>>>> felixkizhakkelj...@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I am using PySpark Structured Streaming, and I am getting timestamp
>>>>> fields as plain longs (milliseconds), so I have to convert these
>>>>> fields into a timestamp type.
>>>>>
>>>>> A sample JSON object:
>>>>>
>>>>> {
>>>>>     "id": {
>>>>>         "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>>>>         "type": "UUID",
>>>>>         "system": "Test"
>>>>>     },
>>>>>     "status": "Active",
>>>>>     "timingPeriod": {
>>>>>         "startDateTime": 1611859271516,
>>>>>         "endDateTime": null
>>>>>     },
>>>>>     "eventDateTime": 1611859272122,
>>>>>     "isPrimary": true
>>>>> }
>>>>>
>>>>> Here I want to convert "eventDateTime", "startDateTime", and
>>>>> "endDateTime" to timestamp types.
>>>>>
>>>>> So I have done the following:
>>>>>
>>>>> def transform_date_col(date_col):
>>>>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>>>>
>>>>> df.withColumn(
>>>>>     "eventDateTime",
>>>>>     transform_date_col("eventDateTime").cast("timestamp")
>>>>> ).withColumn(
>>>>>     "timingPeriod.start",
>>>>>     transform_date_col("timingPeriod.start").cast("timestamp")
>>>>> ).withColumn(
>>>>>     "timingPeriod.end",
>>>>>     transform_date_col("timingPeriod.end").cast("timestamp"))
>>>>>
>>>>> But the timingPeriod fields are no longer a struct; instead they
>>>>> become two separate top-level columns named "timingPeriod.start" and
>>>>> "timingPeriod.end".
>>>>>
>>>>> How can I get them back as a struct, as before?
>>>>> Is there a generic way I can modify single/multiple properties of
>>>>> nested structs?
>>>>>
>>>>> I have hundreds of entities where longs need to be converted to
>>>>> timestamps, so a generic implementation will help my data ingestion
>>>>> pipeline a lot.
>>>>>
>>>>> Regards,
>>>>> Felix K Jose
>>>>>
>>>>
>>>> --
>>>> Adam Binford
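
[Editor's note for the archive] The generic rewrite Felix asks about boils down to walking a nested structure and rebuilding each struct with the `*DateTime` fields converted from epoch milliseconds, nulls preserved. Outside Spark, the same recursive idea can be sketched in plain Python; this is an illustration of the logic only, not the Spark API, and the "field name ends with DateTime" convention is an assumption taken from the sample record:

```python
from datetime import datetime, timezone

def convert_millis(obj, is_ts_field=lambda name: name.endswith("DateTime")):
    """Recursively rebuild nested dicts/lists, converting fields whose
    names match is_ts_field from epoch milliseconds to UTC datetimes.
    None values pass through unchanged (like the isNotNull guard)."""
    if isinstance(obj, dict):
        return {
            k: (datetime.fromtimestamp(v / 1000, tz=timezone.utc)
                if is_ts_field(k) and isinstance(v, (int, float))
                and not isinstance(v, bool)
                else convert_millis(v, is_ts_field))
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [convert_millis(v, is_ts_field) for v in obj]
    return obj

# Sample record from the thread (trimmed)
record = {
    "timingPeriod": {"startDateTime": 1611859271516, "endDateTime": None},
    "eventDateTime": 1611859272122,
    "isPrimary": True,
}
out = convert_millis(record)
```

In Spark itself, the equivalent is the `withField` pattern Adam shows above (3.1+), or rebuilding each struct with `f.struct(...)` on earlier versions; the recursion here corresponds to applying that rewrite at every nesting level of the schema.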