Why do you want to use a UDF? For a fixed, single-line format like this, Spark's built-in column functions (regexp_extract, split, etc.) can do the whole job, and they avoid the Python serialization overhead that a UDF adds.
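Here is a rough sketch against your sample row. It is untested, and the regexes are assumptions inferred from the single example line in your mail, so harden them against your real data before relying on them:

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract, col

spark = SparkSession.builder.getOrCreate()

# One-row frame built from the sample in your mail, for illustration only
df = spark.createDataFrame(
    [("2019-11-29 9:30:45",
      "<123>NOV 29 10:20:35 ips01 sfids: connection: "
      "tcp,bytes:104,user:unknown,url:unknown,host:127.0.0.1")],
    ["timestamp", "message_log"])

m = col("message_log")
parsed = df.select(
    "timestamp",
    # <123>  ->  prio
    regexp_extract(m, r"<(\d+)>", 1).alias("prio"),
    # NOV 29 10:20:35  ->  msg_ts (month, day, time right after the <prio>)
    regexp_extract(m, r">(\w{3} \d{1,2} \d{2}:\d{2}:\d{2})", 1).alias("msg_ts"),
    # token after the time  ->  msg_ids (ips01)
    regexp_extract(m, r"\d{2}:\d{2}:\d{2} (\S+)", 1).alias("msg_ids"),
    # program name before "connection:"  ->  sfids
    regexp_extract(m, r" (\S+): connection:", 1).alias("sfids"),
    # first value after "connection:"  ->  connection (tcp)
    regexp_extract(m, r"connection: ([^,]+)", 1).alias("connection"),
    # remaining key:value pairs in the comma-separated tail
    regexp_extract(m, r"bytes:([^,]+)", 1).alias("bytes"),
    regexp_extract(m, r"user:([^,]+)", 1).alias("user"),
    regexp_extract(m, r"url:([^,]+)", 1).alias("url"),
    regexp_extract(m, r"host:(\S+)", 1).alias("host"))

parsed.show(truncate=False)

If the key:value tail varies from row to row, splitting on commas and then on ":" may be more robust than fixed patterns, but for a stable schema regexp_extract keeps everything in native Spark.

Regards,
Gourav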
On Sat, Nov 30, 2019 at 3:06 AM anbutech <anbutec...@outlook.com> wrote:
> Hi,
>
> I have a raw source dataframe with two columns, as below.
>
> timestamp
> 2019-11-29 9:30:45
>
> message_log
> <123>NOV 29 10:20:35 ips01 sfids: connection: tcp,bytes:104,user:unknown,url:unknown,host:127.0.0.1
>
> How do we break each of the key/value pairs above into a separate column using a UDF in pyspark?
>
> What is the right approach for flattening this type of log data - regex or plain Python logic?
>
> Could you please help me with the logic to flatten the log data?
>
> The final output dataframe should have the following columns:
>
> timestamp: 2019-11-29 9:30:45
> prio: 123
> msg_ts: NOV 29 10:20:35
> msg_ids: ips01
> sfids:
> connection: tcp
> bytes: 104
> user: unknown
> url: unknown
> host: 127.0.0.1
>
> Thanks
> Anbu