I think you could use a Python UDF to turn the properties struct into a JSON string:

    import simplejson
    import pyspark.sql.functions

    def to_json(row):
        return simplejson.dumps(row.asDict(recursive=True))

    to_json_udf = pyspark.sql.functions.udf(to_json)

    df.select("col_1", "col_2",
              to_json_udf(df.properties)) \
      .write.format("com.databricks.spark.csv").save(…)
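The core of that UDF is just dict-to-JSON serialization, which you can sanity-check without Spark. A minimal stand-alone sketch using the standard-library json module (the sample dict is made up, standing in for what row.asDict(recursive=True) would return):

    import json

    def to_json_str(row_dict):
        # Serialize a (possibly nested) dict of properties to a JSON string,
        # mirroring what the UDF does after row.asDict(recursive=True).
        return json.dumps(row_dict, sort_keys=True)

    sample = {"a": 1, "b": {"c": "x"}}  # hypothetical properties struct
    print(to_json_str(sample))  # → {"a": 1, "b": {"c": "x"}}

Since the UDF returns a plain string, the resulting column is a StringType that the CSV writer can handle.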
I am generating a set of tables in PySpark SQL from a JSON source dataset. I am
writing those tables to disk as CSVs using
df.write.format("com.databricks.spark.csv").save(…). I have a schema like:
root
|-- col_1: string (nullable = true)
|-- col_2: string (nullable = true)
|-- col_3: timestamp
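Once the nested column is serialized to a string, each output row is plain CSV. A stand-alone sketch with the standard-library csv module showing what one such row looks like (the column values are made up for illustration):

    import csv
    import io
    import json

    # Hypothetical values for col_1, col_2, and a JSON-serialized properties column.
    row = ("v1", "v2", json.dumps({"k": "v"}))

    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    print(buf.getvalue().strip())  # → v1,v2,"{""k"": ""v""}"

Note that the CSV writer quotes the JSON field and doubles its embedded quotes, so the string round-trips cleanly through a CSV reader.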