Greetings,
In version 1.6.0, is it possible to write a partitioned DataFrame to Parquet using a UDF on the partition column? I'm using pyspark.

Say I have a DataFrame with a column `date`, of type string or int, containing values such as `20170825`. Is it possible to define a UDF called `by_month` or `by_year` and then use it when writing the table as Parquet, ideally like this:

    dataframe.write.format("parquet").partitionBy(by_month(dataframe["date"])).save("/some/parquet")

I haven't tried this yet, so I don't know whether it's possible. If it is, how can it be done? Ideally without having to add an extra column such as `part_id` holding the result of `by_month(date)` and partitioning by that column instead (see the sketch at the end of this mail for what I mean).

Thanks in advance.
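
P.S. For reference, here is a minimal sketch of the workaround I'd like to avoid. The helper `by_month` and the column name `part_month` are just placeholders, and I'm assuming `date` holds values in yyyyMMdd form:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Placeholder UDF: derive the partition value, e.g. 20170825 -> 201708
    by_month = udf(lambda d: str(d)[:6], StringType())

    # Add an explicit partition column and pass its name to partitionBy
    (dataframe
        .withColumn("part_month", by_month(dataframe["date"]))
        .write
        .format("parquet")
        .partitionBy("part_month")
        .save("/some/parquet"))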