Hi, I'm reading in a CSV file, and I would like to write it back as a permanent table, but with partitioning by year, etc. Currently I do this:
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferschema='true') \
    .load('/Users/imran/Downloads/intermediate.csv')
df.saveAsTable("intermediate")

Which works great. I also know I can do this:

df.write.partitionBy("year").parquet("path/to/output")

But how do I combine the two, so that I save a permanent table, partitioned by year, in Parquet format?

thanks,
imran