I have a SchemaRDD that I loaded with parquetFile. I did some transforms on it and now want to save it back out as Parquet.
Getting a SchemaRDD back proves challenging because some of my fields can be null/None, and `SQLContext.inferSchema` rejects those. So I decided to reuse the schema from the original RDD via `SQLContext.applySchema`. This works, but only if I add a map step to turn my Row objects into plain lists (this is PySpark):

    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)), original_parquet_file.schema())

This seems a bit kludgy. Is there a better way? Should there be?
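For reference, the full round trip I'm doing looks roughly like this. It's a sketch against the Spark 1.x SchemaRDD API (`applySchema` was removed in later versions); the paths, the `transform` step, and the context names are placeholders, not my real code:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "parquet-roundtrip")
sq = SQLContext(sc)

# Load the original Parquet file; parquetFile returns a SchemaRDD.
original_parquet_file = sq.parquetFile("in.parquet")  # placeholder path

# Placeholder transform: yields Row objects, some fields may be None.
transformed_rows = original_parquet_file.map(lambda row: row)

# inferSchema chokes on the None fields, so reuse the original schema.
# applySchema wants plain lists/tuples, hence the map to list().
applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                         original_parquet_file.schema())

# Write the result back out as Parquet.
applied.saveAsParquetFile("out.parquet")  # placeholder path
```

This needs a Spark 1.x runtime to actually execute; I'm including it only so the question is self-contained.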