This looks like a bug; we are investigating.

On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
> I have a SchemaRDD which I've gotten from a parquetFile.
>
> Did some transforms on it and now want to save it back out as parquet again.
>
> Getting a SchemaRDD proves challenging because some of my fields can be null/None and SQLContext.inferSchema rejects those.
>
> So, I decided to use the schema on the original RDD with SQLContext.applySchema.
>
> This works, but only if I add a map function to turn my Row objects into a list. (pyspark)
>
> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
>     original_parquet_file.schema())
>
> This seems a bit kludgy. Is there a better way? Should there be?
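For anyone else hitting this: a minimal stand-alone sketch of what that `map(lambda r: list(r))` step is doing. It uses `collections.namedtuple` as a stand-in for pyspark's `Row` (which is likewise tuple-backed), so it runs without a Spark cluster; the field names and values here are made up for illustration.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which is also a tuple subclass.
Row = namedtuple("Row", ["name", "age"])

# One row carries a None field, the kind inferSchema trips over.
rows = [Row("alice", 30), Row("bob", None)]

# The workaround from the message above: turn each Row into a plain
# list before handing the RDD (here, just a list) to applySchema.
as_lists = [list(r) for r in rows]
print(as_lists)  # [['alice', 30], ['bob', None]]
```

Since `Row` is tuple-backed, `list(r)` just flattens it to its field values in order, which is why pairing the converted rows with the original file's schema round-trips cleanly.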