Thanks for reporting this; it will be fixed by https://github.com/apache/spark/pull/2448
On Thu, Sep 18, 2014 at 12:32 PM, Michael Armbrust <mich...@databricks.com> wrote:
> This looks like a bug, we are investigating.
>
> On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
>>
>> I have a SchemaRDD which I've gotten from a parquetFile.
>>
>> Did some transforms on it and now want to save it back out as parquet
>> again.
>>
>> Getting a SchemaRDD proves challenging because some of my fields can be
>> null/None and SQLContext.inferSchema rejects those.
>>
>> So, I decided to use the schema on the original RDD with
>> SQLContext.applySchema.
>>
>> This works, but only if I add a map function to turn my Row objects into
>> a list. (pyspark)
>>
>> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
>>                          original_parquet_file.schema())
>>
>> This seems a bit kludgy. Is there a better way? Should there be?
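
For anyone who hits this before the fix lands, below is a minimal, self-contained sketch of the workaround Eric describes, assuming the Spark 1.1-era PySpark API (SQLContext.parquetFile / applySchema, SchemaRDD.schema / saveAsParquetFile); the paths and the transform are hypothetical placeholders:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="reapply-schema")
    sq = SQLContext(sc)

    # Load the original parquet file; the resulting SchemaRDD carries its schema.
    original_parquet_file = sq.parquetFile("/path/to/input.parquet")  # hypothetical path

    # Some transformation yielding Row objects whose fields may be None.
    # (Identity map as a placeholder; substitute your real transform.)
    transformed_rows = original_parquet_file.map(lambda row: row)

    # The kludge: applySchema does not accept Row objects directly, so convert
    # each Row back to a plain list before reapplying the original schema.
    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                             original_parquet_file.schema())

    # Write the result back out as parquet.
    applied.saveAsParquetFile("/path/to/output.parquet")  # hypothetical path

Once the linked PR is in, the extra .map(lambda r: list(r)) step should no longer be necessary.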