Thanks!

On Thu, Sep 18, 2014 at 1:14 PM, Davies Liu <dav...@databricks.com> wrote:
> Thanks for reporting this, it will be fixed by
> https://github.com/apache/spark/pull/2448
>
> On Thu, Sep 18, 2014 at 12:32 PM, Michael Armbrust
> <mich...@databricks.com> wrote:
> > This looks like a bug, we are investigating.
> >
> > On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman
> > <eric.d.fried...@gmail.com> wrote:
> >>
> >> I have a SchemaRDD which I've gotten from a parquetFile.
> >>
> >> Did some transforms on it and now want to save it back out as parquet
> >> again.
> >>
> >> Getting a SchemaRDD proves challenging because some of my fields can be
> >> null/None and SQLContext.inferSchema rejects those.
> >>
> >> So, I decided to use the schema on the original RDD with
> >> SQLContext.applySchema.
> >>
> >> This works, but only if I add a map function to turn my Row objects
> >> into a list. (pyspark)
> >>
> >> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
> >>                          original_parquet_file.schema())
> >>
> >> This seems a bit kludgy. Is there a better way? Should there be?
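
For anyone who hits this before that fix lands, here is a minimal sketch of
the workaround Eric describes, assuming pyspark on Spark 1.1. The paths are
placeholders and the identity map stands in for whatever transform you
actually apply:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sq = SQLContext(sc)

    # Load the original parquet file; the resulting SchemaRDD carries its
    # schema, which we reuse below instead of re-inferring it.
    original = sq.parquetFile("/path/to/input.parquet")  # placeholder path

    # Stand-in for the real transform, which may leave None in some fields
    # (inferSchema cannot handle those, hence applySchema below).
    transformed_rows = original.map(lambda r: r)

    # Re-apply the original schema. Converting each Row to a list is the
    # kludge in question, needed until the fix above is merged.
    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                             original.schema())

    # Write the result back out as parquet.
    applied.saveAsParquetFile("/path/to/output.parquet")  # placeholder path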