This looks like a bug; we are investigating.

On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
> I have a SchemaRDD which I've gotten from a parquetFile.
>
> Did some transforms on it and now want to save it back out as parquet again.
>
> Getting a SchemaRDD proves challenging because some of my fields can be null/None and SQLContext.inferSchema rejects those.
>
> So, I decided to use the schema on the original RDD with SQLContext.applySchema.
>
> This works, but only if I add a map function to turn my Row objects into a list. (pyspark)
>
> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
>     original_parquet_file.schema())
>
> This seems a bit kludgy. Is there a better way? Should there be?
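For anyone else hitting this: a minimal stand-alone sketch of what that `map(lambda r: list(r))` step is doing. It uses `collections.namedtuple` as a stand-in for pyspark's `Row` (which is likewise tuple-backed), so it runs without a Spark cluster; the field names and values here are made up for illustration.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which is also a tuple subclass.
Row = namedtuple("Row", ["name", "age"])

# One row carries a None field, the kind inferSchema trips over.
rows = [Row("alice", 30), Row("bob", None)]

# The workaround from the message above: turn each Row into a plain
# list before handing the RDD (here, just a list) to applySchema.
as_lists = [list(r) for r in rows]
print(as_lists)  # [['alice', 30], ['bob', None]]
```

Since `Row` is tuple-backed, `list(r)` just flattens it to its field values in order, which is why pairing the converted rows with the original file's schema round-trips cleanly.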