I have a SchemaRDD that I loaded with parquetFile. I did some transforms on it and now want to save it back out as Parquet.
Getting a SchemaRDD back proves challenging because some of my fields can be null/None, and `SQLContext.inferSchema` rejects those. So I decided to reuse the schema from the original RDD via `SQLContext.applySchema`. This works, but only if I add a map step to turn my Row objects into plain lists (this is PySpark):

    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)), original_parquet_file.schema())

This seems a bit kludgy. Is there a better way? Should there be?
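For reference, the full round trip I'm doing looks roughly like this. It's a sketch against the Spark 1.x SchemaRDD API (`applySchema` was removed in later versions); the paths, the `transform` step, and the context names are placeholders, not my real code:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "parquet-roundtrip")
sq = SQLContext(sc)

# Load the original Parquet file; parquetFile returns a SchemaRDD.
original_parquet_file = sq.parquetFile("in.parquet")  # placeholder path

# Placeholder transform: yields Row objects, some fields may be None.
transformed_rows = original_parquet_file.map(lambda row: row)

# inferSchema chokes on the None fields, so reuse the original schema.
# applySchema wants plain lists/tuples, hence the map to list().
applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                         original_parquet_file.schema())

# Write the result back out as Parquet.
applied.saveAsParquetFile("out.parquet")  # placeholder path
```

This needs a Spark 1.x runtime to actually execute; I'm including it only so the question is self-contained.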