Thanks for reporting this; it will be fixed by
https://github.com/apache/spark/pull/2448

On Thu, Sep 18, 2014 at 12:32 PM, Michael Armbrust
<mich...@databricks.com> wrote:
> This looks like a bug; we are investigating.
>
> On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman <eric.d.fried...@gmail.com>
> wrote:
>>
>> I have a SchemaRDD which I've gotten from a parquetFile.
>>
>> I did some transforms on it and now want to save it back out as Parquet
>> again.
>>
>> Getting a SchemaRDD back proves challenging because some of my fields can
>> be null/None, and SQLContext.inferSchema rejects those.
>>
>> So, I decided to use the schema on the original RDD with
>> SQLContext.applySchema.
>>
>> This works, but only if I add a map function to turn my Row objects into
>> lists (PySpark):
>>
>> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
>>                          original_parquet_file.schema())
>>
>>
>> This seems a bit kludgy.  Is there a better way?  Should there be?
>
>
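
For context, here is a minimal sketch of the full round trip Eric describes,
assuming Spark 1.1-era PySpark. The input/output paths and the pass-through
transform are hypothetical placeholders, not from the original thread.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="parquet-roundtrip")
    sq = SQLContext(sc)

    # Load the original Parquet file as a SchemaRDD.
    original = sq.parquetFile("/path/to/input.parquet")

    # Any transformation that yields Row objects; identity as a placeholder.
    transformed_rows = original.map(lambda r: r)

    # inferSchema rejects None values, so reuse the original file's schema.
    # Converting each Row to a plain list is the workaround in question.
    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                             original.schema())

    # Write the result back out as Parquet.
    applied.saveAsParquetFile("/path/to/output.parquet")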
