Thanks!

On Thu, Sep 18, 2014 at 1:14 PM, Davies Liu <dav...@databricks.com> wrote:
> Thanks for reporting this, it will be fixed by
> https://github.com/apache/spark/pull/2448
>
> On Thu, Sep 18, 2014 at 12:32 PM, Michael Armbrust
> <mich...@databricks.com> wrote:
> > This looks like a bug, we are investigating.
> >
> > On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman
> > <eric.d.fried...@gmail.com> wrote:
> >>
> >> I have a SchemaRDD which I've gotten from a parquetFile.
> >>
> >> Did some transforms on it and now want to save it back out as parquet
> >> again.
> >>
> >> Getting a SchemaRDD proves challenging because some of my fields can be
> >> null/None and SQLContext.inferSchema rejects those.
> >>
> >> So, I decided to use the schema on the original RDD with
> >> SQLContext.applySchema.
> >>
> >> This works, but only if I add a map function to turn my Row objects
> >> into a list. (pyspark)
> >>
> >> applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
> >>                          original_parquet_file.schema())
> >>
> >> This seems a bit kludgy. Is there a better way? Should there be?
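
For anyone who hits this before that fix lands, here is a minimal sketch of
the workaround Eric describes, assuming pyspark on Spark 1.1. The paths are
placeholders and the identity map stands in for whatever transform you
actually apply:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sq = SQLContext(sc)

    # Load the original parquet file; the resulting SchemaRDD carries its
    # schema, which we reuse below instead of re-inferring it.
    original = sq.parquetFile("/path/to/input.parquet")  # placeholder path

    # Stand-in for the real transform, which may leave None in some fields
    # (inferSchema cannot handle those, hence applySchema below).
    transformed_rows = original.map(lambda r: r)

    # Re-apply the original schema. Converting each Row to a list is the
    # kludge in question, needed until the fix above is merged.
    applied = sq.applySchema(transformed_rows.map(lambda r: list(r)),
                             original.schema())

    # Write the result back out as parquet.
    applied.saveAsParquetFile("/path/to/output.parquet")  # placeholder path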