For compatibility reasons, we always write data out as nullable in
parquet.  Given that the nullable bit is only an optimization that we
don't actually make much use of, I'm curious why you are worried about
it changing to true?
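
If the stricter schema matters downstream, one workaround (a rough sketch,
not something I've verified on 1.5.1; the path and variable names are just
illustrative) is to re-apply the original schema after reading the parquet
file back, e.g. with createDataFrame over the read-back RDD:

  import org.apache.spark.sql.types._

  val originalSchema = StructType(Seq(
    StructField("type", StringType, nullable = true),
    StructField("timestamp", LongType, nullable = false)))

  df.write.parquet("/tmp/events.parquet")            // written as nullable for compatibility
  val readBack = sqlContext.read.parquet("/tmp/events.parquet")
  readBack.printSchema()                             // timestamp now shows nullable = true

  // Re-impose the original nullability (and column order via select if needed)
  val restored = sqlContext.createDataFrame(
    readBack.select("type", "timestamp").rdd, originalSchema)

Note this only reasserts the metadata; it does not add any runtime check that
the column is actually free of nulls.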

On Tue, Oct 20, 2015 at 8:24 AM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Spark users and developers,
>
> I have a dataframe with the following schema (Spark 1.5.1):
>
> StructType(StructField(type,StringType,true),
> StructField(timestamp,LongType,false))
>
> After I save the dataframe in parquet and read it back, I get the
> following schema:
>
> StructType(StructField(timestamp,LongType,true),
> StructField(type,StringType,true))
>
> As you can see the schema does not match. The nullable field is set to
> true for timestamp upon reading the dataframe back. Is there a way to
> preserve the schema so that what we write to will be what we read back?
>
> Best Regards,
>
> Jerry
>
