Re: How to set nullable field when create DataFrame using case class

Jacek Laskowski Fri, 05 Aug 2016 04:28:11 -0700

Hi,

Seems so. It's equivalent to


Seq(MyProduct(new Timestamp(0), 10)).toDS.printSchema

(and now I'm wondering why I didn't pick this variant)
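
For anyone who wants to check this outside spark-shell, here is a minimal sketch of the equivalence. It assumes Spark 2.x on the classpath and a local master; the object name SchemaEquivalence and the app name are made up for illustration.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Top-level case class so Spark can derive an encoder for it.
case class MyProduct(t: Timestamp, a: Float)

object SchemaEquivalence {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-equivalence")
      .getOrCreate()
    import spark.implicits._

    // Variant from the quoted question: explicit createDataset call.
    val viaCreate = spark.createDataset(Seq(MyProduct(new Timestamp(0), 10f)))
    // Variant from this reply: implicit conversion on a local Seq.
    val viaToDS = Seq(MyProduct(new Timestamp(0), 10f)).toDS

    viaCreate.printSchema()
    viaToDS.printSchema()

    // Both variants should report the same schema, nullability flags included.
    assert(viaCreate.schema == viaToDS.schema)

    spark.stop()
  }
}
```

Both calls pick up the same implicit encoder for MyProduct (toDS goes through createDataset internally), so the schemas, including the nullable flags, should match.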

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Aug 5, 2016 at 11:29 AM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
> Hi Jacek,
>
> Is this line correct?
>
> spark.createDataset(Seq(MyProduct(new Timestamp(0), 10))).printSchema
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
> On 5 August 2016 at 10:21, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi Michael,
>>
>> Since we're at it, could you please point at the code where the
>> optimization happens? I assume you're talking about Catalyst when it
>> generates code for queries (whole-stage codegen). Is this perhaps
>> nullability (NULL value) propagation? I'd appreciate a pointer (hoping
>> it would improve my understanding of the low-level bits quite
>> substantially). Thanks!
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Aug 5, 2016 at 1:24 AM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>> > Nullable is an optimization hint for Spark SQL. Marking a field as
>> > non-nullable tells Spark not to even do an if check for null when
>> > accessing that field.
>> >
>> > In this case, your data is nullable, because Timestamp is an object in
>> > Java and you could put null there.
>> >
>> > On Thu, Aug 4, 2016 at 2:56 PM, luismattor <luismat...@gmail.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Consider the following case:
>> >>
>> >> import java.sql.Timestamp
>> >> case class MyProduct(t: Timestamp, a: Float)
>> >> val df = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
>> >> df.printSchema()
>> >>
>> >> The output is:
>> >> root
>> >>  |-- t: timestamp (nullable = true)
>> >>  |-- a: float (nullable = false)
>> >>
>> >> How can I set the timestamp column to be NOT nullable?
>> >>
>> >> Regards,
>> >> Luis
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-nullable-field-when-create-DataFrame-using-case-class-tp27479.html
>> >> Sent from the Apache Spark User List mailing list archive at
>> >> Nabble.com.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> >>
>> >
>>
>>
>
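
On the original question (making the timestamp column NOT nullable): the thread itself does not give a recipe. One commonly used workaround is to rebuild the DataFrame with a copy of its schema in which the flag is flipped. The sketch below assumes Spark 2.x and a local master; the helper withNonNullable and the object name are hypothetical, not part of the Spark API. As noted in the quoted reply, the flag is a promise to the optimizer, so this is only safe if the column really never contains null.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

case class MyProduct(t: Timestamp, a: Float)

object NonNullableColumn {
  // Hypothetical helper: returns a DataFrame with the same rows whose
  // named column is declared nullable = false in the schema.
  def withNonNullable(df: DataFrame, column: String): DataFrame = {
    val fields = df.schema.fields.map { f =>
      if (f.name == column) f.copy(nullable = false) else f
    }
    // Re-apply the adjusted schema to the underlying RDD[Row].
    df.sparkSession.createDataFrame(df.rdd, StructType(fields))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("non-nullable-column")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(MyProduct(new Timestamp(0), 10f)).toDF()
    val strict = withNonNullable(df, "t")

    df.printSchema()     // schema as inferred from the case class
    strict.printSchema() // schema with the flag on column t flipped

    assert(!strict.schema("t").nullable)

    spark.stop()
  }
}
```

The round trip through df.rdd has a cost, and if a null does appear in a column declared non-nullable the result may be wrong or fail at runtime, so treat this as a sketch rather than a recommendation.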
