I spent a few minutes poking around in the source code and found this:

The data type representing None, used for the types that cannot be inferred.

https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113

Playing around a bit, this is the only use case I could immediately come
up with: you have some kind of placeholder field already in the data, but
it's always null. If you let createDataFrame (and I bet other things like
DataFrameReader would behave similarly) try to infer the schema directly,
it will error out, because it can't determine a type for a column that
contains nothing but nulls. Declaring the column as NullType, as below,
lets the data be used. And, if memory serves, Hive also has a concept of a
null data type for these kinds of situations.

In [9]: df = spark.createDataFrame([Row(id=1, val=None), Row(id=2, val=None)],
   ...:     schema=StructType([StructField('id', LongType()),
   ...:                        StructField('val', NullType())]))

In [10]: df.show()
+---+----+
| id| val|
+---+----+
|  1|null|
|  2|null|
+---+----+


In [11]: df.printSchema()
root
 |-- id: long (nullable = true)
 |-- val: null (nullable = true)


Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com


On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin <j...@jgp.net> wrote:

> What is the purpose of DataTypes.NullType, especially as you are building a
> schema? Has anyone used it or seen it as part of a schema auto-generation?
>
>
> (If I keep asking long enough, I may get an answer, no? :) )
>
>
> > On Feb 4, 2018, at 13:15, Jean Georges Perrin <j...@jgp.net> wrote:
> >
> > Any taker on this one? ;)
> >
> >> On Jan 29, 2018, at 16:05, Jean Georges Perrin <j...@jgp.net> wrote:
> >>
> >> Hi Sparkians,
> >>
> >> Can someone tell me what the purpose of DataTypes.NullType is,
> >> especially as you are building a schema?
> >>
> >> Thanks
> >>
> >> jg
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
> >
>
>
