>
>> >>>> val df = ..... // some code that creates a DataFrame
>> >>>> df.filter( df("columnname").isNotNull() )
>> >>>>
>> >>>> +-+---+-+
>> >>>> |x|  a|y|
>> >>>> +-+---+-+
>> >>>> |2|bob|5|
>> >>>> +-+---+-+
> >>>>
> >>>>
> >>>> Unfortunately, while this is true for a nullable column (according to
> >>>> df.printSchema), it is not true for a column that is not nullable:
>>>> ... (nullable = false)
>>>>
>>>> +-+-----+----+
>>>> |x|    a|   y|
>>>> +-+-----+----+
>>>> |1|hello|null|
>>>> |2|  bob|   5|
>>>> +-+-----+----+
>>>>
>>>> such that the output is not affected by the
>>> I came up with this:
>>>
>>> import org.apache.spark.sql.DataFrame
>>> import org.apache.spark.sql.types.{StructField, StructType}
>>>
>>> /**
>>>  * Set whether a column is nullable.
>>>  * @param df source DataFrame
>>>  * @param cn the name of the column to change
>>>  * @param nullable the flag to set, such that the column is either
>>>  *                 nullable or not
>>>  */
>>> def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
>>>   // Copy the schema, changing the nullable flag only on the column named cn
>>>   val schema = df.schema
>>>   val newSchema = StructType(schema.map {
>>>     case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
>>>     case y: StructField => y
>>>   })
>>>   // Rebuild the DataFrame from the same rows with the modified schema
>>>   df.sqlContext.createDataFrame(df.rdd, newSchema)
>>> }
Comments?
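
For illustration, here is a minimal sketch of how the helper might be used before filtering. It assumes a local master and a Spark 1.x-style SQLContext; the object name NullableFilterSketch and the sample rows are hypothetical and not taken from the original data. The helper is repeated so that the sketch compiles on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{StructField, StructType}

object NullableFilterSketch {
  // Same helper as in the message, repeated so this sketch is self-contained.
  def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
    val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable, m)
      case other: StructField => other
    })
    df.sqlContext.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with a single worker thread.
    val conf = new SparkConf().setMaster("local[1]").setAppName("nullable-sketch")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical sample rows; Int columns built from tuples are inferred as not nullable.
    val df = sqlContext.createDataFrame(Seq((1, "hello", 3), (2, "bob", 5))).toDF("x", "a", "y")

    // Mark column "y" as nullable, then apply the isNotNull filter to the new DataFrame.
    val relaxed = setNullableStateOfColumn(df, "y", nullable = true)
    relaxed.printSchema()
    relaxed.filter(relaxed("y").isNotNull).show()

    sc.stop()
  }
}
```

The filter is built from the new DataFrame (relaxed), not from the original df, so that it is resolved against the schema in which the column is marked nullable.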
Cheers and thx in advance,
Martin
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html
Sent from the Apache Spark User List mailing