Hi all,
1. *Columns in dataframes can be nullable and not nullable. Having a
nullable column of Doubles, I can use the following Scala code to filter all
"non-null" rows:*
val df = ..... // some code that creates a DataFrame
df.filter( df("columnname").isNotNull() )
+-+-----+----+
|x| a| y|
+-+-----+----+
|1|hello|null|
|2| bob| 5|
+-+-----+----+
root
|-- x: integer (nullable = false)
|-- a: string (nullable = true)
|-- y: integer (nullable = true)
And with the filter expression
+-+---+-+
|x| a|y|
+-+---+-+
|2|bob|5|
+-+---+-+
Unfortunetaly and while this is a true for a nullable column (according to
df.printSchema), it is not true for a column that is not nullable:
+-+-----+----+
|x| a| y|
+-+-----+----+
|1|hello|null|
|2| bob| 5|
+-+-----+----+
root
|-- x: integer (nullable = false)
|-- a: string (nullable = true)
|-- y: integer (nullable = false)
+-+-----+----+
|x| a| y|
+-+-----+----+
|1|hello|null|
|2| bob| 5|
+-+-----+----+
such that the output is not affected by the filter. Is this intended?
2. *What is the cheapest (in sense of performance) to turn a non-nullable
column into a nullable column?
A came uo with this:*
/**
* Set, if a column is nullable.
* @param df source DataFrame
* @param cn is the column name to change
* @param nullable is the flag to set, such that the column is either
nullable or not
*/
def setNullableStateOfColumn( df: DataFrame, cn: String, nullable:
Boolean) : DataFrame = {
val schema = df.schema
val newSchema = StructType(schema.map {
case StructField( c, t, _, m) if c.equals(cn) => StructField( c, t,
nullable = nullable, m)
case y: StructField => y
})
df.sqlContext.createDataFrame( df.rdd, newSchema)
}
Is there a cheaper solution?
3. *Any comments?*
Cheers and thx in advance,
Martin
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]