Re: [sql] Dataframe how to check null values

2015-04-20 Thread Ted Yu
I found: https://issues.apache.org/jira/browse/SPARK-6573

> On Apr 20, 2015, at 4:29 AM, Peter Rudenko wrote:
>
> Sounds very good. Is there a jira for this? Would be cool to have in 1.4,
> because currently cannot use dataframe.describe function with NaN values,
> need to filter manually al

Re: [sql] Dataframe how to check null values

2015-04-20 Thread Peter Rudenko
Sounds very good. Is there a JIRA for this? It would be cool to have in 1.4, because currently I cannot use the dataframe.describe function with NaN values and need to filter all the columns manually.

Thanks,
Peter Rudenko

On 2015-04-02 21:18, Reynold Xin wrote:
Incidentally, we were discussing this yeste

Re: [sql] Dataframe how to check null values

2015-04-02 Thread Reynold Xin
Incidentally, we were discussing this yesterday. Here are some thoughts on null handling in SQL/DataFrames. Would be great to get some feedback.

1. Treat floating point NaN and null as the same "null" value. This would be consistent with most SQL databases, and Pandas. This would also require some
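To make the first proposal concrete, here is a minimal plain-Scala sketch of what "treat NaN and null as the same missing value" could mean for a single value. It is only an illustration, not Spark's implementation; the names MissingValues and normalize are made up for this example.

```scala
object MissingValues {
  // Map both null and NaN to None; any other value is kept as Some(value).
  def normalize(v: java.lang.Double): Option[Double] =
    Option(v).map(_.doubleValue).filterNot(_.isNaN)
}
```

With a helper like this, downstream code only has to handle one kind of "missing" instead of checking for null and NaN separately.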

Re: [sql] Dataframe how to check null values

2015-04-02 Thread Dean Wampler
I'm afraid you're a little stuck. In Scala, the types Int, Long, Float, Double, Byte, and Boolean look like reference types in source code, but they are compiled to the corresponding JVM primitive types, which can't be null. That's why you get the warning about ==. It might be your best choice is
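A minimal sketch of the point about primitives, in plain Scala without Spark: a Double cannot hold null, so a missing value has to be carried either by Option[Double] or by the boxed java.lang.Double. The names NullableDoubles, isMissing and present are illustrative and do not come from the thread.

```scala
object NullableDoubles {
  // Option[Double] models a missing value without using null.
  val optional: Seq[Option[Double]] =
    Seq(Some(1.0), Some(2.0), None, Some(3.0))

  // java.lang.Double is a boxed reference type, so it may be null.
  val boxed: Seq[java.lang.Double] =
    Seq(java.lang.Double.valueOf(1.0), null, java.lang.Double.valueOf(3.0))

  // Comparing a boxed value with null is meaningful; on a primitive
  // Double the same comparison is always false and the compiler warns.
  def isMissing(v: java.lang.Double): Boolean = v == null

  // Keep only the values that are actually present.
  def present(vs: Seq[java.lang.Double]): Seq[Double] =
    vs.filterNot(isMissing).map(_.doubleValue)
}
```

Option[Double] is usually the more idiomatic choice in Scala code; the boxed type is mainly useful at the boundary with Java APIs.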

[sql] Dataframe how to check null values

2015-04-02 Thread Peter Rudenko
Hi, I need to implement MeanImputor - impute missing values with the mean. If I set missing values to null, then DataFrame aggregation works properly, but in a UDF null values are treated as 0.0. Here's an example:

val df = sc.parallelize(Array(1.0,2.0, null, 3.0, 5.0, null)).toDF
df.agg(avg("_1")).firs
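For reference, the imputation logic itself can be sketched in plain Scala, independent of Spark, by representing missing entries as None rather than null. This is an assumption-laden illustration, not the DataFrame solution being asked for; MeanImputeSketch and imputeMean are hypothetical names.

```scala
object MeanImputeSketch {
  // Replace every missing entry (None) with the mean of the present ones.
  def imputeMean(xs: Seq[Option[Double]]): Seq[Double] = {
    val present = xs.flatten
    require(present.nonEmpty, "cannot compute a mean without any present values")
    val mean = present.sum / present.size
    xs.map(_.getOrElse(mean))
  }
}
```

Applied to the values from the example above (1.0, 2.0, missing, 3.0, 5.0, missing), the mean of the present values is 2.75, and both missing entries are replaced with it.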