N is much bigger than 1 in my case. Here is an example that describes my issue:

"select column1, stddev_samp(column2) from table1 group by column1" gives NaN.
"select column1, cast(stddev_samp(column2) as decimal(16,3)) from table1 group by column1" gives numeric values, e.g. 234.234.
"select column1, stddev_pop(column2) from table1 group by column1" gives numeric values, e.g. 123.123123123.
column1, column2, and table1 are the same in all three queries. My guess is that stddev_samp returns a double that, in some cases, does not exactly match standard floating-point semantics, and that is why Spark gives NaN. It seems stddev_samp does not handle NaN well, unlike stddev_pop.

On Thu, Jul 7, 2016 at 5:57 PM, Sean Owen <so...@cloudera.com> wrote:
> Sample standard deviation can't be defined in the case of N=1, because
> it has N-1 in the denominator. My guess is that this is the case
> you're seeing. A population of N=1 still has a standard deviation of
> course (which is 0).
>
> On Thu, Jul 7, 2016 at 9:51 AM, Mungeol Heo <mungeol....@gmail.com> wrote:
>> I know stddev_samp and stddev_pop give different values, because they
>> have different definitions. What I want to know is why stddev_samp
>> gives "NaN", and not a numeric value.
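To make the N-1 point concrete, here is a small plain-Python sketch (not Spark code) of the two definitions, computed per group the way a GROUP BY query computes them. The function names, the sample rows, and the choice to return NaN for a single-row sample are assumptions for illustration only; the NaN simply mirrors what this thread reports Spark returning.

```python
import math
from collections import defaultdict


def stddev_pop(xs):
    # Population standard deviation: divide the sum of squared deviations by N.
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def stddev_samp(xs):
    # Sample standard deviation: divide by N - 1, which is 0 when N == 1.
    n = len(xs)
    if n < 2:
        # Undefined for a single value; return NaN to mirror the behavior
        # reported in this thread (an assumption, not Spark's source).
        return float("nan")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Hypothetical (column1, column2) rows: group "a" has 3 rows, group "b" only 1.
rows = [("a", 1.0), ("a", 2.0), ("a", 4.0), ("b", 5.0)]

groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# One output row per group, like "group by column1".
results = {key: (stddev_samp(xs), stddev_pop(xs)) for key, xs in groups.items()}
for key in sorted(results):
    print(key, len(groups[key]), *results[key])
```

Even when the table as a whole has many rows, any group that happens to contain a single row has N - 1 = 0 for stddev_samp, so that group comes out as NaN, while stddev_pop gives 0 for the same group.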