In my case, N is much larger than 1.

Here is an example that describes my issue.
"select column1, stddev_samp(column2) from table1 group by column1" gives NaN.
"select column1, cast(stddev_samp(column2) as decimal(16,3)) from
table1 group by column1" gives numeric values, e.g. 234.234.
"select column1, stddev_pop(column2) from table1 group by column1"
gives numeric values, e.g. 123.123123123.

column1, column2, and table1 are the same in all three queries.
My guess is that stddev_samp returns a double that does not exactly
match standard floating-point semantics in some cases, and that is
why Spark gives NaN.
It seems that stddev_samp, unlike stddev_pop, does not handle NaN well.
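
To make the N-1 denominator from Sean's reply (quoted below) concrete, here is a
minimal Python sketch of the textbook definitions. It is only an illustration,
not Spark's implementation: the function names simply mirror the SQL functions,
and returning NaN when N < 2 is my assumption about how a 0/0 in double
arithmetic would surface.

```python
import math


def _sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def stddev_pop(xs):
    """Population standard deviation: divide by N."""
    return math.sqrt(_sum_sq_dev(xs) / len(xs))


def stddev_samp(xs):
    """Sample standard deviation: divide by N - 1."""
    n = len(xs)
    if n < 2:
        # Assumption: model the undefined 0/0 case as NaN.
        return float("nan")
    return math.sqrt(_sum_sq_dev(xs) / (n - 1))


if __name__ == "__main__":
    print(stddev_pop([5.0]))                  # one row: defined, equals 0.0
    print(stddev_samp([5.0]))                 # one row: undefined, NaN here
    print(stddev_samp([1.0, 2.0, 3.0, 4.0]))  # several rows: finite value
```

With these definitions and finite inputs, NaN appears only for a group with
fewer than two rows, so this alone would not explain a NaN when N is much
larger than 1.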

On Thu, Jul 7, 2016 at 5:57 PM, Sean Owen <so...@cloudera.com> wrote:
> Sample standard deviation can't be defined in the case of N=1, because
> it has N-1 in the denominator. My guess is that this is the case
> you're seeing. A population of N=1 still has a standard deviation of
> course (which is 0).
>
> On Thu, Jul 7, 2016 at 9:51 AM, Mungeol Heo <mungeol....@gmail.com> wrote:
>> I know stddev_samp and stddev_pop give different values, because they
>> have different definitions. What I want to know is why stddev_samp
>> gives "NaN", and not a numeric value.
