Re: [SparkSQL] How to calculate stddev on a DataFrame?
Perhaps this email reference can help from a DataFrame perspective:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E

On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang wrote:
> Hi,
>
> I have a DataFrame object and I want to do several types of aggregations
> like count, sum, variance, stddev, etc.
>
> DataFrame has DSL to do simple aggregations like count and sum.
>
> How about variance and stddev?
>
> Thank you for any suggestions!
Re: [SparkSQL] How to calculate stddev on a DataFrame?
I would keep a running sum of squares (alongside the count and the sum). Each of these can be maintained as an associative operation in an aggregator, and the variance and standard deviation can then be calculated after the fact.

On Wed, Mar 25, 2015 at 10:28 PM, Haopu Wang wrote:
> Hi,
>
> I have a DataFrame object and I want to do several types of aggregations
> like count, sum, variance, stddev, etc.
>
> DataFrame has DSL to do simple aggregations like count and sum.
>
> How about variance and stddev?
>
> Thank you for any suggestions!
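To make the sum-of-squares suggestion concrete, here is a minimal sketch in plain Python (not Spark code). It assumes each partition is reduced to a (count, sum, sum of squares) triple and that the triples are then merged. The names `to_stats`, `merge`, `variance`, and `stddev` are made up for illustration and are not part of any Spark API. It computes the population variance; multiply by n / (n - 1) for the sample variance.

```python
import math
from functools import reduce

def to_stats(x):
    # Turn one value into a (count, sum, sum of squares) triple.
    return (1, x, x * x)

def merge(a, b):
    # Element-wise addition is associative and commutative, so partial
    # results from different partitions can be combined in any order.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def variance(stats):
    # Population variance: E[x^2] - (E[x])^2.
    n, s, sq = stats
    mean = s / n
    return sq / n - mean * mean

def stddev(stats):
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(variance(stats), 0.0))

# Two hypothetical partitions of the same column.
partition_a = [2.0, 4.0, 4.0, 4.0]
partition_b = [5.0, 5.0, 7.0, 9.0]

stats_a = reduce(merge, map(to_stats, partition_a))
stats_b = reduce(merge, map(to_stats, partition_b))
total = merge(stats_a, stats_b)

print(variance(total), stddev(total))
```

One caveat: subtracting the squared mean from the mean of squares can lose precision when the values are large relative to their spread, so a more careful implementation might merge (count, mean, M2) triples instead.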
[SparkSQL] How to calculate stddev on a DataFrame?
Hi,

I have a DataFrame object and I want to do several types of aggregations like count, sum, variance, stddev, etc.

DataFrame has DSL to do simple aggregations like count and sum.

How about variance and stddev?

Thank you for any suggestions!