Re: [HACKERS] Weighted Stats

Jeff Janes Sat, 19 Mar 2016 01:48:52 -0700

On Fri, Mar 18, 2016 at 11:34 PM, David Fetter <[email protected]> wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:


>> Also, I think it might not give the correct answer even without
>> negative weights:
>>
>> create table foo as select floor(random()*10000)::int val from
>> generate_series(1,10000000);
>>
>> create table foo2 as select val, count(*) from foo group by val;
>>
>> Shouldn't these then give the same result:
>>
>> select stddev_samp(val) from foo;
>>     stddev_samp
>> -------------------
>>  2887.054977297105
>>
>> select weighted_stddev_samp(val,count) from foo2;
>>  weighted_stddev_samp
>> ----------------------
>>      2887.19919651336
>>
>> The 5th digit seems too early to be seeing round-off error.
>
> Please pardon me if I've misunderstood, but you appear to be assuming
> that
>
>     SELECT val, count(*) FROM foo GROUP BY val
>
> will produce precisely identical count(*)s at each row, which it
> overwhelmingly likely won't, producing the difference you see above.

I think the count for each val that gets put in foo2.count should be
the same as the weight of that val as it occurs in foo.  Surely they
shouldn't all have the same weight in foo2, unless they all have the
same number of appearances in foo.  Which, as you say, they are not
likely to.  But still, the foo2.count that they do individually get
should be equal to their weight, shouldn't it?

The other two methods (*avg and *stddev_pop) do give the same answers
using the two different methods (unweighted against foo, weighted
against foo2)

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Weighted Stats

Reply via email to