[Bug 894468] Re: Statistics algorithm for sorting ratings looks fishy

Scott Ritchie Fri, 16 Dec 2011 14:56:00 -0800

I had thought I proposed a dampening algorithm above, but maybe it was
on the Cross Validated post.  Anyway, to be clear:


1) Compute the median and count three numbers: ratings below the median (x),
equal to the median (y), and above the median (z).
2) Your score is the median plus the probability that a voter rates above
the median, minus the probability that a voter rates below it. For large
sample sizes those probabilities approach the observed proportions, so the
score approaches median + (z/(x+y+z)) - (x/(x+y+z)), which equals
median + (z-x)/(x+y+z).
3) Since we're really estimating probabilities, rather than using the raw
proportion as an estimator we use the bounds of the Wilson score interval.
If we take the lower bound of the probability range on the positive end and
the upper bound on the negative end, we get something analogous to what we
do now: new apps with small sample sizes are punished slightly, but as votes
accrue their scores approach the value in step 2.
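
A minimal Python sketch of steps 1 to 3 follows. The function names, the use
of the low median (so the median is always an actual star value, which the
3s-and-4s example below seems to assume) and the 95% confidence level
(z_score = 1.96) are all assumptions for illustration; the steps above don't
pin them down. Wilson's z is spelled z_score so it doesn't clash with the
count z.

```python
import math
from statistics import median_low


def wilson_bounds(successes: int, n: int, z_score: float = 1.96) -> tuple[float, float]:
    """Lower and upper bounds of the Wilson score interval for successes / n."""
    if n == 0:
        return 0.0, 1.0  # no votes yet: the proportion could be anything
    p_hat = successes / n
    denom = 1 + z_score**2 / n
    center = (p_hat + z_score**2 / (2 * n)) / denom
    margin = (z_score / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z_score**2 / (4 * n**2))
    # clamp away tiny floating-point excursions outside [0, 1]
    return max(0.0, center - margin), min(1.0, center + margin)


def split_counts(ratings: list[int]) -> tuple[int, int, int, int]:
    """Step 1: the median and the counts below (x), at (y) and above (z) it."""
    m = median_low(ratings)  # assumption: low median, so m is always a real star value
    x = sum(r < m for r in ratings)
    y = sum(r == m for r in ratings)
    z = sum(r > m for r in ratings)
    return m, x, y, z


def large_sample_score(ratings: list[int]) -> float:
    """Step 2: median + (z - x) / (x + y + z)."""
    m, x, y, z = split_counts(ratings)
    return m + (z - x) / (x + y + z)


def dampened_score(ratings: list[int], z_score: float = 1.96) -> float:
    """Step 3: replace both proportions with the pessimistic Wilson bound.

    z_score = 1.96 (a 95% two-sided interval) is an assumption, not from the post.
    """
    m, x, y, z = split_counts(ratings)
    n = x + y + z
    p_above_low, _ = wilson_bounds(z, n, z_score)   # lower bound on P(rating > median)
    _, p_below_high = wilson_bounds(x, n, z_score)  # upper bound on P(rating < median)
    return m + p_above_low - p_below_high


# Hypothetical vote lists in the 10% / 70% / 20% proportions of the first example below.
votes_100 = [2] * 10 + [3] * 70 + [4] * 20
votes_10 = [2] * 1 + [3] * 7 + [4] * 2
print(large_sample_score(votes_100))  # 3.1
print(dampened_score(votes_10) < dampened_score(votes_100))  # True
```

Because the lower bound feeds the upward nudge and the upper bound feeds the
downward one, both sources of uncertainty push a thinly rated app down. How
hard they push depends heavily on the assumed z_score; a smaller value gives
narrower intervals and dampens less.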

Note that neither probability can exceed 0.5, since by definition at most
half the votes lie strictly above the median and at most half strictly
below it (this holds even when very few people rate at the actual median).
So we never add or subtract more than 0.5 from the median. This makes the
sample median the dominant sorting key, but the other ratings still matter.
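
A brute-force check of the 0.5 claim, assuming a 1 to 5 star scale and the
low median (both assumptions, as are the names): it enumerates every
multiset of up to 8 ratings and records the largest share of votes strictly
above and strictly below the median. This checks the sample proportions
only; the Wilson bounds from step 3 are separate estimates, and with very
few votes the upper bound on the below-median share can itself exceed 0.5.

```python
from itertools import combinations_with_replacement
from statistics import median_low


def max_shares(max_votes: int = 8, stars: range = range(1, 6)) -> tuple[float, float]:
    """Largest share of votes strictly above / strictly below the median.

    Assumptions: a 1-5 star scale and the low median.
    """
    worst_above = worst_below = 0.0
    for n in range(1, max_votes + 1):
        # Every multiset of n ratings; order does not change the median.
        for ratings in combinations_with_replacement(stars, n):
            m = median_low(ratings)
            worst_above = max(worst_above, sum(r > m for r in ratings) / n)
            worst_below = max(worst_below, sum(r < m for r in ratings) / n)
    return worst_above, worst_below


above, below = max_shares()
print(above <= 0.5 and below <= 0.5)  # True
```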

Some quick examples this would produce:
 - If an app received 70% of its votes at the median of 3, 10% below the
median, and 20% above it, it would rate about 3.1 (3 + 0.2 - 0.1).
 - If an app had just 10 votes in the same proportions, it would receive a
somewhat lower rating, since the error bars on the Wilson score would be
wider.
 - If an app received some votes of 4 and about equal numbers of 5s and 3s,
it would be rated about 4.0, since the positive and negative portions would
cancel each other out.
 - If an app received almost entirely votes of 4, it would also receive about a 
4.0 rating.
 - If an app received many votes split roughly equally between 3s and 4s, it
would be rated about 3.5: either because its median was 3 and there was
about a 0.5 chance someone would rate higher, or because its median was 4
and there was about a 0.5 chance someone would rate lower.
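
The large-sample examples can be reproduced with the step 2 formula alone.
The 100-vote distributions below are made up to match each description (the
post doesn't say which star values sit below or above the median, so 2s and
4s are assumed for the first one), and the low median is again an
assumption. The 10-vote example is left out because it needs the Wilson
step.

```python
from statistics import median_low


def large_sample_score(ratings: list[int]) -> float:
    """median + (z - x) / (x + y + z), using the low median (an assumption)."""
    m = median_low(ratings)
    x = sum(r < m for r in ratings)  # votes below the median
    z = sum(r > m for r in ratings)  # votes above the median
    return m + (z - x) / len(ratings)


# Hypothetical 100-vote distributions matching the examples above.
examples = {
    "70% at 3, 10% below, 20% above": [2] * 10 + [3] * 70 + [4] * 20,
    "some 4s, equal 3s and 5s": [3] * 30 + [4] * 40 + [5] * 30,
    "almost all 4s": [3] * 1 + [4] * 98 + [5] * 1,
    "equal 3s and 4s (low median 3)": [3] * 50 + [4] * 50,
    "slightly more 4s (median 4)": [3] * 49 + [4] * 51,
}
for label, votes in examples.items():
    print(f"{label}: {large_sample_score(votes):.2f}")
```

This prints 3.10, 4.00, 4.00, 3.50 and 3.51 for the five cases. The last two
show both readings of the 3s-and-4s example: with an even split the low
median is 3 and the score rises by 0.5; with a slight majority of 4s the
median is 4 and the score falls by 0.49.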

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/894468

Title:
  Statistics algorithm for sorting ratings looks fishy

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/software-center/+bug/894468/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
