P Kishor wrote:
> Good advice on storing often used values to speed up the computation,
> but, as I noted above, I *am* looking for a Bayesian (weighted?)
> ranking system, not the one I ended up inadvertently implying above.

So, Bayesian rating generally works like this:

W = (R*v + C*m) / (v + m)

where W is the desired weighted rating, R is the item's true rating 
(total number of stars / total number of votes for that item), v is the 
number of votes for the item, C is the average rating of all items (sum 
of all true ratings / number of items), and m is a magic number.

The idea is that, with v << m, W ~ C (the null hypothesis in the absence 
of evidence is that all items are about average); with v >> m, W ~ R 
(when the number of votes is significant, the evidence is "believable", 
so we accept the hypothesis that the true rating is the correct one); 
with v=m, W = (R + C) / 2 (so m is the number of votes that is 
"half-believable", so to speak).

The choice of m is more art than science. Two popular approaches are 1) 
just making m some constant, based on what you think is a sufficient 
number of votes to be trusted; 2) making m be some fraction of the 
average number of votes per item (let's call it A, calculated as total 
number of votes overall / number of items): m = p*A. 
http://www.thebroth.com/blog/118/bayesian-rating suggests m = A (p = 1), 
though a smaller p might add more dynamism to the ratings (when you have 
lots of votes for existing items, new items may take a long time 
accumulating enough votes to break away from the average rating, no 
matter how good or bad they might be).

Note that, to calculate W, you don't need individual votes - just the 
total number of stars (S) and the number of votes (v) for each item (or, 
alternatively, R and v, but R is a bit harder to update when a new vote 
is recorded). So, suppose you have a table votes(itemId, S, v). You can 
calculate weighted ratings like this:

select itemId, (S + C * m) / (v + m) as W
from votes, (select :p * avg(v) as m, avg(S/v) as C from votes);

Now, you seem to want to precalculate C and m (or C and A), perhaps for 
performance reasons. You would need to keep track of some global totals. 
Create a separate table totals(number_of_items, number_of_votes, 
sum_of_ratings), insert a single row with all zeros. Suppose you want to 
register a new vote (:id, :stars). That would go something like this:

insert or replace into votes(itemId, S, v)
values (:id,
    coalesce((select S from votes where itemId = :id), 0) + :stars,
    coalesce((select v from votes where itemId = :id), 0) + 1);

update totals set
    number_of_items = number_of_items + (select v = 1 from votes where 
itemId=:id),
    number_of_votes = number_of_votes + 1,
    sum_of_ratings = sum_of_ratings +
        (select S / v - (case when v = 1 then 0 else (S - :stars) / (v - 
1) end) from votes where itemId = :id);


With this data, weighted ratings can be calculated as follows:

select itemId, (S + C * m) / (v + m) as W
from votes, (select :p * number_of_votes/number_of_items as m, 
sum_of_ratings/number_of_items as C from totals);


None of the above is tested, everything is off the top of my head. Use 
with caution.

Igor Tandetnik 



_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to