On Fri, Aug 23, 2013 at 9:24 AM, Lord_Farin <[email protected]> wrote:
> The probability of displaying a "bad" page would be:
>
> B q ((p B)^N - 1) / (p B - 1) + B (p B)^N
>
> (modulo errors), where B is the fraction of bad pages, p is the
> probability of repeating, q is the probability of displaying (so p + q = 1),
> and N is the allowed number of repetitions.

I'm going to rewrite that as:

B (1-p) ((p B)^N - 1) / (p B - 1) + B (p B)^N

...and I'm also going to take your word on the math, because my brain is lazy this morning.

Let's run the numbers, assuming the 500,000 articles the Swedish Wikipedia had in Sept 2012 were all good, and the million articles added since are all bad. Thus B = 2/3.

Let's start with N at 5, so in the worst case we're going to be doing 6x as many SQL queries (the initial draw plus up to 5 repetitions). p is the tunable parameter. So if:

p = 0     prob of getting a bad page = 67% (sanity check, this is what they've got now)
p = 0.5   prob of getting a bad page = 50%
p = 0.75  prob of getting a bad page = 34%
p = 0.80  prob of getting a bad page = 30%
p = 0.90  prob of getting a bad page = 21%
p = 0.95  prob of getting a bad page = 15%
p = 1.00  prob of getting a bad page = 9% (this is set by N)

If you let N go up to 10, then:

p = 0.90  prob of getting a bad page = 17%
p = 0.95  prob of getting a bad page = 10%
p = 1.00  prob of getting a bad page = 1%

My expectation is that about a 10% chance of getting a 'bad page' would make Swedish Wikipedians happy, so I'd recommend p = 1, N = 5. But the knobs can be twiddled.

--scott

--
(http://cscott.net)
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
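For anyone who would rather not take the math on faith, here is a minimal Python sketch that evaluates the formula quoted above both in closed form and as an explicit draw-by-draw sum, so the two can be checked against each other and against the table. It assumes the model the formula implies: each random draw is bad with probability B; a bad draw is re-rolled with probability p and shown with probability q = 1 - p; after N re-rolls the next draw is shown regardless, so the worst case is N + 1 draws. The function names are hypothetical and nothing here is taken from MediaWiki's actual random-page code.

```python
def prob_bad_closed(B: float, p: float, N: int) -> float:
    """Closed form from the thread: B q ((pB)^N - 1)/(pB - 1) + B (pB)^N, with q = 1 - p."""
    q = 1.0 - p
    r = p * B
    if r == 1.0:
        # Only when p = 1 and B = 1: the geometric sum 1 + r + ... + r^(N-1) is just N.
        geometric = float(N)
    else:
        geometric = (r ** N - 1.0) / (r - 1.0)
    return B * q * geometric + B * r ** N


def prob_bad_sum(B: float, p: float, N: int) -> float:
    """Same quantity, summed over draws 0..N (draw N is the forced final one)."""
    q = 1.0 - p
    total = 0.0
    for k in range(N):
        # Reach draw k only if the k earlier draws were all bad and re-rolled,
        # probability (pB)^k; then this draw is bad and shown, probability B q.
        total += (p * B) ** k * B * q
    # After N re-rolls the last draw is shown whether it is bad or not.
    total += (p * B) ** N * B
    return total


if __name__ == "__main__":
    B = 2 / 3  # assumption from the thread: 1M bad articles out of 1.5M total
    for N, ps in ((5, [0.0, 0.5, 0.75, 0.80, 0.90, 0.95, 1.00]),
                  (10, [0.90, 0.95, 1.00])):
        for p in ps:
            print(f"N={N:2d} p={p:.2f} prob of bad page = {prob_bad_closed(B, p, N):.1%}")
```

Running it reproduces the figures in the message to within rounding. With p = 1 the first term vanishes and the result is B^(N+1), which is why that row is "set by N".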
