https://bugzilla.wikimedia.org/show_bug.cgi?id=16583
Ilmari Karonen <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #15 from Ilmari Karonen <[email protected]>  2009-11-27 13:19:18 UTC ---
(In reply to comment #14)
> (In reply to comment #3)
> > It appears that the check, as currently coded, will have
> > a false positive rate of slightly over 1 in 4096 files,
> > assuming a random distribution of octets
>
> 16 bits, not 12, and you have to multiply by 1024, which gives us a false
> positive rate for random files on the order of 2^-6 ~= 1.7 %.

The check which Simetrical removed in r58682 matched if the first 1024 bytes of
the file contained "<?" followed by one of four possible bytes (' ', '\n', '\t'
or '=').  Thus, the probability of three random bytes matching this check is
4/(2^8)^3 = 1/2^22, and the probability of 1024 random bytes matching it is
approximately 1024/2^22 = 1/2^12 = 1/4096.

(Taking into account the possibility of multiple matches and the fact that the
last 2 out of 1024 positions can't match makes the probability about 1/4104.5.
Most of the difference is due to the latter, since multiple matches are very
unlikely events, occurring only for about one in every 2^24 files.)

Anyway, marking the bug as fixed: r58682 should reduce the false positive rate
enough that what's left (like removing the check entirely?) is mainly just code
cleanup.
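For anyone who wants to check the arithmetic above, here is a rough sketch in Python. It is not taken from MediaWiki: the constant and variable names are made up for illustration, and the refined figure assumes that matches at different offsets are independent, which is only approximately true.

```python
import math
from fractions import Fraction

# Illustrative constants, named here only for this sketch.
WINDOW = 1024      # number of leading bytes the removed check scanned
FOLLOWERS = 4      # bytes allowed after "<?": ' ', '\n', '\t', '='
PATTERN_LEN = 3    # "<?" plus one follower byte

# Probability that three uniformly random bytes match the pattern.
p_triple = Fraction(FOLLOWERS, 256 ** PATTERN_LEN)

# First-order estimate: treat all 1024 offsets as possible start positions.
rough = WINDOW * p_triple

# Refinement 1: the last two offsets cannot start a 3-byte match.
positions = WINDOW - (PATTERN_LEN - 1)

# Refinement 2: probability of at least one match, assuming (as an
# approximation) that the offsets match independently.
# 1 - (1 - p)^n is computed via log1p/expm1 to avoid rounding loss.
p_any = -math.expm1(positions * math.log1p(-float(p_triple)))

print("per-offset probability:", p_triple)
print("first-order estimate:  ", rough)
print("usable start offsets:  ", positions)
print("refined estimate: 1 in %.1f" % (1 / p_any))
```

The first-order estimate comes out at exactly 1/4096, and the refined one at roughly 1 in 4104.5. Almost all of the gap comes from dropping the last two offsets rather than from the multiple-match correction.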
