https://bugzilla.wikimedia.org/show_bug.cgi?id=16583


Ilmari Karonen <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #15 from Ilmari Karonen <[email protected]>  2009-11-27 13:19:18 UTC ---
(In reply to comment #14)
> (In reply to comment #3)
> > It appears that the check, as currently coded, will have 
> > a false positive rate of slightly over 1 in 4096 files, 
> > assuming a random distribution of octets
> 
> 16 bits, not 12, and you have to multiply by 1024, which gives us a false
> positive rate for random files on the order of 2^-6 ~= 1.7 %.

The check which Simetrical removed in r58682 matched if the first 1024 bytes of
the file contained "<?" followed by one of four possible bytes (' ', '\n', '\t'
or '=').  Thus, the probability of three random bytes matching this check is
4/(2^8)^3 = 1/2^22, and the probability of 1024 random bytes matching it is
approximately 1024/2^22 = 1/2^12 = 1/4096.
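
As a quick sanity check of the arithmetic above, here is a minimal sketch in
Python.  It assumes independent, uniformly random bytes and models the check
only as described in this comment; the names SUFFIXES and WINDOW are made up
for illustration and are not taken from the MediaWiki source.

```python
from fractions import Fraction

# Hypothetical names: the four bytes accepted after "<?" and the number of
# leading bytes inspected, as described in the comment above.
SUFFIXES = (b" ", b"\n", b"\t", b"=")
WINDOW = 1024

# Probability that one fixed 3-byte position holds "<?" plus an accepted byte.
p_position = Fraction(len(SUFFIXES), 256 ** 3)

# First-order estimate: count every one of the 1024 offsets as a chance to
# match and ignore multiple matches.
p_estimate = WINDOW * p_position

print(p_position)  # 1/4194304, i.e. 1/2^22
print(p_estimate)  # 1/4096
```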

(Taking into account the possibility of multiple matches and the fact that the
last 2 out of 1024 positions can't match makes the probability about 1/4104.5.
Most of the difference is due to the latter, since multiple matches are very
unlikely events, occurring only for about one in every 2^25 files.)
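
The refined figure can be reproduced with a small state-machine calculation.
This is only a sketch under the same assumption of independent, uniformly
random bytes; the function name and parameters are hypothetical, and it models
the check as described in this comment rather than the actual code.

```python
def false_positive_probability(window=1024, suffix_count=4):
    """Probability that "<?" plus one accepted byte occurs in `window` random bytes."""
    b = 1 / 256  # probability of any one specific byte value
    # Probabilities of the scanner states, with no full match seen so far:
    #   none: no partial match, lt: just read '<', ltq: just read "<?"
    none, lt, ltq = 1.0, 0.0, 0.0
    matched = 0.0
    for _ in range(window):
        matched += ltq * suffix_count * b          # "<?" followed by an accepted byte
        new_lt = (none + lt + ltq) * b             # '<' restarts a partial match
        new_ltq = lt * b                           # '<' followed by '?'
        new_none = (none * (1 - b)                 # anything but '<'
                    + lt * (1 - 2 * b)             # anything but '<' or '?'
                    + ltq * (1 - (suffix_count + 1) * b))  # neither '<' nor accepted
        none, lt, ltq = new_none, new_lt, new_ltq
    return matched


p = false_positive_probability()
print(f"about 1 in {1 / p:.1f}")  # about 1 in 4104.5
```

Because a match needs three bytes, only 1022 of the 1024 offsets can start one;
the loop accounts for that automatically, since a match is only counted once
its third byte has been read.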

Anyway, marking the bug as fixed: r58682 should reduce the false positive rate
enough that what's left (like removing the check entirely?) is mainly just code
cleanup.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
