On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer <[email protected]> wrote:
> > So if the text is German it will contain more words with
> > umlauts and each byte which is part of a correctly spelled German word
> > when interpreted according to ISO-8859-1 increases the probability that
> > decoding with ISO-8859-1 will produce the correct result. There remains
> > a tiny probability that all those matches are mere coincidence, but I
> > wrote "almost always", not "always", so I can live with an error rate of
> > 0.000001% (or something like that).
>
> That's basically what the chardet module does, and its error rate is
> far FAR higher than that. If you think it's easy to detect encodings,
> I'm sure the chardet maintainers will be happy to accept pull
> requests!

We were talking about humans, not programs.
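
For illustration, here is a minimal Python sketch of what the human reader is doing in the quoted argument: decode the same bytes under a few candidate encodings and see which result reads as German. The sample string, the list of candidates, and the helper name `decode_candidates` are assumptions made for this example. The final judgment (which output looks like correctly spelled German) is still left to the person reading it.

```python
# Sample bytes: a short German phrase stored as ISO-8859-1 (assumed example data).
ORIGINAL = "Grüße aus Köln"
RAW = ORIGINAL.encode("iso-8859-1")

# Candidate encodings to try (an arbitrary, illustrative selection).
CANDIDATES = ["utf-8", "iso-8859-1", "cp437", "koi8-r"]


def decode_candidates(data, encodings):
    """Decode `data` with each encoding; map encoding -> text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding at all.
            results[enc] = None
    return results


RESULTS = decode_candidates(RAW, CANDIDATES)

# Show every candidate so a human can pick the one that reads as German.
for enc, text in RESULTS.items():
    print(f"{enc:12} {text!r}")
```

Only one of the printed lines contains real German words with umlauts in the right places; the others either fail to decode or produce obvious nonsense, which is the kind of evidence a human reader weighs.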
hp
--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [email protected]  | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
