>> Can you please explain why that is? Web programs should not normally
>> have the need to detect the encoding; instead, it should be specified
>> always - unless you are talking about browsers specifically, which
>> need to support web pages that specify the encoding incorrectly.
>
> Any program that needs to examine the contents of
> documents/feeds/whatever on the web needs to deal with
> incorrectly-specified encodings

That's not true. Most programs that need to examine the contents of a
web page don't need to guess the encoding. In most such programs, the
encoding can be hard-coded if the declared encoding is not correct.
Most such programs *know* what page they are scraping, or else they
couldn't extract the information they want from it.

As for feeds - can you give examples of incorrectly encoded ones? (I
don't ever use feeds, so I honestly don't know whether they are
typically encoded incorrectly. I've heard they are often XML, in which
case I strongly doubt they are incorrectly encoded.)

As for "whatever" - can you give specific examples?

> (which, sadly, is rather common). The
> set of programs that need this functionality is probably the
> same set that needs BeautifulSoup--I think that set is larger than just
> browsers <grin>

Again, can you give *specific* examples that are not web browsers?
Programs needing BeautifulSoup may still not need encoding guessing,
since they still might be able to hard-code the encoding of the web
page they want to process.

In any case, I'm very skeptical that a general "guess encoding" module
would do anything meaningful when applied to incorrectly encoded HTML
pages.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev