Re: [Python-Dev] Encoding detection in the standard library?

Martin v. Löwis Tue, 22 Apr 2008 14:26:25 -0700

>> Can you please explain why that is? Web programs should not normally
>> have the need to detect the encoding; instead, it should be specified
>> always - unless you are talking about browsers specifically, which
>> need to support web pages that specify the encoding incorrectly.
> 
> Any program that needs to examine the contents of
> documents/feeds/whatever on the web needs to deal with
> incorrectly-specified encodings


That's not true. Most programs that need to examine the contents of
a web page don't need to guess the encoding. In most such programs,
the encoding can be hard-coded if the declared encoding is not
correct. Most such programs *know* what page they are webscraping,
or else they couldn't extract the information out of it that they
want to get at.
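A minimal sketch of the "hard-code it" approach, in modern Python 3. The function name decode_page and its parameters are hypothetical, made up for illustration. The declared charset might come from something like response.headers.get_content_charset() when fetching with urllib.request. The scraper trusts that declaration unless the programmer already knows this particular page mislabels itself, and no guessing is involved at any point:

```python
from typing import Optional


def decode_page(raw: bytes,
                declared: Optional[str],
                known_encoding: Optional[str] = None) -> str:
    """Decode a fetched page without guessing (hypothetical helper).

    known_encoding is the encoding the programmer hard-codes for a page
    they know declares its charset wrongly; if given, it overrides the
    declared charset. Otherwise the declared charset is trusted.
    """
    encoding = known_encoding or declared
    if encoding is None:
        # Nothing declared and nothing hard-coded: fail loudly, don't guess.
        raise ValueError("no declared encoding and none hard-coded")
    return raw.decode(encoding)


# Made-up example: a page labelled ISO-8859-1 whose bytes are really UTF-8.
raw = "Löwis".encode("utf-8")
print(decode_page(raw, declared="iso-8859-1"))                          # LÃ¶wis
print(decode_page(raw, declared="iso-8859-1", known_encoding="utf-8"))  # Löwis
```

The override lives in the scraper because the scraper author is the one who knows the site. That is the point being made above.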

As for feeds - can you give examples of incorrectly encoded ones?
(I don't ever use feeds, so I honestly don't know whether they
are typically encoded incorrectly. I've heard they are often XML,
in which case I strongly doubt they are incorrectly encoded.)
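To illustrate why XML feeds rarely need guessing, here is a small sketch using the standard library's xml.etree.ElementTree. An XML document carries its own encoding declaration (and, per the XML 1.0 spec, defaults to UTF-8 or UTF-16 when there is none), so the parser decodes the bytes itself. The feed below is a made-up example, not one from the thread:

```python
import xml.etree.ElementTree as ET

# Made-up RSS fragment that declares ISO-8859-1 and is actually encoded that way.
feed_bytes = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<rss version="2.0"><channel><title>Martin v. Löwis</title></channel></rss>'
).encode("iso-8859-1")

# Passing bytes (not str) lets the parser honour the encoding declaration.
root = ET.fromstring(feed_bytes)
print(root.find("channel/title").text)  # Martin v. Löwis
```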

As for "whatever" - can you give specific examples?

> (which, sadly, is rather common). The
> set of programs that need this functionality is probably the
> same set that needs BeautifulSoup--I think that set is larger than just
> browsers <grin>

Again, can you give *specific* examples that are not web browsers?
Programs needing BeautifulSoup may still not need encoding guessing,
since they still might be able to hard-code the encoding of the web
page they want to process.

In any case, I'm very skeptical that a general "guess encoding"
module would do a meaningful thing when applied to incorrectly
encoded HTML pages.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev