Some basic helper functions to deal with encodings of files retrieved via HTTP.
Download from http://cthedot.de/encutils/

Currently contained functions:

encodingByMediaType(media_type, log=None)
    Returns a default encoding for the given Content-Type if one is
    available, e.g. 'utf-8' for 'application/xml'.

getHTTPInfo(httpheaders, log=None)
    Finds content-type and encoding information in an HTTP header
    dictionary. Returns a (Content-Type, encoding) tuple; both values
    may be None. Default encodings of specific Content-Types are used
    (see encodingByMediaType).

getMetaInfo(text, log=None)
    Returns a (Content-Type, encoding) tuple from the (last) X/HTML
    meta element.

guessEncoding(httpheaders, text, log=None)
    Tries to find the encoding of the given text, using the information
    in httpheaders and in the text content, such as HTML meta elements
    or the XML declaration (this is not implemented yet). Returns the
    explicit or implicit encoding or None. Mismatch reports are written
    to the log.

If there is something similar out there, please let me know (I know the Cookbook XML autodetection script, which I would like to integrate). I would also very much appreciate any feedback about spec compliance, errors or other problems with the functions. (See http://cthedot.de/contact/ or http://cthedot.de/blog/.)

Thanks a lot!
chris
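
For illustration only, here is a minimal sketch of the header-based lookup described for getHTTPInfo and encodingByMediaType. It is not the encutils code: the names sketch_http_info, sketch_encoding_by_media_type and DEFAULT_ENCODINGS are hypothetical, the defaults table holds only the one example mentioned above, and the header parsing is borrowed from the standard library's email package as an assumption about how one might do it.

```python
from email.message import Message

# Assumed defaults for illustration; the real table in encutils may differ.
DEFAULT_ENCODINGS = {
    "application/xml": "utf-8",
}


def sketch_encoding_by_media_type(media_type):
    """Return an assumed default encoding for a media type, or None."""
    if media_type is None:
        return None
    return DEFAULT_ENCODINGS.get(media_type.lower())


def sketch_http_info(headers):
    """Return (media_type, encoding) from a dict of HTTP headers.

    Either value may be None. Header names are matched case-insensitively.
    """
    raw = None
    for name, value in headers.items():
        if name.lower() == "content-type":
            raw = value
            break
    if raw is None:
        return None, None

    # Let the stdlib parse "type/subtype; charset=..." for us.
    msg = Message()
    msg["Content-Type"] = raw
    media_type = msg.get_content_type()
    encoding = msg.get_content_charset()

    # No explicit charset: fall back to the per-media-type default, if any.
    if encoding is None:
        encoding = sketch_encoding_by_media_type(media_type)
    return media_type, encoding
```

Called with {"Content-Type": "application/xml"}, the sketch falls back to the default from the table; with an explicit charset parameter in the header, that explicit value takes precedence.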