M.-A. Lemburg wrote:
> Shouldn't this encoding guessing be a separate function that you call
> on either a file or a seekable stream ?
>
> After all, detecting encodings is just as useful to have for non-file
> streams.

Other stream sources typically have out-of-band ways to signal the
encoding:  only when reading from the filesystem do we pretty much
*have* to guess, and in that case the BOM / signature is the best
heuristic we have.  Also, some non-file streams are not seekable, and so
can't be guessed via a pre-pass.

> You'd then avoid having to stuff everything into
> a single function call and also open up the door for more complex
> application specific guess work or defaults.
>
> The whole process would then have two steps:
>
> 1. guess encoding
>
>    import codecs
>    encoding = codecs.guess_file_encoding(filename)

Filename is not enough information:  or do you mean that API to actually
open the stream?

> 2. open the file with the found encoding
>
>    f = open(filename, encoding=encoding)
>
> For seekable streams f, you'd have:
>
> 1. guess encoding
>
>    import codecs
>    encoding = codecs.guess_stream_encoding(f)
>
> 2. wrap the stream with a reader for the found encoding
>
>    reader_class = codecs.getreader(encoding)
>    g = reader_class(f)


Tres.
-- 
===================================================================
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
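
For concreteness, here is a rough sketch of what the seekable-stream case
discussed above might look like.  The helper name guess_bom_encoding is made
up for illustration (it is not an existing codecs function, and not the API
proposed in this thread); only the codecs.BOM_* constants and
codecs.getreader are real.  It assumes a binary stream that supports
tell() and seek().

```python
import codecs
import io

# Longest signatures first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so checking UTF-16 first would misfire.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_bom_encoding(stream, default=None):
    """Hypothetical helper: peek at the BOM of a seekable binary stream."""
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)  # rewind so the decoder still sees the BOM
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default  # no signature found: the caller has to decide


# Step 1: guess the encoding; step 2: wrap the stream with a reader.
raw = io.BytesIO(codecs.BOM_UTF8 + "héllo".encode("utf-8"))
encoding = guess_bom_encoding(raw, default="latin-1")
reader = codecs.getreader(encoding)(raw)
text = reader.read()
```

The codec names returned here ("utf-8-sig", "utf-16", "utf-32") are the ones
that consume the BOM themselves, which is why the helper rewinds instead of
skipping it.  A non-seekable stream cannot be rewound this way, which is the
pre-pass limitation mentioned above.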