Re: [Python-3000] BOM handling

Josiah Carlson Thu, 14 Sep 2006 09:25:36 -0700

Blake Winton <[EMAIL PROTECTED]> wrote:
[snip]
> Um, what more data do we need for this use-case?  I'm not going to 
> suggest an API, other than it would be nice if I didn't have to manually 
> figure out/hard code all the encodings.  (It's my belief that I will 
> currently have to do that, or at least special-case XML, to read the 
> encoding attribute.)  Oh, and it would be particularly horrible if I 
> output a shell script in UTF-8, and it included the BOM, since I believe 
> that would break the "magic number" of "#!".


Use the encoding attribute of the XML declaration (<?xml ... encoding="..." ?>)
to discover the encoding, and assume UTF-8 otherwise, as per the spec:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-EncodingDecl
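
A minimal sketch of that lookup in Python. The helper name
sniff_xml_encoding is hypothetical, and the sketch assumes an
ASCII-compatible encoding, so the declaration itself is readable as
ASCII bytes; it does not handle UTF-16 or UTF-32 input.

```python
import re

# Hypothetical helper for illustration; not an existing library API.
# Matches an XML declaration at the very start of the data and captures
# the value of its encoding attribute.
_XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding, or `default` if none is declared."""
    # Skip a UTF-8 BOM so the declaration is still found at offset 0.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else default
```

Passing only the first few hundred bytes of the file as head should be
enough, since the declaration has to come first in the document.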

Does bash natively support utf-8?  Is there a bash equivalent to Python
coding: directives?  You may be attempting to fix a problem that doesn't
exist.


> Yeah, see, at a business level, I really need to process those all in 
> the same way, and it would be annoying to have to write code to handle 
> them all differently.

So you, or anyone else, can write a module for discovering the encoding
used for a particular file based on XML tags, Python coding: directives,
etc. It could include an extensible registry, and if it is used enough,
could be included in the Python standard library.
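
A rough sketch of what such a module might look like. Every name here
(register, discover_encoding, the individual sniffers) is hypothetical,
and it assumes the first bytes of the file are ASCII-compatible. The
coding: pattern follows the one given in PEP 263.

```python
import re

# Hypothetical registry: an ordered list of sniffer callables. Each takes
# the first bytes of a file and returns an encoding name or None.
_SNIFFERS = []

def register(func):
    """Add a sniffer to the registry; usable as a decorator."""
    _SNIFFERS.append(func)
    return func

@register
def bom_sniffer(head):
    # A UTF-8 BOM maps to Python's BOM-stripping codec.
    return "utf-8-sig" if head.startswith(b"\xef\xbb\xbf") else None

@register
def xml_sniffer(head):
    # encoding attribute of an XML declaration at the start of the data.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        head,
    )
    return m.group(1).decode("ascii") if m else None

@register
def python_coding_sniffer(head):
    # PEP 263: the directive must appear on the first or second line.
    for line in head.splitlines()[:2]:
        m = re.match(rb'[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)', line)
        if m:
            return m.group(1).decode("ascii")
    return None

def discover_encoding(head, default="utf-8"):
    """Return the first encoding any registered sniffer reports."""
    for sniff in _SNIFFERS:
        encoding = sniff(head)
        if encoding is not None:
            return encoding
    return default
```

Third parties could then call register() with their own sniffers for
other file types without touching the module itself.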


 - Josiah
