James G. sack (jim) added the comment: Feature Request REVISION ======================== Upon reflection and more playing around with some test cases, I wish to revise my feature request.
I think the utf8 codecs should accept input with or without the "sig". On output, only the utf_8_sig should write the 3-byte "sig". This behavior change would not seem disruptive to current applications. For utf16, (arguably) a missing BOM should merely assume machian endianess. For utf_16_le, utf_16_be input, both should accept & discard a BOM. On output, I'm not sure; maybe all should write a BOM unless passed a flag signifying no bom? Or to preserve backward compat, could have a parm write_bom defaulting to True for utf16 and False for utf_16_le and utf_16_be. This is a modification of the originial request (for a force_bom flag). Unless I have confused myself with my test cases, the current codecs are slightly inconsistent for the utf8 codecs: utf8 treats "sig" as real data, if present, but.. utf_8_sig works right even without the "sig" (so this one I like as is!) The 16'ers seem to match the (inferred) specs, but for completeness here: utf_16 refuses to proceed w/o BOM (even with correct endian input data) utf_16_le treats BOM as data utf_16_be treats BOM as data Regards, ..jim __________________________________ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1328> __________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com