Guido van Rossum <guido <at> python.org> writes:
> 
> On Thu, Jan 7, 2010 at 10:12 PM, Tres Seaver <tseaver <at> palladion.com> wrote:
> > The BOM should not be seekable if the file is opened with the proposed
> > "guess encoding from BOM" mode:  it isn't properly part of the stream at
> > all in that case.
> 
> This feels about right to me. There are still questions though:
> immediately after opening a file with a BOM, what should .tell()
> return?

tell() in the context of text I/O is specified to return an "opaque cookie". So
whatever value it returns would probably be fine, as long as seeking to that
value leaves the file in an acceptable state.
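
As a rough illustration of the "opaque cookie" contract, here is a minimal
sketch (the variable names are made up for this example). It only ever passes
back to seek() values that tell() returned, and it makes no assumption about
what those values look like:

```python
import io

# UTF-16 data with a little-endian BOM, followed by 'a' and 'b'.
data = b'\xff\xfea\x00b\x00'
f = io.TextIOWrapper(io.BytesIO(data), encoding='utf-16')

start = f.tell()        # cookie taken right after opening; treated as opaque
first = f.read(1)       # first character; the BOM itself is not returned
middle = f.tell()       # another opaque cookie, taken mid-stream
rest = f.read()

f.seek(middle)          # seek only to a value previously returned by tell()
rest_again = f.read()   # should match `rest`

f.seek(start)           # back to the cookie taken right after opening
everything = f.read()   # should be the whole decoded text, without the BOM
```

The point is that the code never inspects or does arithmetic on the cookies,
so any return value would do as long as the round trip works.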

Rewinding (seeking to 0) in the presence of a BOM is already reasonably well
supported by the TextIOWrapper object, even though a bare incremental decoder
strips the BOM only on its first call:

>>> import codecs, io
>>> dec = codecs.getincrementaldecoder('utf-16')()
>>> dec.decode(b'\xff\xfea\x00b\x00')
'ab'
>>> dec.decode(b'\xff\xfea\x00b\x00')
'\ufeffab'
>>> 
>>> bio = io.BytesIO(b'\xff\xfea\x00b\x00')
>>> f = io.TextIOWrapper(bio, encoding='utf-16')
>>> f.read()
'ab'
>>> f.seek(0)
0
>>> f.read()
'ab'
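
The difference between the two halves of that session can be sketched with
the decoder alone. Resetting the incremental decoder makes it detect the BOM
again. My understanding is that TextIOWrapper does the equivalent of this
when seeking to 0, but that is an assumption about the implementation rather
than something the session above demonstrates:

```python
import codecs

data = b'\xff\xfea\x00b\x00'
dec = codecs.getincrementaldecoder('utf-16')()

first = dec.decode(data)    # BOM is detected and stripped on the first call
second = dec.decode(data)   # byte order is now fixed, so the BOM decodes as U+FEFF
dec.reset()                 # forget the detected byte order
third = dec.decode(data)    # BOM is detected and stripped again
```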

There are tests for this in test_io.py (test_encoded_writes, line 1929, and
test_append_bom and test_seek_bom, line 2045).

Regards

Antoine.

