Martin Panter <vadmium...@gmail.com> added the comment:

I suspect this is caused by TextIOWrapper guessing whether it is writing at the 
start of a file or in the middle, and being confused by “seekable” returning 
False. GzipFile implements some “seek” calls in write mode, but LZMAFile and 
BZ2File do not.
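
As a rough check of that difference, the sketch below asks each class whether 
it reports itself as seekable when opened for writing. The helper name 
“seekable_in_write_mode” is made up for illustration, and in-memory BytesIO 
objects stand in for real files:

```python
import bz2
import gzip
import io
import lzma


def seekable_in_write_mode():
    """Return what seekable() reports for each class opened in 'wb' mode."""
    results = {}
    # Each compressed file wraps a throwaway in-memory buffer.
    with gzip.GzipFile(fileobj=io.BytesIO(), mode='wb') as f:
        results['GzipFile'] = f.seekable()
    with bz2.BZ2File(io.BytesIO(), 'wb') as f:
        results['BZ2File'] = f.seekable()
    with lzma.LZMAFile(io.BytesIO(), 'wb') as f:
        results['LZMAFile'] = f.seekable()
    return results


if __name__ == '__main__':
    for name, value in seekable_in_write_mode().items():
        print(name, 'seekable ->', value)
```

If only GzipFile reports True, that would be consistent with only the bz2 and 
lzma text modes being affected.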

Using this test class:

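import io
import _pyio
from io import BufferedIOBase

class Writer(BufferedIOBase):
    # offset is the value reported by tell(); None means "not seekable"
    def __init__(self, offset):
        self.offset = offset
    def writable(self):
        return True
    def seekable(self):
        result = self.offset is not None
        print('seekable ->', result)
        return result
    def tell(self):
        print('tell ->', self.offset)
        return self.offset
    def write(self, data):
        # Only log the bytes handed over by TextIOWrapper
        print('write', repr(data))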

a BOM is inserted when “tell” returns zero:

>>> t = io.TextIOWrapper(Writer(0), 'utf-16')
seekable -> True
tell -> 0
>>> t.write('HI'); t.flush()  # Writes BOM
2
write b'\xff\xfeH\x00I\x00'

and not when “tell” returns a positive number:

>>> t = io.TextIOWrapper(Writer(1), 'utf-16')
seekable -> True
tell -> 1
>>> t.write('HI'); t.flush()  # Omits BOM
2
write b'H\x00I\x00'

However, the “io” and “_pyio” behaviours differ when “seekable” returns False:

>>> t = io.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush()  # io omits BOM
2
write b'H\x00I\x00'
>>> t = _pyio.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush()  # _pyio writes BOM
write b'\xff\xfeH\x00I\x00'
2

IMO the “_pyio” behaviour is more sensible: write a BOM because that’s what the 
UTF-16 codec produces.

----------
nosy: +martin.panter

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36304>
_______________________________________