[issue26158] File truncate() not defaulting to current position as documented

2021-02-26 Thread Eryk Sun


Change by Eryk Sun :


--
type:  -> behavior
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 2.7, Python 3.5, Python 
3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread random832

random832 added the comment:

In the analogous C operations, ftell (analogous to .tell) actually causes the 
underlying file descriptor's position (analogous to the raw stream's position) 
to be reset to be at the same value that ftell has returned. Which means, yes, 
that you lose the benefits of buffering if you're so foolish as to call ftell 
after every read. But in this case the sequence "read / tell / truncate" would 
be analogous to "fread(f) / ftell(f) / ftruncate(fileno(f))".

Though, the fact that fread operates on the FILE * whereas truncate operates on 
a file descriptor serves as a red flag to C programmers... arguably since this 
is not the case with Python, truncate on a buffered stream should implicitly 
include this same "reset underlying position" operation before actually 
performing the truncate.

--
nosy: +random832




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Martin Panter

Martin Panter added the comment:

Fornax: Yes, I was suggesting the idea of deprecating truncate() for text 
files! Although a blanket deprecation of all cases may not be realistic. 
Quickly reading the Stack Overflow pages, it seems like there is demand for 
this to work in some cases. Deprecating it in the more awkward situations, such 
as after reading, and with specific kinds of codecs, might be an option 
though.

Now I think Issue 12215 (read then write) is more closely related to the 
read-then-truncate problem. For the write-then-read bug, it might be a separate 
problem with an easy fix: call flush() before changing to reader mode.

Eryk: If there is no decoder state, and the file data hasn’t changed, maybe it 
is solvable. But I realize now it won’t work in general. We would have to 
construct the encoder state from the decoder state. The same problem as Issue 
12215.

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Eryk Sun

Eryk Sun added the comment:

FYI, you can parse the cookie using struct or ctypes. For example:

import ctypes

class Cookie(ctypes.Structure):
    _fields_ = (('start_pos', ctypes.c_longlong),
                ('dec_flags', ctypes.c_int),
                ('bytes_to_feed', ctypes.c_int),
                ('chars_to_skip', ctypes.c_int),
                ('need_eof', ctypes.c_byte))

In the simple case only the buffer start_pos is non-zero, and the result of 
tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example it needs to 
also convey the bytes_to_feed and chars_to_skip values:

>>> f.tell()
680564735109527527154978616360239628288
>>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
>>> state = Cookie.from_buffer_copy(cookie_bytes)
>>> state.start_pos
0
>>> state.dec_flags
0
>>> state.bytes_to_feed
16
>>> state.chars_to_skip
2
>>> state.need_eof
0

So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and 
decode 16 bytes, and skip 2 characters. 
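
The same fields can also be pulled out with struct instead of ctypes. The sketch below is only an illustration: it assumes a little-endian build in which the cookie is packed without padding as a 64-bit start_pos, three 32-bit ints, and one byte, which is what the value above decodes to. The names COOKIE_FORMAT and unpack_cookie are made up for this example and are not part of the io module.

```python
import struct

# Assumed layout (not a documented API): 64-bit start_pos, three 32-bit ints,
# and one byte, packed little-endian without padding (21 bytes in total).
COOKIE_FORMAT = "<qiiib"
FIELD_NAMES = ("start_pos", "dec_flags", "bytes_to_feed",
               "chars_to_skip", "need_eof")


def unpack_cookie(cookie):
    """Split a value returned by a text file's tell() into its five fields."""
    raw = cookie.to_bytes(struct.calcsize(COOKIE_FORMAT), "little")
    return dict(zip(FIELD_NAMES, struct.unpack(COOKIE_FORMAT, raw)))


# The value reported by f.tell() in the UTF-7 example from this thread.
fields = unpack_cookie(680564735109527527154978616360239628288)
print(fields)
```

A cookie that is just a byte offset, such as 6, unpacks to start_pos=6 with every other field zero.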

Isn't this solvable at least for the case of truncating, Martin? It could do a 
tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the 
chars_to_skip, seek back to the start_pos, write the encoded characters, and 
then truncate.

>>> f = open('temp.txt', 'w+', encoding='utf-7')
>>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
5
>>> _ = f.seek(0); f.read(2)
'аб'
>>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
>>> state = Cookie.from_buffer_copy(cookie_bytes)
>>> f.buffer.seek(state.start_pos)
0
>>> buf = f.buffer.read(state.bytes_to_feed)
>>> s = buf.decode(f.encoding)[:state.chars_to_skip]
>>> f.buffer.seek(state.start_pos)
0
>>> f.buffer.write(s.encode(f.encoding))
8
>>> f.buffer.truncate()
8
>>> f.close()
>>> open('temp.txt', encoding='utf-7').read()
'аб'

Rewriting the encoded bytes is necessary to properly terminate the UTF-7 
sequence, which makes me doubt whether this simple approach will work for all 
codecs. But something like this is possible, no?

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Марк Коренберг

Марк Коренберг added the comment:

text files and seek() offset: issue25849

--
nosy: +mmarkk




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy:  -haypo




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

Fornax added the comment:

I don't have a specific use case. This spawned from a tangentially-related 
StackOverflow question (http://stackoverflow.com/questions/34858088), where in 
the answers a behavior difference between Python 2 and 3 was noted. I couldn't 
find any documentation to explain it, so I opened a follow-up question 
(http://stackoverflow.com/questions/34879318), and based on some feedback I got 
there, I opened up this issue.

Just to be sure I understand, you're suggesting deprecating truncate on 
text-mode file objects? And the interleaved read-writes are likely related to 
an issue that will be dealt with elsewhere?

--
type: behavior -> 




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Martin Panter

Martin Panter added the comment:

In theory, TextIOWrapper could rewrite the last bit of the file (or the whole 
file) to have the requested number of characters. But I wonder if it is worth 
it; maybe deprecation is better. Do you have a use case for any of these bugs, 
or are you just playing around to see what the methods do?

In Issue 12922, seek() and tell() were (re-)defined for TextIOBase, but the 
situation with truncate() was apparently not considered.

Perhaps the write()–read() bug is related to Issue 12215.

--
nosy: +martin.panter




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Maybe. Looking at the code, both the Python and C implementations of 
TextIOWrapper look incorrect.

Python implementation:

def truncate(self, pos=None):
    self.flush()
    if pos is None:
        pos = self.tell()
    return self.buffer.truncate(pos)

If pos is not specified, self.tell() is used as the truncation position for the 
underlying binary file. But self.tell() is not an offset in bytes, as seen from 
the UTF-7 example. It is a complex cookie that includes the starting position 
in the binary file, the number of bytes that should be read and fed to the 
decoder, and other decoder flags. At the very least, truncate() needs to unpack 
the cookie returned by self.tell() and raise an exception if it doesn't 
unambiguously point to a binary file position.
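
A check of that kind might look roughly like the sketch below. It assumes that start_pos occupies the low 64 bits of the cookie, as in the Cookie structure quoted elsewhere in this thread, so any higher bit means the cookie carries decoder state. The helper name byte_offset_from_cookie is hypothetical and is not taken from either implementation.

```python
def byte_offset_from_cookie(cookie):
    """Return the cookie as a byte offset, or raise if it carries decoder state.

    Assumption: the low 64 bits hold start_pos and everything above them holds
    dec_flags, bytes_to_feed, chars_to_skip and need_eof.
    """
    if cookie < 0 or cookie >> 64:
        raise ValueError("tell() cookie does not map to a single byte offset")
    return cookie


# A plain offset passes through unchanged.
print(byte_offset_from_cookie(6))

# The UTF-7 cookie from this thread is rejected.
try:
    byte_offset_from_cookie(680564735109527527154978616360239628288)
except ValueError as exc:
    print("rejected:", exc)
```

With such a helper, truncate() could call self.buffer.truncate() only when the cookie is a plain offset and refuse otherwise.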

C implementation is equivalent to:

def truncate(self, pos=None):
    self.flush()
    return self.buffer.truncate(pos)

It just ignores the decoder buffer.

--
stage:  -> needs patch
type:  -> behavior




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

Fornax added the comment:

Another surprising result:

>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.write('test')
4
>>> f.close()
>>> open('temp.txt').read()
'testE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n'

>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.write('test')
4
>>> f.read(1)
'A'
>>> f.close()
>>> open('temp.txt').read()
'ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\ntest'

The position of the write in the file depends on whether or not there is a 
subsequent read.

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

Fornax added the comment:

After taking a little time to let this sink in, I'm going to play Devil's 
Advocate just a little more.

It sounds like you're basically saying that any read-write text-based modes 
(e.g. r+, w+) should be used at your own peril. While I understand your UTF-7 
counterexample, and it's a fair point, is it out of line to expect that for 
encodings that operate on full bytes, file positioning should work a bit more 
intuitively? (Which is to say, a write/truncate after a read should take place 
in the position immediately following the end of the read.)

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

Fornax added the comment:

Heh... building on Serhiy's example:

>>> f.tell()
680564735109527527154978616360239628288

I'm way out of my depth here. The results seem surprising, but I lack the 
experience to be able to say what they "should" look like. So I guess if it's 
working as intended and just needs clarification in the documentation, so be it.

Thanks for pointing out what's actually causing the behavior.

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This is not always possible. Consider the following example:

>>> open('temp.txt', 'wb').write(b'+BDAEMQQyBDMENA-')
16
>>> f = open('temp.txt', 'r+', encoding='utf-7')
>>> f.read(2)
'аб'

What should be the result of truncating?

I think it would be better to not implement truncate() for text files at all.

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Eryk Sun

Eryk Sun added the comment:

Serhiy, why doesn't truncate do a seek(0, SEEK_CUR) to synchronize the buffer's 
file pointer before calling its truncate method? This also affects writing in 
"+" modes when the two file pointers are out of sync.

--
nosy: +eryksun

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

Fornax added the comment:

To clarify... the intended behavior is for truncate to default to the current 
position of the buffer, rather than the current position as reported directly 
from the stream by tell?

That seems... surprising.

--




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This is because the file is buffered.

>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.readline()
'ABCDE\n'
>>> f.tell()
6
>>> f.buffer.tell()
32
>>> f.buffer.raw.tell()
32

The documentation needs a clarification.

--
assignee:  -> docs@python
components: +Documentation
nosy: +benjamin.peterson, docs@python, haypo, pitrou, serhiy.storchaka, 
steve.dower, stutzbach
versions: +Python 2.7, Python 3.6 -Python 3.4




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Stephen Paul Chappell

Changes by Stephen Paul Chappell :


--
nosy: +Zero




[issue26158] File truncate() not defaulting to current position as documented

2016-01-19 Thread Fornax

New submission from Fornax:

io.IOBase.truncate() documentation says:
"Resize the stream to the given size in bytes (or the current position if size 
is not specified). The current stream position isn’t changed. This resizing can 
extend or reduce the current file size. In case of extension, the contents of 
the new file area depend on the platform (on most systems, additional bytes are 
zero-filled). The new file size is returned."

However:
>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.readline()
'ABCDE\n'
>>> f.tell()
6   # As expected, current position is 6 after the readline
>>> f.truncate()
32  # ?!

I verified that the file is not truncated to 6 bytes, as the documentation says 
it should be. Adding an explicit f.seek(6) before the truncate makes it work 
properly (truncate to 6). It also works as expected using a StringIO rather than 
a file, or in Python 2 (used 2.7.9).
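
The seek-before-truncate workaround can be written as a small self-contained sketch. It assumes a stateless single-byte encoding (ASCII here), where tell() is a plain byte offset. The helper name truncate_after_first_line and the temporary file are illustrative only.

```python
import os
import tempfile


def truncate_after_first_line(path):
    """Truncate a text file just after its first line; return the new size."""
    with open(path, "r+", encoding="ascii") as f:
        f.readline()
        # Seeking to the reported text position moves the buffered layer
        # there as well, so the truncate that follows cuts at that point.
        f.seek(f.tell())
        f.truncate()
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "temp.txt")
    # Write in binary mode so the byte count is the same on every platform.
    with open(demo_path, "wb") as out:
        out.write(b"ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n")
    new_size = truncate_after_first_line(demo_path)
    with open(demo_path, "rb") as check:
        remaining = check.read()

print(new_size, remaining)
```

This should not be relied on for stateful codecs such as UTF-7, where tell() returns a cookie rather than a byte offset.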

Tested in 3.4.3/Windows, 3.4.1/Linux, 3.5.1/Linux.

--
components: IO
messages: 258600
nosy: fornax
priority: normal
severity: normal
status: open
title: File truncate() not defaulting to current position as documented
versions: Python 3.4, Python 3.5
