[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-06-01 Thread Cheryl Sabella


Change by Cheryl Sabella :


--
nosy: +benjamin.peterson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-06-01 Thread Jeffrey Kintscher


Jeffrey Kintscher  added the comment:

Further testing revealed the problem and fix also apply to text mode. I 
submitted a pull request to fix this issue with tests for both binary and text 
mode.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-31 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher :


--
keywords: +patch
pull_requests: +13604
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/13717

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-31 Thread PEW's Corner


PEW's Corner  added the comment:

Your proposal sounds good to me, Jeffrey.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-30 Thread Jeffrey Kintscher


Jeffrey Kintscher  added the comment:

I ran another C test where I called lseek(fd, 0, SEEK_CUR) in-between each of 
the stream I/O functions to see how it reports the buffered and kernel file 
positions:

opening file with wb
writing 3 bytes to file
lseek: 0
ftell(f): expecting 3, got 3
lseek: 3
closing file
opening file with a+b
lseek: 3
ftell(f): expecting 3, got 3
lseek: 3
writing 3 bytes to file
lseek: 3
ftell(f): expecting 6, got 6
lseek: 6
fseek(f, 0, SEEK_SET): expecting 0, got 0
lseek: 0
ftell(f): expecting 0, got 0
lseek: 0
writing 3 bytes to file
lseek: 0
ftell(f): expecting 9, got 9
lseek: 9
writing 3 bytes to file
lseek: 9
ftell(f): expecting 12, got 12
lseek: 12
fseek(f, 0, SEEK_CUR): expecting 0, got 0
lseek: 12
ftell(f): expecting 12, got 12
lseek: 12
fseek(f, 0, SEEK_SET): expecting 0, got 0
lseek: 0
fread(buf, 1, 256, f): expecting 12, got 12
lseek: 12
expecting 'abcdefghijkl', got 'abcdefghijkl'
closing file
removing file

The C library fseek() and ftell() functions update the kernel file position, 
but fwrite() doesn't (because it hasn't overflowed its write buffer). The 
kernel doesn't allow seeking past the end of the file, so it looks like fseek() 
and ftell() are flushing the C library's write buffer because the subsequent 
lseek() calls return the same value.

My observation of Python 3 and C library behavior with read/write append mode:

Bufferred write:
C - write to memory buffer, no kernel position updates unless the buffer is 
flushed to the kernel by buffer overflow
Python - same as C

Buffered seek:
C - write buffer is flushed to the kernel to update the kernel position, then a 
kernel seek operation is performed and the new location reported by the kernel 
is returned
Python - same as C

Buffered tell:
C - identical behavior as buffered seek; semantically a wrapper for fseek(f, 0, 
SEEK_CUR)
Python - the position is calculated using the last queried kernel position and 
the dirty buffer length, the kernel position is queried only if a kernel 
position hasn't been previously queried since the file was opened


Having the buffered I/O library keep track of the file position is attractive 
in that it saves time by skipping a few kernel calls, but, as demonstrated, it 
can easily get out of sync. I'm sure I can come up with a way to fix the 
accounting to work properly without querying the kernel, but I don't think that 
is a good solution.

I see dangers with trying to track the file position in the library independent 
of the kernel. For one, when the file is opened with the append flag, the 
kernel changes the position to the end of the file with every write. Yes, that 
can be tracked with careful code in the library, except when the file is opened 
two or more times and written to using different file descriptors. This is a 
common scenario for log files, with appends being performed by multiple 
processes. The only mitigation is to use file locking to guarantee exclusive 
file access, but the semantics vary from system to system and it is not 
universally supported.

My preferred solution is to follow the C library semantics and turn buffered 
tell() into a wrapper for buffered seek(0, io.SEEK_CUR). Please, let me know if 
there are some semantics I haven't considered.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-30 Thread PEW's Corner


PEW's Corner  added the comment:

Great analysis, Jeffrey. Does the kernel position actually move to the end of 
the file on the f.write() in step 10, or on the flush in step 14? If it's the 
latter, then f.write() should probably call lseek() to set both the kernel 
position and internal position to the end of the file before the write. This 
would have the added benefit of making the Python append behavior independent 
of the kernel behavior (as some kernels apparently handle appending 
differently).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-30 Thread Jeffrey Kintscher


Jeffrey Kintscher  added the comment:

I did some tracing through bpo-36411.py using pdb and lldb.

The problem is that the file position variables stored in the buffered file 
object get out of sync with the kernel in the write operation after the seek(0) 
call. It thinks it is writing to the buffer as if it were relative to the last 
seek position. This is reflected in the incorrect values returned by tell (3 
instead of 9, 6 instead of 12). The second seek operation flushes the buffer to 
disk and synchronizes the position variables with the kernel. The data is 
always written correctly to disk because the kernel maintains its own file 
position and handles the append operations internally.

Now that I have a handle on what is happening, I will dig into the buffered 
writes to find the root cause and hopefully a solution.

A workaround is to call f.seek(0, io.SEEK_CUR) before f.tell() to force the 
file object to synchronize its position variables with the kernel. There is a 
performance hit as f.seek() will flush dirty buffers to disk each time it is 
called, though it probably won't be noticed on a file system buffered by the 
kernel.

Here is what I found line by line (only relevant details are included):

1. The first f.open() creates the file and initializes the internal position 
variables to reflect the beginning of the file.
2. The first f.write() stores three bytes in the buffer and updates the 
internal position variables accordingly.
3. f.tell() correctly returns 3.
4. f.close() flushes the buffer to disk and closes the file.
5. When f.open() is called a second time with append mode, it calls lseek() to 
get the file position from the kernel and sets the internal position variables 
accordingly.
6. f.tell() correctly returns 3 to reflect the 3 bytes already in the file.
7. f.write() stores three bytes into the beginning of the buffer and updates 
the internal position variables accordingly.
8. f.tell() returns 6 as the correct position.
9. f.seek(0) flushes the dirty buffer to disk, calls lseek() so that the kernel 
updates the file position, and resets the internal position variables to 
reflect the beginning of the file.
10. f.write() stores three bytes into the beginning of the buffer, but the 
position variables are updated as if the write is relative to the beginning of 
the file. The position variables are now out of sync with the kernel.
11. f.tell() reflects this discrepancy by returning 3 instead of 9.
12. f.write() stores three bytes into the buffer starting at offset 3 and 
updates the position variables accordingly.
13. f.tell() returns 6 instead of 9.
14. f.seek(0, io.SEEK_CUR) flushes the 6 bytes in the dirty buffer to disk. The 
flush operation correctly appends the 6 bytes to the disk file regardless of 
the position variables because the kernel handles the append operation 
internally based upon its own file position variable. It then calls lseek() to 
determine the new position and updates the position variables accordingly. 
15. f.tell() returns the correct position.
16. The final f.seek(0) calls lseek() to move the file position back to the 
beginning of the file and updates the position variables, but doesn't flush 
anything to disk because the buffer is clean.
17. f.read() reads the entire file from the beginning.
18. f.close() closes the file. No flush operation is performed because the 
buffer is clean.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-28 Thread Jeffrey Kintscher


Jeffrey Kintscher  added the comment:

I saw the bug in the C output after I hit submit. Here is the correct output:

opening file with wb
writing 3 bytes to file
ftell(f): expecting 3, got 3
closing file
opening file with a+b
ftell(f): expecting 3, got 3
writing 3 bytes to file
ftell(f): expecting 6, got 6
fseek(f, 0, SEEK_SET): expecting 0, got 0
ftell(f): expecting 0, got 0
writing 3 bytes to file
ftell(f): expecting 9, got 9
writing 3 bytes to file
ftell(f): expecting 12, got 12
fseek(f, 0, SEEK_CUR): expecting 0, got 0
ftell(f): expecting 12, got 12
fseek(f, 0, SEEK_SET): expecting 0, got 0
fread(buf, 1, 256, f): expecting 12, got 12
expecting 'abcdefghijkl', got 'abcdefghijkl'
closing file
removing file

--
Added file: https://bugs.python.org/file48370/bpo-36411.c

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-28 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher :


Removed file: https://bugs.python.org/file48369/bpo-36411.c

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-28 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher :


Added file: https://bugs.python.org/file48369/bpo-36411.c

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-28 Thread Jeffrey Kintscher


Jeffrey Kintscher  added the comment:

I adapted the test code on StackOverflow to show the expected and actual values.

I get this output using Python 2.7.16:


opening myfile with wb
writing 3 bytes to file
('f.tell(): expecting 3, got', 3)
closing myfile
opening myfile with a+b
('f.tell(): expecting 3, got', 3)
writing 3 bytes to file
('f.tell(): expecting 6, got', 6)
('f.seek(0): expecting 0, got', None)
('f.tell(): expecint 0, got', 0)
writing 3 bytes to file
('f.tell(): expecting 9, got', 9)
writing 3 bytes to file
('f.tell(): expecting 12, got', 12)
('f.seek(0, io.SEEK_CUR): expecting 12, got', None)
('f.tell(): expecting 12, got', 12)
('f.seek(0): expecting 0, got', None)
("f.read(): expecting b'abcdefghijkl', got", 'abcdefghijkl')
closing file
removing file


As described, I get different results using Python 3.7.3 as well as the master 
branch (Python 3.8.0a4+):


closing myfile
opening myfile with a+b
f.tell(): expecting 3, got 3
writing 3 bytes to file
f.tell(): expecting 6, got 6
f.seek(0): expecting 0, got 0
f.tell(): expecint 0, got 0
writing 3 bytes to file
f.tell(): expecting 9, got 3
writing 3 bytes to file
f.tell(): expecting 12, got 6
f.seek(0, io.SEEK_CUR): expecting 12, got 12
f.tell(): expecting 12, got 12
f.seek(0): expecting 0, got 0
f.read(): expecting b'abcdefghijkl', got b'abcdefghijkl'
closing file
removing file


I wrote an equivalent C program using the stream I/O functions and get the same 
results as Python 2.7.16:


opening file with wb
writing 3 bytes to file
ftell(f): expecting 3, got 3
closing file
opening file with a+b
ftell(f): expecting 3, got 3
writing 3 bytes to filftell(f): expecting 6, got 6
fseek(f, 0, SEEK_SET): expecting 0, got 0
ftell(f): expecting 0, got 0
writing 3 bytes to file
ftell(f): expecting 9, got 9
writing 3 bytes to file
ftell(f): expecting 12, got 12
fseek(f, 0, SEEK_CUR): expecting 0, got 0
ftell(f): expecting 12, got 12
fseek(f, 0, SEEK_SET): expecting 0, got 0
fread(buf, 1, 256, f): expecting 12, got 12
expecting 'abcdefghijkl', got 'abcdefghijkl'
closing file
removing file


I consider this behavior to be a bug because Python 2.7 and C agree, but Python 
3.7 behaves differently. Requiring that developers know and work around 
non-standard quirky behavior leads to extra work and error-prone code.

--
nosy: +Jeffrey.Kintscher
Added file: https://bugs.python.org/file48368/bpo-36411.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-24 Thread PEW's Corner


PEW's Corner  added the comment:

But buffers are used - and update tell() correctly - in all other cases than 
binary append+read mode (and even this case works in Python 2). Also, the 
implementation clearly tries to keep tell() updated irrespective of the buffer, 
so isn't it just a matter of reminding it to move the pointer to the end of the 
file when doing an append?

If that isn't feasible, then of course it would be good to document the problem 
in the description of tell().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-23 Thread Windson Yang


Windson Yang  added the comment:

I think we should mention it at the document, like in the tell() function.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-05-23 Thread Windson Yang


Windson Yang  added the comment:

I'm not sure it's a bug. When you write binary data to file (use BufferedIOBase 
by default). It actually writes the data to a buffer. That is why tell() gets 
out of sync. You can follow the instrument belolw. For instance, call flush() 
after writing to get the `correct answer.`

> When writing to this object, data is normally placed into an internal buffer. 
> The buffer will be written out to the underlying RawIOBase object under 
> various conditions, including:

> when the buffer gets too small for all pending data;
> when flush() is called;
> when a seek() is requested (for BufferedRandom objects);
> when the BufferedWriter object is closed or destroyed.

1. https://docs.python.org/3/library/io.html#io.BufferedWriter

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-03-29 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

'Python 3.9' does not exist yet.  This choice is for patches that should only 
be applied to 3.9 when available (like future deprecations or removals).

--
nosy: +terry.reedy
versions:  -Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-03-24 Thread Windson Yang


Windson Yang  added the comment:

This looks interesting. Let me try to fix it.

--
nosy: +Windson Yang
versions: +Python 3.8, Python 3.9 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-03-24 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +martin.panter, serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36411] Python 3 f.tell() gets out of sync with file pointer in binary append+read mode

2019-03-23 Thread PEW's Corner


New submission from PEW's Corner :

When a file is opened in binary append+read mode, i.e. open('file', 'a+b'), and 
a write (i.e. append) operation is performed while the file pointer is not at 
the end of the file (e.g. after a seek(0)), tell() will subsequently return the 
wrong value in Python 3 (but not in Python 2).

See this SO question:
https://stackoverflow.com/questions/51010322/python-3-tell-gets-out-of-sync-with-file-pointer-in-appendread-mode

--
components: IO
messages: 338709
nosy: pewscorner
priority: normal
severity: normal
status: open
title: Python 3 f.tell() gets out of sync with file pointer in binary 
append+read mode
type: behavior
versions: Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com