New submission from Lorenz Hüdepohl <b...@stellardeath.org>:

Since version 3.8, the shutil.copyfile() function tries to make use of 
os.sendfile(). Currently, this is done in a loop that aborts as soon as 
sendfile() returns 0, as it is assumed that this signifies the end of the file.

The problem: the return value of sendfile() is simply the amount of bytes that 
could be successfully copied in that one syscall, with no guarantee that a 
return value of 0 only happens when there is truly no data left to copy. Some 
other urgent task could have interrupted the kernel before any data was copied. 
At least, I could not find documentation to the contrary.

(Note: This might or might not be actual behavior of current Linux kernels 
today, but the spec of sendfile() certainly allows this)


In any case, in that same routine the size of the source file is anyway 
requested in an os.fstat() call, one could thus restructure the loop like this, 
for example:


filesize = os.fstat(infd).st_size
offset = 0
while offset < filesize:
   sent = os.sendfile(outfd, infd, offset, blocksize)
   offset += sent

(Error handling etc. left out for clarity, just to point out the new structure)

This would also optimize the case of an empty input file, in that case the loop 
is never entered and no os.sendfile() call is issued, at all.

In the normal case, it would also save the unnecessary last os.sendfile() call, 
when 'offset' has already grown to 'filesize'. (This was the actual reason for 
me to look into this in the first place, a filesystem bug where sendfile() 
called with an offset set to the file's size returns "EAGAIN" in certain cases. 
But this is another topic entirely and has nothing to do with Python, of 
course.)


Note that in Modules/posixmodule.c os_sendfile_impl() there is also another 
loop around individual actual sendfile() system call, but a return value of 0 
there would also exit that loop and be passed up:

    do {
        Py_BEGIN_ALLOW_THREADS
#ifdef __APPLE__
        ret = sendfile(in_fd, out_fd, offset, &sbytes, &sf, flags);
#else
        ret = sendfile(in_fd, out_fd, offset, count, &sf, &sbytes, flags);
#endif
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));


Kind regards,
  Lorenz

----------
components: Library (Lib)
messages: 388027
nosy: lhuedepohl
priority: normal
severity: normal
status: open
title: shutil._fastcopy_sendfile() makes wrong (?) assumption about sendfile() 
return value
versions: Python 3.10, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43388>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to