New submission from STINNER Victor <vstin...@redhat.com>:

bpo-26826 added a new os.copy_file_range() function:
https://docs.python.org/dev/library/os.html#os.copy_file_range

Like os.sendfile(), this new Linux syscall avoids memory copies between kernel
space and user space. This matters for performance, especially since the
Meltdown vulnerability required Windows, Linux, FreeBSD, etc. to use a separate
address space for the kernel (like Linux kernel page-table isolation, KPTI).

shutil has been modified in Python 3.8 to use os.sendfile() on Linux:
https://docs.python.org/dev/whatsnew/3.8.html#optimizations

But according to Pablo Galindo Salgado, copy_file_range() goes further:
"But copy_file_range can leverage more filesystem features like deduplication
and copy offload stuff."

https://bugs.python.org/issue26826#msg344582
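
For illustration, here is a minimal sketch of an in-kernel copy loop built on
os.copy_file_range(). The helper name copy_in_kernel() and its chunk_size
parameter are hypothetical and not part of shutil. It assumes both files are
opened fresh, so the kernel file offsets start at 0. If the call is missing
or fails, it falls back to a plain read/write loop.

```python
import os


def copy_in_kernel(src_path, dst_path, chunk_size=8 * 1024 * 1024):
    """Copy src_path to dst_path and return the name of the method used.

    Hypothetical helper for illustration only, not an existing shutil API.
    """
    method = "copy_file_range"
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        src_fd, dst_fd = fsrc.fileno(), fdst.fileno()
        try:
            if not hasattr(os, "copy_file_range"):
                # Python < 3.8 or a platform without the syscall.
                raise OSError("os.copy_file_range() is not available")
            # With no explicit offsets, the kernel advances both file
            # offsets itself; a return value of 0 means end of file.
            while os.copy_file_range(src_fd, dst_fd, chunk_size) > 0:
                pass
        except OSError:
            # Fall back to a userspace copy, resuming from the current
            # file offsets so that a partial in-kernel copy is not redone.
            method = "read/write"
            while True:
                data = os.read(src_fd, chunk_size)
                if not data:
                    break
                offset = 0
                while offset < len(data):
                    offset += os.write(dst_fd, data[offset:])
    return method
```

Whether the kernel actually deduplicates or offloads the copy depends on the
filesystem; the caller only sees the number of bytes copied.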

Giampaolo Rodola' added:

"I think data deduplication / CoW / reflink copy is better implemented via 
FICLONE. "cp --reflink" uses it, I presume because it's older than 
copy_file_range(). I have a working patch adding CoW copy support for Linux and 
OSX (but not Windows). I think that should be exposed as a separate 
shutil.reflink() though, and copyfile() should just do a standard copy."

"Actually "man copy_file_range" claims it can do server-side copy, meaning no 
network traffic between client and server if *src* and *dst* live on the same 
network fs. So I agree copy_file_range() should be preferred over sendfile() 
after all. =)
I have a wrapper for copy_file_range() similar to what I did in shutil in 
issue33671 which I can easily integrate, but I wanted to land this one first:
https://bugs.python.org/issue37096
Also, I suppose we cannot land this in time for 3.8?"

https://bugs.python.org/issue26826#msg344586

--

There was already a discussion about switching shutil to copy-on-write:
https://bugs.python.org/issue33671#msg317989

One problem is that modifying the "copied" file can suddenly become slower if 
it was copied using "cp --reflink".

It seems that a new reflink=False parameter on the file copy functions is
needed to control clone/CoW copies and prevent bad surprises.
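
A rough sketch of what such an opt-in parameter could look like. Everything
here is an assumption for illustration: copyfile_sketch() is a hypothetical
name, the FICLONE value is the x86/ARM Linux encoding of the ioctl from
<linux/fs.h> (other architectures may differ), and the silent fallback to a
regular copy is only one possible policy.

```python
import shutil
import sys

try:
    import fcntl  # Unix only
except ImportError:
    fcntl = None

# Assumption: _IOW(0x94, 9, int) as encoded on x86 and ARM Linux.
FICLONE = 0x40049409


def copyfile_sketch(src, dst, reflink=False):
    """Copy src to dst; try a CoW clone only when reflink=True.

    Hypothetical helper, not the shutil API. Returns "reflink" or "copy".
    """
    if reflink and fcntl is not None and sys.platform.startswith("linux"):
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # ioctl(dest_fd, FICLONE, src_fd) makes dst share src's blocks.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return "reflink"
            except OSError:
                # Filesystem without reflink support, cross-device copy, etc.
                pass
    # Default behaviour: a standard copy with independent data blocks.
    shutil.copyfile(src, dst)
    return "copy"
```

With reflink=False as the default, existing callers keep getting an
independent copy, and only callers that ask for a clone accept the cost of
slower first writes to the shared blocks.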

----------
components: Library (Lib)
messages: 344648
nosy: giampaolo.rodola, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: shutil: add reflink=False to file copy functions to control clone/CoW 
copies (use copy_file_range)
type: performance
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37157>
_______________________________________