Greetings,

Many years ago, I developed a set of patches that add a number of features to
cp and md5sum, including multi-threading, partial copies, direct I/O,
asynchronous reads/writes, checksum during copy, multi-host ssh-/MPI-based
copies, Lustre support, preallocation, files over stdin, and stats output.
These offer significant performance benefits along with greater flexibility
for other uses (in particular, the partial copy and files over stdin
features).  You can see details here:

    https://pkolano.github.io/projects/mutil.html

The code is stable and has been used in production for almost 10 years at the
NASA Advanced Supercomputing division to transfer many petabytes of scientific
data.  It is also used as one of the underlying transports in a separate
project (https://pkolano.github.io/projects/shift.html) to provide
high-performance tar creation/extraction and integrity
verification/rectification.

I do not have time to keep it in sync with every coreutils release, so it is
still based on 8.22, but it is usually straightforward to bring up to date.
I wanted to ask whether there is any interest in incorporating some or all of
these patches into the mainline cp/md5sum code so that the broader base of
coreutils users can benefit from them.  I can assist in updating the code to
the latest coreutils, paring it down to the features of interest, etc.  Please
let me know if there is any interest.

thanks,

--Paul
