Re: question about parallelism in cp command
On 2019/06/28 04:52, Marc Roos wrote:
>
> There are always exceptions, like with clustered filesystems etc. That
> is why I wrote 'most used'. If you take all the 'cp' commands issued
> in the world today, I would bet 80%-95% of them would not benefit from
> some sort of parallel processing.

Single disks already benefit from some parallel processing, and could
benefit more as write processing and on-disk caches grow. That's why
many hard disks are moving to an SSD+HD combo, with an NVMe SSD being
able to handle near-memory speeds of up to 64K/microsecond.

The benefits of parallelism involve being able to order reads and
writes to minimize the need for disk seeking and start/stop overhead,
and moving to disk streaming of tracks, where disk speeds can begin to
reach I/O transfer limits. With higher capacities comes a higher write
speed, since the disks run at the same linear speed / technology.

The idea is for "cp -r" to take 1/10th the iops to copy the same data,
because the OS and the drivers can reorganize the requests. That can
only be done if all of the data on a track can be rewritten. Since the
data on a track rarely even comes from the same file, you need multiple
threads to
1) read all the separate files storing data in a track,
2) write all the separate files storing something in the target.

The key is scaling memory usage so that a thread can completely fill
its memory buffer between reads or writes to the device.
Unfortunately, cp rarely uses the memory it could, due to concerns
about evicting some cache that may be used sometime in the
future...someday. A tunable might decide how much memory to allocate to
something like cp; with enough, it could write out entire files in
1 iop (if the driver allows).
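The buffer-size tunable described above can be sketched in a few lines. This is only an illustration, not how cp is implemented: the function name `copy_with_buffer` and the 64 MiB default are assumptions made for the example, and the kernel or driver is free to split or merge the requests, so a large buffer does not guarantee a single iop.

```python
import shutil


def copy_with_buffer(src_path, dst_path, buffer_bytes=64 * 1024 * 1024):
    """Copy src_path to dst_path, moving up to buffer_bytes per read/write.

    buffer_bytes is the hypothetical tunable: a larger value means fewer,
    larger requests are handed to the OS for the same amount of data.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj repeatedly reads up to `length` bytes and writes
        # them out until the source is exhausted.
        shutil.copyfileobj(src, dst, length=buffer_bytes)
```

A file smaller than `buffer_bytes` is read with one large request and written with one large request at the Python level; what reaches the device still depends on the filesystem and driver.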
This type of throughput might involve regular defragmenting of disks,
so that multiple files can be transferred to/from disk at once if they
are all small enough to fit on, say, 1 track. To do that, though, there
needs to be demand for all of those files, so that the underlying
fs-drivers can r/w multiple full tracks at a time while performing
only 1 iop.

>
> -----Original Message-----
> From: L A Walsh [mailto:coreut...@tlinx.org]
> Sent: Friday 28 June 2019 13:15
> To: Marc Roos
> Cc: aglo; coreutils
> Subject: Re: question about parallelism in cp command
>
> On 2019/06/06 09:25, Marc Roos wrote:
>>
>> Hmmm without being a maintainer. I would say cp -r is most used on
>> single disk, so one thread is using the maximum disk iops taking y
>> time to copy.
> ---
> not exactly true, if the 1 disk is a 20 disk raid10.
>
> You can target 10 areas at a time and get considerable benefit if they
> are spread across multiple disks in the raid.
Re: question about parallelism in cp command
On Fri, Jun 28, 2019 at 04:15:22AM -0700, L A Walsh wrote:
> You can target 10 areas at a time and get considerable benefit if they
> are spread across multiple disks in the raid.

Alternatively, the kernel can hide this behind readahead.
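One way a single-threaded copier can cooperate with kernel readahead is to declare its access pattern up front. The sketch below assumes a Unix system where Python exposes `os.posix_fadvise`; the function name and chunk size are made up for the example, and the hint is purely advisory, so the kernel may ignore it.

```python
import os


def copy_with_readahead_hint(src_path, dst_path, chunk_bytes=1024 * 1024):
    """Single-threaded copy that tells the kernel the read is sequential."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0, len=0 covers the whole file. The advice only
                # suggests more aggressive readahead; it changes no data.
                os.posix_fadvise(src.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # Some filesystems reject the hint; the copy works without it.
                pass
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            dst.write(chunk)
```

With the hint in place, the kernel can fetch blocks ahead of the reader, possibly from several disks of an array at once, while user space stays single-threaded.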
RE: question about parallelism in cp command
There are always exceptions, like with clustered filesystems etc. That
is why I wrote 'most used'. If you take all the 'cp' commands issued
in the world today, I would bet 80%-95% of them would not benefit from
some sort of parallel processing.

-----Original Message-----
From: L A Walsh [mailto:coreut...@tlinx.org]
Sent: Friday 28 June 2019 13:15
To: Marc Roos
Cc: aglo; coreutils
Subject: Re: question about parallelism in cp command

On 2019/06/06 09:25, Marc Roos wrote:
>
> Hmmm without being a maintainer. I would say cp -r is most used on
> single disk, so one thread is using the maximum disk iops taking y
> time to copy.
---
not exactly true, if the 1 disk is a 20 disk raid10.

You can target 10 areas at a time and get considerable benefit if they
are spread across multiple disks in the raid.
Re: question about parallelism in cp command
On 2019/06/06 09:25, Marc Roos wrote:
>
> Hmmm without being a maintainer. I would say cp -r is most used on
> single disk, so one thread is using the maximum disk iops taking y time
> to copy.
---
not exactly true, if the 1 disk is a 20 disk raid10.

You can target 10 areas at a time and get considerable benefit if they
are spread across multiple disks in the raid.
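"Targeting 10 areas at a time" could look roughly like the following for a single large file: several threads each copy a different byte range using positional reads and writes. This is a sketch under assumptions, not a tested design for cp. The function name, the default of 10 workers and the 4 MiB chunk size are illustrative, and whether the ranges really land on different disks depends on the array's stripe layout.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_chunk_copy(src_path, dst_path, workers=10,
                        chunk_bytes=4 * 1024 * 1024):
    """Copy one file by handing disjoint byte ranges to a pool of threads."""
    size = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Give the destination its final length before the workers start.
        os.ftruncate(dst_fd, size)

        def copy_range(offset):
            remaining = min(chunk_bytes, size - offset)
            while remaining > 0:
                # pread/pwrite take an explicit offset, so threads do not
                # fight over a shared file position.
                data = os.pread(src_fd, remaining, offset)
                if not data:
                    break
                written = 0
                while written < len(data):
                    written += os.pwrite(dst_fd, data[written:],
                                         offset + written)
                offset += len(data)
                remaining -= len(data)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so worker exceptions are re-raised.
            list(pool.map(copy_range, range(0, size, chunk_bytes)))
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

On a single spinning disk the same code would mostly add seeks, which is the concern raised in the quoted message.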
Re: question about parallelism in cp command
On Thu, Jun 6, 2019 at 2:44 PM Assaf Gordon wrote:
>
> > -----Original Message-----
> > From: Olga Kornievskaia [mailto:a...@umich.edu]
> >
> > Is there something philosophically incorrect in making a “cp”
> > multi-threaded and allow for parallel copies when “cp -r” is done? If
> > it’s something that’s possible, are there any plans in making a
> > multi-threaded cp?
>
> On Thu, Jun 06, 2019 at 02:17:40PM -0400, Olga Kornievskaia wrote:
> > The use case I'm considering is network file systems. So perhaps a
> > default can be a single threaded system for the local filesystems but
> > add an option to cp for the -r case that would enable network file
> > system to copy files in parallel.
>
> In an interesting coincidence, see recent post by Paul Kolano here:
> https://lists.gnu.org/archive/html/coreutils/2019-06/msg00011.html
>
> (Note that his suggestions have not been reviewed yet, so this is
> neither endorsement nor criticism of his code.)
>

Interesting! Thank you for the link (since I'm not on the mailing
list). I'm going to try out this code and see how it performs (Thank
you Paul Kolano).

It would be great if the maintainers of the coreutils would consider
adding this multi-threaded cp functionality in.
Re: question about parallelism in cp command
> -----Original Message-----
> From: Olga Kornievskaia [mailto:a...@umich.edu]
>
> Is there something philosophically incorrect in making a “cp”
> multi-threaded and allow for parallel copies when “cp -r” is done? If
> it’s something that’s possible, are there any plans in making a
> multi-threaded cp?

On Thu, Jun 06, 2019 at 02:17:40PM -0400, Olga Kornievskaia wrote:
> The use case I'm considering is network file systems. So perhaps a
> default can be a single threaded system for the local filesystems but
> add an option to cp for the -r case that would enable network file
> system to copy files in parallel.

In an interesting coincidence, see recent post by Paul Kolano here:
https://lists.gnu.org/archive/html/coreutils/2019-06/msg00011.html

(Note that his suggestions have not been reviewed yet, so this is
neither endorsement nor criticism of his code.)

regards,
 - assaf
Re: question about parallelism in cp command
The use case I'm considering is network file systems. So perhaps a
default can be a single threaded system for the local filesystems but
add an option to cp for the -r case that would enable network file
system to copy files in parallel.

On Thu, Jun 6, 2019 at 12:25 PM Marc Roos wrote:
>
> Hmmm without being a maintainer. I would say cp -r is most used on
> single disk, so one thread is using the maximum disk iops taking y time
> to copy. What would using multiple threads solve, each taking their
> share of the maximum disk iops and, because of the scheduling and other
> overhead, finishing later than y time?
>
> -----Original Message-----
> From: Olga Kornievskaia [mailto:a...@umich.edu]
> Sent: Thursday 6 June 2019 17:39
> To: coreutils@gnu.org
> Subject: question about parallelism in cp command
>
> Hi folks,
>
> Is there something philosophically incorrect in making a “cp”
> multi-threaded and allow for parallel copies when “cp -r” is done? If
> it’s something that’s possible, are there any plans in making a
> multi-threaded cp?
>
> I’m not a member of the list so I kindly request you cc me on the
> reply.
>
> Thank you.
>
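The opt-in behaviour proposed at the top of this message (serial by default, parallel only on request) can be sketched as a recursive copy with a worker-count parameter. Nothing here reflects an actual or planned cp option: `copy_tree` and its `jobs` parameter are hypothetical names, and the sketch skips symlinked directories and special files, which a real "cp -r" has to handle.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_tree(src_root, dst_root, jobs=1):
    """Recursively copy regular files; jobs=1 keeps the serial behaviour."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        # Directories are created serially, before any file is copied.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            pairs.append((os.path.join(dirpath, name),
                          os.path.join(target_dir, name)))

    if jobs <= 1:
        for src, dst in pairs:
            shutil.copy2(src, dst)
    else:
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            # Each file is an independent task; list() surfaces any errors.
            list(pool.map(lambda pair: shutil.copy2(*pair), pairs))
    return len(pairs)
```

Calling `copy_tree(src, dst)` behaves like a plain serial copy, while something like `copy_tree(src, dst, jobs=8)` keeps several file transfers in flight, which is where a high-latency network file system would be expected to gain.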
RE: question about parallelism in cp command
Hmmm without being a maintainer. I would say cp -r is most used on
single disk, so one thread is using the maximum disk iops taking y time
to copy. What would using multiple threads solve, each taking their
share of the maximum disk iops and, because of the scheduling and other
overhead, finishing later than y time?

-----Original Message-----
From: Olga Kornievskaia [mailto:a...@umich.edu]
Sent: Thursday 6 June 2019 17:39
To: coreutils@gnu.org
Subject: question about parallelism in cp command

Hi folks,

Is there something philosophically incorrect in making a “cp”
multi-threaded and allow for parallel copies when “cp -r” is done? If
it’s something that’s possible, are there any plans in making a
multi-threaded cp?

I’m not a member of the list so I kindly request you cc me on the
reply.

Thank you.
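The argument above can be put into a toy model. Everything in it is an assumption made for illustration (the formula, the function name and all the numbers), not a measurement: the copy takes whichever is longer, the time the device needs at its iops limit or the time the workers spend waiting on per-operation latency, plus a fixed cost per extra thread.

```python
def estimated_copy_seconds(n_ops, device_iops, latency_s, workers,
                           overhead_s=0.0):
    """Toy estimate of copy time; not a benchmark.

    n_ops       number of I/O operations the copy needs
    device_iops operations per second the device can sustain in total
    latency_s   round-trip time of one operation as seen by one thread
    workers     number of threads issuing operations
    overhead_s  assumed scheduling cost added per extra thread
    """
    device_bound = n_ops / device_iops
    latency_bound = n_ops * latency_s / workers
    return max(device_bound, latency_bound) + overhead_s * (workers - 1)
```

With made-up figures for a single local disk (10000 operations, 200 iops, 5 ms per operation), one thread gives about 50 seconds and eight threads with 0.5 s overhead each give about 53.5 seconds, which matches the "finishing later than y" concern. With made-up figures for a network file system (5000 iops available, 10 ms round trip), one thread gives about 100 seconds and eight threads about 16 seconds, because the single thread never comes close to the iops limit.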
question about parallelism in cp command
Hi folks,

Is there something philosophically incorrect in making a “cp”
multi-threaded and allow for parallel copies when “cp -r” is done? If
it’s something that’s possible, are there any plans in making a
multi-threaded cp?

I’m not a member of the list so I kindly request you cc me on the
reply.

Thank you.