On Wed, Mar 9, 2022 at 7:21 PM Anderson, Stuart B. <s...@caltech.edu> wrote:
>
> On Mar 8, 2022, at 3:35 PM, Ole Tange <o...@tange.dk> wrote:
> > On Tue, Mar 8, 2022 at 11:22 PM Anderson, Stuart B. <s...@caltech.edu> wrote:
> >> parallel version 20190922 from EPEL 8 running a Rocky Linux 8 system
> >> occasionally gets into an I/O spin loop writing 8193 bytes of "x" to a
> >> deleted TMPDIR file and then immediately truncating it, e.g.,
> >
> > This is by design:
> > https://www.gnu.org/software/parallel/parallel_design.html#disk-full
>
> Any reason not to call statvfs() to see if there is free disk space?

It is unclear to me if statvfs is supported on all supported platforms.
You should feel free to examine this.

If it is not supported, or if it requires installing more than the basic
Perl package, you will need very convincing evidence.
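For reference, an untested sketch of such a check is below. It assumes the
CPAN module Filesys::Df, which (as far as I know) gets its numbers from
statvfs()/statfs() on systems that have them, and which is not part of core
Perl, so it is exactly the kind of extra dependency in question:

#!/usr/bin/perl
# Untested sketch: report free space for $TMPDIR.
# Filesys::Df is a CPAN module (not core Perl).
use strict;
use warnings;
use Filesys::Df;

my $dir = $ENV{TMPDIR} || "/tmp";
my $ref = df($dir);                 # values are in 1K blocks by default
die "df() failed for $dir\n" unless defined $ref;
printf "%s: %d KB free of %d KB\n", $dir, $ref->{bavail}, $ref->{blocks};

Whether something like that behaves sensibly on every platform GNU Parallel
runs on is the part that would need to be demonstrated.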
> > GNU Parallel does an exponential back off:
> >
> > https://www.gnu.org/software/parallel/parallel_design.html#exponentially-back-off
>
> There seems to be a problem with this for some long running jobs,
> perhaps limited to nested parallel (see below).

I have the feeling the problem is not related to the long running job,
but that the nested parallels run short lived jobs.

It will be helpful if you can follow
https://www.gnu.org/software/parallel/man.html#reporting-bugs

> >> Any thoughts on how to avoid this seemingly useless spin loop with high
> >> CPU and I/O resource consumption?
> >
> > Can you show that there really is an I/O consumption? Do the writes
> > actually reach your disk?
>
> I can confirm this is not throttled once per second

And no one suggested it was.

It is using an exponential back off, so it will *at*least* run once per
second, and *at*least* once per job.

Immediately after starting a job it will run ~1 per ms, but it will
exponentially back off, so if no new job is started after ~10 seconds,
it will run around once per second.

> and leads to maxing out the I/O on a fast NVMe device with very high CPU
> utilization.

The high CPU utilization can be explained by having short lived jobs.
Expect 2-10 ms CPU time per job.

Your documentation does not show if data is actually written to the NVMe.

On my systems the I/O stays in RAM: it never reaches the physical disk.
strace will not show this. But to document it, you can try this:

seq 1000000 | parallel true &
# The utilization is expected to stay at 0%.
# If iostat shows the disk of $TMPDIR going from 0% to 100% utilization,
# your point is proven.
iostat -dkx 1
# If you do not have iostat, vmstat will give you some indication, too.
# The above should give no increase in 'bo'. If it increases
# significantly, your point is proven.
vmstat 1

It is expected that parallel in the above example will use 100% of a
single CPU thread.

> The following example shows it happening twice per millisecond.

How on earth can 2*8KB*1000 (= 2000 IOPS and 16 MB/s) saturate your NVMe?

I still think you are only seeing CPU usage (which is expected due to
short lived jobs), and you are not even seeing the 16 MB/s I/O on your
NVMe.

> [root@zfs1 ~]# ps auxww | grep 836133
> root 836133 98.2 0.0 55488 17672 pts/6 R+ Mar08 1277:14
> /usr/bin/perl /usr/bin/parallel --delay 2 -P /tmp/parallel.zfs.backup3
> parallel -I// syncoid {} // ::: $LIST

How long does a single syncoid take? How many are run in parallel?

Twice per millisecond is expected immediately after a job is started, so
if the syncoid jobs are short lived, you should see this all the time.

If, however, a syncoid is only started every 10 seconds, then you should
see that behavior whenever a new job starts.

/Ole