On Sat, Jan 19, 2019 at 09:30:12PM +0900, Bryan Linton wrote:

> Hello tech@,
> 
> I'd appreciate it if someone could review both this patch and my
> analysis.
> 
> 
> PROBLEM
> -------
> When running systat, the vmstat page shows inaccurate numbers for
> disk i/o speed (likely only on multiprocessor systems) when
> compared to both the iostat page, as well as the statistics
> reported by dd.
> 
> To test this, open two xterms.  Run "systat" in one, and
> "systat iostat" in the other.  Then run something like,
> "dd if=/dev/zero of=zero.deleteme bs=64k" (or anything that will
> generate a lot of disk i/o).
> 
> Observe that "systat" shows a different number for the reported
> i/o speed et. al., whereas "systat iostat" and pressing ^T while
> dd is running both return numbers in agreement with each other.
> 
> On my 4-core system, the numbers returned by "systat" for "seeks",
> "xfers", and "speed" all appear to be exactly 4-times lower than
> are indicated in "systat iostat" as well as the transfer rate
> reported by dd.
> 
> 
> TENTATIVE ANALYSIS
> __________________
> (I am not an expert on OpenBSD's internals, and did not dig all
> the way into the depths of the kernel, so some of this analysis is
> based on conjecture).
> 
> In /usr/src/usr.bin/systat/vmstat.c on line 349, (via some
> printf() debugging) the variable "etime" appears to be set to the
> sum total of time elapsed on ALL CPUs in the system.
> 
>       etime = 0;
>       for (i = 0; i < CPUSTATES; i++) {
>               X(cpustats.cs_time);
>               etime += s.cpustats.cs_time[i];
>       }
> 
> This causes an error in calculations later on when it is used to
> calculate disk i/o statistics in vmstat.c (but not in iostat).
> 
> vmstat.c:
> 
>       /* # of K transferred */
>       words = (cur.dk_rbytes[dn] + cur.dk_wbytes[dn]) / 1024.0;
> 
>       putint((int)((float)cur.dk_seek[dn]/etime+0.5), DISKROW + 1, c, 5);
>       putint((int)((float)(cur.dk_rxfer[dn] + cur.dk_wxfer[dn])/etime+0.5),
>               DISKROW + 2, c, 5);
>       putintmk((int)(words/etime + 0.5), DISKROW + 3, c, 5);
>       putfloat(atime/etime, DISKROW + 4, c, 5, 1, 1);
> 
> Compare the above to iostat.c, where 
>       etime = naptime;
> is set on line 143 before showtotal() prints the values.
> 
>       for (dn = 0; dn < cur.dk_ndrive; dn++) {
>               rsum += cur.dk_rbytes[dn] / etime;
>               wsum += cur.dk_wbytes[dn] / etime;
>               rtsum += cur.dk_rxfer[dn] / etime;
>               wtsum += cur.dk_wxfer[dn] / etime;
>               mssum += ATIME(cur.dk_time, dn) / etime;
>       }
> 
>       print_fld_str(FLD_IO_DEVICE, "Totals");
>       print_fld_size(FLD_IO_READ, rsum);
>       print_fld_size(FLD_IO_WRITE, wsum);
>       print_fld_size(FLD_IO_RTPS, rtsum);
>       print_fld_size(FLD_IO_WTPS, wtsum);
>       print_fld_float(FLD_IO_SEC, mssum, 1);
> 
> 
> SOLUTION
> ________
> 
> Set "etime = naptime;" in vmstat.c as well, as indicated in the
> attached patch.
> 
> 
> ADDENDUM
> ________
> 
> I'm curious why iostat.c uses cur.dk_*[dn] directly whereas
> vmstat.c adds 0.5 to it.
>       rsum += cur.dk_rbytes[dn] / etime;

rsum is a double. When adding a double to a double, there's no
rounding or truncation going on.

> vs.
>       words = (cur.dk_rbytes[dn] + cur.dk_wbytes[dn]) / 1024.0;
>       putintmk((int)(words/etime + 0.5), DISKROW + 3, c, 5);
>                                    ^^^^^^

When converting a double (or float) to an int, truncation happens. So
add 0.5 to turn it into rounding.


> 
> Also, there are several casts in lines like
>       putint((int)((float)cur.dk_seek[dn]/etime+0.5), DISKROW + 1, c, 5)
> 
> I understand why the value would need to be cast to an (int),
> since putint() is expecting an int, but since etime is already a
> (double), wouldn't the C language's automatic type-promotion rules
> automatically promote everything to a (double) anyway?

Those casts are indeed unneeded,

        -Otto


> 
> Casting to a (float) would only seem to reduce precision, and then
> since it's immediately being cast to an (int) anyway, is the
> (float) cast really necessary?  Removing it seems to cause no
> immediately noticeable ill effects on my system.
> 
> 
> I would appreciate any advice or explanation anyone can give regarding
> the above.
> 
> Thank you.
> 
> -- 
> Bryan
> 

> Index: vmstat.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/systat/vmstat.c,v
> retrieving revision 1.89
> diff -u -r1.89 vmstat.c
> --- vmstat.c  17 Nov 2018 23:10:08 -0000      1.89
> +++ vmstat.c  19 Jan 2019 11:37:33 -0000
> @@ -678,6 +678,8 @@
>  {
>       double words, atime;
>  
> +     etime = naptime;
> +
>       c += DISKCOL;
>  
>       /* time busy in disk activity */

Reply via email to