Hi Brendan,

I get the same behaviour you are seeing, at least initially. Over time, the
number of segvn_faults returns to ~129. I don't use /tmp, as this may also
skew your output (because it uses memory). Maybe the writes of pages in the
file system cache are skewing results...
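(The segvn_fault.d script itself isn't included in this archive. As a rough
illustration only, and assuming fbt can probe segvn_fault() on this build, a
one-liner along these lines counts the faults taken by a single cp:

    # dtrace -n 'fbt::segvn_fault:entry /execname == "cp"/ { @faults = count(); }'

Run the cp in another window, then Ctrl-C to print the count.)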
max

(PS. I'll try using your correct first name in the future. Sorry...).

On Fri, 2006-01-13 at 05:59, Brendan Gregg wrote:
> G'Day Max,
>
> Thanks for your reply. Things have become a little stranger...
>
> On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:
>
> > Hi Greg,
> >
> > OK. I start the segvn_fault.d script, then in a second window
> > a "dd if=/dev/dsk/c0d0s2 of=/dev/null bs=8k"
> > and then in a third window do the cp of the 1000 page file.
> > Now I get ~1000 segvn_faults for the cp.
> > I expected to get a larger count because the dd is
> > contending with the cp.
> > So, either your disk is very slow or there are other
> > busy processes on your old system.
>
> If my disk was slow or busy when cp wanted to read, then I too would
> expect more faults - as cp can fault faster than disk-read-ahead.
> But cp isn't reading from disk! A modified segvn.d (attached),
>
>    # segvn.d
>    Sampling...
>    ^C
>    segvn_fault
>    -----------
>    CMD        FILE                 COUNT
>    [...]
>    cp         /extra1/1000          1000
>
>    CMD        FILE                 BYTES
>    [...]
>    cp         /extra1/1000       4096000
>
>    io:::start
>    ----------
>    CMD        FILE            DIR  BYTES
>    cp         /extra1/1000      R    12288
>    cp         /extra2/1000      W  4096000
>
> The input file /extra1/1000 is not being read from disk (only 12 KB).
>
> Repeating your test kicked my Ultra 5 from a consistent 132 segvn_faults
> to 1000 segvn_faults, but this remained at 1000 for subsequent tests
> without the dd running. I suspect dd to /dev/dsk thrashed the cache
> (or changed how it held pages) such that cache-read-ahead stopped working
> (even though the pages were still cached). Ok, this is still sounding
> far-fetched.
>
> Returning my Ultra 5 to a consistent 132 segvn_faults state was a real
> challenge: mount remount didn't work, nor did init 6!
> What did work was rewriting my /extra1/1000 file using dd. Hmmm.
>
> It appears that using dd to WRITE to a file leaves that file cached in a
> cache-read-ahead optimal way (eg, repeatable 132 segvn_faults). Then
> either remount or dd the /dev/dsk device (both affecting the cache) and
> we go to a repeatable 1000 segvn_faults.
>
> I rewrote my /extra1/1000 file on my x86 server, and yes - it now
> consistently faults at 129. Phew!
>
> ...
>
> I came up with the following simple test to check this out,
>
>    # dd if=/dev/urandom bs=4k of=/extra1/5000 count=5000
>    0+5000 records in
>    0+5000 records out
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.077      --- fast, as we just created it
>    user        0.001
>    sys         0.075
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.076      --- still fast...
>    user        0.001
>    sys         0.074
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.076      --- still fast...
>    user        0.001
>    sys         0.074
>    # umount /extra1; mount /extra1
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.129      --- slow, as we just remounted the FS
>    user        0.001
>    sys         0.099
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.084      --- faster, as the file is now cached
>    user        0.001
>    sys         0.081
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.084      --- hrm...
>    user        0.001
>    sys         0.081
>    # ptime cp -f /extra1/5000 /tmp
>
>    real        0.084      --- not getting any faster than this.
>    user        0.001
>    sys         0.081
>
> So after creation, /extra1/5000 is copied in 0.076 secs (consistently).
>
> After remounting, /extra1/5000 is copied in 0.084 secs (consistently).
>
> I haven't found a way to convince a file to be cached as well as it is on
> creation. It seems that newly created files are blessed.
>
> How does this all sound? :)
>
> cheers,
>
> Brendan
>
>
> >
> > max
> >
> > Quoting Brendan Gregg <[EMAIL PROTECTED]>:
> [...]
> > >> For instance, if the file system brings in 56k, (14 pages
> > >> on my amd64 box), and my disk is reasonably fast,
> > >> by the time cp gets a bit into the first 56k, I suspect
> > >> that all of the data is in memory and there is no
> > >> trapping into the kernel at all until the next 56k
> > >> needs to be read in.
> > >
> > > That would make sense. In this case (testing here anyway) it's not going
> > > near disk for reading (only writing the destination file),
> > >
> > > x86,
> > >                  extended device statistics
> > >  r/s   w/s  kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> > >  0.0  74.0   0.0 3999.8  0.0  0.1    0.0    0.9   0   6 c0d0
> > > sparc,
> > >                  extended device statistics
> > >  r/s   w/s  kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> > >  0.0  65.0   0.0 7995.6 23.6  1.8  363.7   27.7  88  90 c0t0d0
> > >
> > > So, considering both systems undercount expected faults - one fault must
> > > be triggering some form of "read ahead" from the page cache, not disk.
> > > I'm thinking the path is something like,
> > >
> > >    ufs_getpage_ra -> pvn_read_kluster -> page_create_va -> (read many?)
> > >
> > >> (I guess I am assuming the hat
> > >> layer is setting up pte's as the pages are brought in,
> > >> not as cp is accessing them).
> > >
> > > Yep - and that sort of problem would be the very thing that throws a
> > > spanner in the works. If it always waited for cp to access them, then I'd
> > > have consistent events to trace...
> > >
> > > Thanks Max! :)
> > >
> > > Brendan
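The modified segvn.d attachment is not part of this archive either. A minimal
sketch of that style of script, assuming the fbt segvn_fault() entry probe
(with arg3 taken to be the fault length) and the standard io provider, might
look something like:

    #!/usr/sbin/dtrace -s
    /*
     * Sketch only, not the attached segvn.d: count and size segvn_fault()
     * entries per process, and physical I/O per process, file and direction.
     * arg3 of segvn_fault() is assumed to be the fault length (size_t len).
     */
    #pragma D option quiet

    dtrace:::BEGIN
    {
            printf("Sampling... Ctrl-C to end.\n");
    }

    fbt::segvn_fault:entry
    {
            @faults[execname] = count();
            @fbytes[execname] = sum(arg3);
    }

    io:::start
    {
            @io[execname, args[2]->fi_pathname,
                args[0]->b_flags & B_READ ? "R" : "W"] = sum(args[0]->b_bcount);
    }

    dtrace:::END
    {
            printf("segvn_fault\n-----------\n");
            printa("   %-16s count %@10d\n", @faults);
            printa("   %-16s bytes %@10d\n", @fbytes);
            printf("\nio:::start\n----------\n");
            printa("   %-16s %-24s %-2s %@10d\n", @io);
    }

Note that asynchronous writes may be attributed to fsflush or sched rather
than to cp in the io:::start aggregation, depending on when they are issued.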
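To test the guessed ufs_getpage_ra -> pvn_read_kluster -> page_create_va
path, a sketch along these lines (fbt probes on those kernel functions;
whether each probe exists depends on the release and on inlining) would show
which of them fire on behalf of cp, and from what stack:

    #!/usr/sbin/dtrace -s
    /*
     * Sketch: count calls into the suspected read-ahead path while cp
     * runs, and capture the calling stack for ufs_getpage_ra.  Note that
     * page_create_va also fires for the destination file's pages.
     */
    #pragma D option quiet

    fbt::ufs_getpage_ra:entry,
    fbt::pvn_read_kluster:entry,
    fbt::page_create_va:entry
    /execname == "cp"/
    {
            @calls[probefunc] = count();
    }

    fbt::ufs_getpage_ra:entry
    /execname == "cp"/
    {
            @stacks[stack(10)] = count();
    }

    dtrace:::END
    {
            printa("   %-20s %@8d\n", @calls);
            printa(@stacks);
    }

Either result would help: calls here would support the path above, while no
calls with the fault count still low would point at the pages being mapped
straight from the page cache.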