Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hello Jim! Actually i can repeat it... every time i did run some d script to collect some data i got some (how do you call it? nasty :) values. Look: Fri Dec 5 10:19:32 BRST 2008 Fri Dec 5 10:29:34 BRST 2008 NFSv3 read/write distributions (us): read value - Distribution - count 2 | 0 4 | 1092 8 |@93773 16 | 64481 32 |@@ 11713 64 |@7590 128 | 1156 256 | 698 512 | 1394 1024 | 1729 2048 | 805 4096 |@2732 8192 |@@@ 14893 16384 |@@ 9351 32768 |@2988 65536 | 647 131072 | 119 262144 | 29 524288 | 30 1048576 | 28 2097152 | 0 write value - Distribution - count 64 | 0 128 | 8 256 | 2418 512 |@@@ 22679 1024 | 28442 2048 | 59887 4096 |@68852 8192 |@65152 16384 | 32224 32768 |@@ 11554 65536 | 3162 131072 | 1100 262144 | 446 524288 | 105 1048576 | 70 2097152 | 11 4194304 | 0 8388608 | 0 16777216 | 0 33554432 | 0 67108864 | 0 134217728 | 0 268435456 | 0 536870912 | 0 1073741824 | 0 2147483648 | 0 4294967296 | 0 8589934592 | 0 17179869184 | 0 34359738368 | 0 68719476736 | 0 137438953472 | 11 274877906944 | 0 NFSv3 read/write by host (total us): x.16.0.x 4707246 x.16.0.x 28397213 x.16.0.x 40901275 x.16.0.x 68333664 x.16.0.x 89357734 x.16.0.x 125890329 x.16.0.x 127848295 x.16.0.x 132248305 x.16.0.x 135161278 x.16.0.x 138579146 x.16.0.x 146275507 x.16.0.x 156761509 x.16.0.x 566154684 x.16.0.x 185948455950 x.16.0.x 186184056503 x.16.0.x 186341816343 x.16.0.x1488962592532
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
HmmmSomething is certainly wrong. 11 writes at 137k - 275k seconds (which is where your 1.5M seconds sum is coming from) is bogus. What version of Solaris is this ('uname -a' and 'cat /etc/release')? Your running this on an NFS server, right (not client)? Is this a benchmark? I ask because the file names (file1, file2, etc) seem like names used in synthetic benchmarks. I need to take a closer look at the code and figure out how to chase this. Either there's a bug somewhere, or we're missing something. Thanks, /jim Marcelo Leal wrote: > Hello Jim! > Actually i can repeat it... every time i did run some d script to collect > some data i got some (how do you call it? nasty :) values. Look: > > Fri Dec 5 10:19:32 BRST 2008 > Fri Dec 5 10:29:34 BRST 2008 > > > NFSv3 read/write distributions (us): > > read >value - Distribution - count >2 | 0 >4 | 1092 >8 |@93773 > 16 | 64481 > 32 |@@ 11713 > 64 |@7590 > 128 | 1156 > 256 | 698 > 512 | 1394 > 1024 | 1729 > 2048 | 805 > 4096 |@2732 > 8192 |@@@ 14893 >16384 |@@ 9351 >32768 |@2988 >65536 | 647 > 131072 | 119 > 262144 | 29 > 524288 | 30 > 1048576 | 28 > 2097152 | 0 > > write >value - Distribution - count > 64 | 0 > 128 | 8 > 256 | 2418 > 512 |@@@ 22679 > 1024 | 28442 > 2048 | 59887 > 4096 |@68852 > 8192 |@65152 >16384 | 32224 >32768 |@@ 11554 >65536 | 3162 > 131072 | 1100 > 262144 | 446 > 524288 | 105 > 1048576 | 70 > 2097152 | 11 > 4194304 | 0 > 8388608 | 0 > 16777216 | 0 > 33554432 | 0 > 67108864 | 0 >134217728 | 0 >268435456 | 0 >536870912 | 0 > 1073741824 | 0 > 2147483648 | 0 > 4294967296 | 0 > 8589934592 | 0 > 17179869184 | 0 > 34359738368 | 0 > 68719476736 | 0 > 137438953472 | 11 > 274877906944 | 0 > > NFSv3 read/write by host (total us): > > x.16.0.x 4707246 > x.16.0.x 28397213 > x.16.0.x 40901275 > x.16.0.x 68333664 > x.16.0.x 89357734 > x.16.0.x 125890329 > x.16.0.x
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hello Jim, this is not a benchmark. The filenames i did change for privacy... This is a NFS server, yes. # uname -a SunOS test 5.11 snv_89 i86pc i386 i86pc # cat /etc/release Solaris Express Community Edition snv_89 X86 Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 06 May 2008 -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Also (I meant to ask) - are you having performance problems, or just monitoring with the NFS provider scripts? Thanks, /jim Marcelo Leal wrote: > Hello Jim, this is not a benchmark. The filenames i did change for privacy... > This is a NFS server, yes. > > # uname -a > SunOS test 5.11 snv_89 i86pc i386 i86pc > > # cat /etc/release >Solaris Express Community Edition snv_89 X86 >Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. > Use is subject to license terms. > Assembled 06 May 2008 > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hi Marcelo and Jim, Marcelo Leal wrote: > Hello Jim, this is not a benchmark. The filenames i did change for privacy... > This is a NFS server, yes. > > # uname -a > SunOS test 5.11 snv_89 i86pc i386 i86pc > > # cat /etc/release >Solaris Express Community Edition snv_89 X86 >Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. > Use is subject to license terms. > Assembled 06 May 2008 > Out of curiosity, what happens if you turn off all the processors except 1 (via psradm(1)), and then run the script? max ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hi, I have looked at the script, and there is no correspondence between start and done. So, I am not sure how this script is supposed to work. I think there should be a predicate in the done probes... The way the script is written, it assumes that for any start, the done that fires after it is for the same noi_xid. Current script: nfsv3:::op-read-start, nfsv3:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv3:::op-read-done, nfsv3:::op-write-done { ... Changed script: nfsv3:::op-read-start, nfsv3:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv3:::op-read-done, nfsv3:::op-write-done /start[args[1]->noi_xid] != 0/ { That way, you don't have the done probe clause executing for id's where the start has not fired first. (This still does not match start/done for a given xid). But what do I know... max Jim Mauro wrote: > Also (I meant to ask) - are you having performance problems, or > just monitoring with the NFS provider scripts? > > Thanks, > /jim > > > Marcelo Leal wrote: > >> Hello Jim, this is not a benchmark. The filenames i did change for privacy... >> This is a NFS server, yes. >> >> # uname -a >> SunOS test 5.11 snv_89 i86pc i386 i86pc >> >> # cat /etc/release >>Solaris Express Community Edition snv_89 X86 >>Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. >> Use is subject to license terms. >> Assembled 06 May 2008 >> >> > ___ > dtrace-discuss mailing list > dtrace-discuss@opensolaris.org > > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Oops, that would be a nice test, but something i cannot do. ;-) [http://www.eall.com.br/blog] Leal. -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Some kind of both... ;-) I was investigating a "possible" performance problem, that i'm not sure if is the NFS server or not. So, i was faced with that weird numbers. I think one thing is not related with the other, but we need to fix whatever is the problem with the script or the provider, to have confidence on the tool. Except that numbers, the other latency values seems to be fine, don't you agree? Leal [http://www.eall.com.br/blog] -- This message posted from opensolaris.org ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
D'oh! Good spot Max - feeling sheepish that I missed that. Marcelo - add the predicate to the "done" probes as per Max's message, and let's see where that takes us Thanks, /jim [EMAIL PROTECTED] wrote: > Hi, > I have looked at the script, and there is no correspondence between > start and done. > So, I am not sure how this script is supposed to work. > I think there should be a predicate in the done probes... > The way the script is written, it assumes that for any start, the done > that fires after it is for the same noi_xid. > > Current script: > > nfsv3:::op-read-start, > nfsv3:::op-write-start > { >start[args[1]->noi_xid] = timestamp; > } > > nfsv3:::op-read-done, > nfsv3:::op-write-done > { > ... > > Changed script: > > nfsv3:::op-read-start, > nfsv3:::op-write-start > { >start[args[1]->noi_xid] = timestamp; > } > > nfsv3:::op-read-done, > nfsv3:::op-write-done > /start[args[1]->noi_xid] != 0/ > { > > That way, you don't have the done probe clause executing > for id's where the start has not fired first. (This still does not > match start/done for a given xid). > But what do I know... > > max > > Jim Mauro wrote: >> Also (I meant to ask) - are you having performance problems, or >> just monitoring with the NFS provider scripts? >> >> Thanks, >> /jim >> >> >> Marcelo Leal wrote: >> >>> Hello Jim, this is not a benchmark. The filenames i did change for >>> privacy... >>> This is a NFS server, yes. >>> >>> # uname -a >>> SunOS test 5.11 snv_89 i86pc i386 i86pc >>> >>> # cat /etc/release >>>Solaris Express Community Edition snv_89 X86 >>>Copyright 2008 Sun Microsystems, Inc. All Rights Reserved. >>> Use is subject to license terms. >>> Assembled 06 May 2008 >>> >> ___ >> dtrace-discuss mailing list >> dtrace-discuss@opensolaris.org >> >> > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hi Marcelo, Marcelo Leal wrote: > Some kind of both... ;-) > I was investigating a "possible" performance problem, that i'm not sure if > is the NFS server or not. > So, i was faced with that weird numbers. I think one thing is not related > with the other, but we need to fix whatever is the problem with the script or > the provider, to have confidence on the tool. Except that numbers, the other > latency values seems to be fine, don't you agree? > Yes. And I think it is a problem with the script. Did you try modifying the script as I suggested in my earlier email? max > Leal > [http://www.eall.com.br/blog] > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Marcelo Leal wrote: > Oops, that would be a nice test, but something i cannot do. ;-) > What is it that you cannot do? > [http://www.eall.com.br/blog] > > Leal. > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
Hi Brendan, Brendan Gregg - Sun Microsystems wrote: > On Tue, Dec 09, 2008 at 05:04:49PM +0100, [EMAIL PROTECTED] wrote: > >> Hi, >> I have looked at the script, and there is no correspondence between >> start and done. >> So, I am not sure how this script is supposed to work. >> I think there should be a predicate in the done probes... >> The way the script is written, it assumes that for any start, the done >> that fires after it is for the same noi_xid. >> >> Current script: >> >> nfsv3:::op-read-start, >> nfsv3:::op-write-start >> { >> start[args[1]->noi_xid] = timestamp; >> } >> >> nfsv3:::op-read-done, >> nfsv3:::op-write-done >> { >> ... >> >> Changed script: >> >> nfsv3:::op-read-start, >> nfsv3:::op-write-start >> { >> start[args[1]->noi_xid] = timestamp; >> } >> >> nfsv3:::op-read-done, >> nfsv3:::op-write-done >> /start[args[1]->noi_xid] != 0/ >> { >> >> That way, you don't have the done probe clause executing >> for id's where the start has not fired first. >> > > Thanks, unusual for me to miss that. Anyway, now fixed on the wiki. > > >> (This still does not >> match start/done for a given xid). >> But what do I know... >> > > The xid is the key in the hash... > > Yes, and I think the check for non-zero is sufficient... > Brendan > > ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Is the nfs dtrace script right (from nfsv3 provider wiki)?
On Tue, Dec 09, 2008 at 05:04:49PM +0100, [EMAIL PROTECTED] wrote: > Hi, > I have looked at the script, and there is no correspondence between > start and done. > So, I am not sure how this script is supposed to work. > I think there should be a predicate in the done probes... > The way the script is written, it assumes that for any start, the done > that fires after it is for the same noi_xid. > > Current script: > > nfsv3:::op-read-start, > nfsv3:::op-write-start > { > start[args[1]->noi_xid] = timestamp; > } > > nfsv3:::op-read-done, > nfsv3:::op-write-done > { > ... > > Changed script: > > nfsv3:::op-read-start, > nfsv3:::op-write-start > { > start[args[1]->noi_xid] = timestamp; > } > > nfsv3:::op-read-done, > nfsv3:::op-write-done > /start[args[1]->noi_xid] != 0/ > { > > That way, you don't have the done probe clause executing > for id's where the start has not fired first. Thanks, unusual for me to miss that. Anyway, now fixed on the wiki. > (This still does not > match start/done for a given xid). > But what do I know... The xid is the key in the hash... Brendan -- Brendan Gregg, Sun Microsystems Fishworks.http://blogs.sun.com/brendan ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Trying to identify writer and/or reason for iowrite.
On Fri, Dec 05, 2008 at 05:15:09PM -0500, Jim Mauro wrote: > The problem you're running into is disk IO operations tend to occur > asynchronously to the thread that initiated the IO, so when the IO > provider probe fires, execname shows the process name for PID 0. > This is not uncommon when chasing disk and network IOs. You > need to capture the write further up the stack. Yes; This happens in particular with ZFS, which uses a pipeline and task queues to dispatch writes. It's a great FS design, but makes debugging more difficult. # dtrace -n 'io:::start { @[stack()] = count(); }' [...] genunix`ldi_strategy+0x54 zfs`vdev_disk_io_start+0x1a5 zfs`zio_vdev_io_start+0x148 zfs`zio_execute+0x7f zfs`vdev_queue_io_done+0x98 zfs`vdev_disk_io_done+0x2b zfs`zio_vdev_io_done+0x22 zfs`zio_execute+0x7f genunix`taskq_thread+0x1a7 unix`thread_start+0x8 1500 Brendan -- Brendan Gregg, Sun Microsystems Fishworks.http://blogs.sun.com/brendan ___ dtrace-discuss mailing list dtrace-discuss@opensolaris.org