Re: NFS problem (non-sleepable locks held)

2003-11-11 Thread Robert Watson

On Tue, 11 Nov 2003, cosmin wrote:

> I'm getting the following message when transferring data to a
> freebsd-current server via an nfs mount from another fbsd client. 
> 
> malloc() of "64" with the following non-sleepable locks held:  exclusive
> sleep mutex inp r = 0 (0xc1d250ac) locked @
> /usr/src/sys/netinet/udp_usrreq.c:378
> 
> The message shows up 12 times and then it doesn't show up anymore, even
> if I stop the transfer and start it again.  This server uses the nge
> driver for its network card.  It's running the sources from yesterday,
> Nov 10 2003. 
> 
> I've been having problems with one of our machines freezing up during
> long nfs transfers, and now I'm trying to reproduce the freeze on this
> test machine.  So far no luck, and the only oddity I've been getting is
> the above message. 
> 
> Could the above message be causing the freezes ? 

Could you hook up a serial console and turn on debug.witness_ddb?  When
you get this warning, you'll drop into the console debugger.  Type in
"trace" to get a stack trace.  You can then continue and turn it off again
(or drop into the debugger a few more times until you're able to run it
:-).  Basically, something is calling malloc with M_WAITOK while holding a
mutex.  Potentially this could cause stalls or resource deadlocks, but I
think it's likely not the source of your freezes.  On the other hand, it's
definitely worth fixing, and if it fixes the symptoms you're seeing, even
better :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


NFS problem (non-sleepable locks held)

2003-11-11 Thread cosmin
I'm getting the following message when transferring data to a freebsd-current server 
via an nfs mount from another fbsd client.  

malloc() of "64" with the following non-sleepable locks held:
exclusive sleep mutex inp r = 0 (0xc1d250ac) locked @ 
/usr/src/sys/netinet/udp_usrreq.c:378

The message shows up 12 times and then it doesn't show up anymore, even if I stop the 
transfer and start it again.  This server uses the nge driver for its network card.  
It's running the sources from yesterday, Nov 10 2003.

I've been having problems with one of our machines freezing up during long nfs 
transfers, and now I'm trying to reproduce the freeze on this test machine.   So far 
no luck, and the only oddity I've been getting is the above message.

Could the above message be causing the freezes ?

Cosmin Stroe.



Re: NFS problem

2003-07-17 Thread Terry Lambert
Sławek Żak wrote:
> Now I guess it's Solaris specific. If you want some more details, let me know.

Wish you'd said "Solaris" first; but of course, we probably would
have told you "Go ask on the Solaris-current mailing list at
Solaris.org -- oops, sorry, Sun charges for support" 8-).

As someone else pointed out, exporting it/mounting it NFSv2 only
will fix it for you.  And it's directory iteration vs. stat.

-- Terry


Re: NFS problem

2003-07-17 Thread Sławek Żak
Peter Edwards <[EMAIL PROTECTED]> writes:

> Hi,
>
>> > All the files are 0-sized, dates are set back to the epoch and
>> > directories are seen as files. Exporting ufs2 filesystems works as
>> > expected.
>
> I've had problems like this exporting CDs via NFS to Solaris.
> Sorry the details are murky, but if it's the same problem, there's a 
> work-around.
> Check the dmesg output: does it complain about an "RRIP field" from the cd9660 
> code? From the source, I think it was
>
> "RRIP without PX field?"

Yep. Same thing here.

> The CDs in question were official Sun CDs with Solaris applications (which, of 
> course, doesn't mean they're properly compliant with a standard, just that it's 
> likely others will run into the same problem)

Mine is Forte 7. It's from Sun too.

> If this is the issue, then mounting it with NFS v2 actually fixed the problem 
> for me: I assume the richer operations from v3 were tickling a problem not 
> noticed with v2.

Indeed. Works fine with version two. I don't know why it gets the file stats
wrong for CD9660 and ok for ufs2. It should be above the ISO9660 layer when nfsd
sees the files.

/S


Re: NFS problem

2003-07-17 Thread Peter Edwards
Hi,

> > All the files are 0-sized, dates are set back to the epoch and
> > directories are seen as files. Exporting ufs2 filesystems works as
> > expected.

I've had problems like this exporting CDs via NFS to Solaris.
Sorry the details are murky, but if it's the same problem, there's a 
work-around.
Check the dmesg output: does it complain about an "RRIP field" from the cd9660 
code? From the source, I think it was

"RRIP without PX field?"

The CDs in question were official Sun CDs with Solaris applications (which, of 
course, doesn't mean they're properly compliant with a standard, just that it's 
likely others will run into the same problem)

If this is the issue, then mounting it with NFS v2 actually fixed the problem 
for me: I assume the richer operations from v3 were tickling a problem not 
noticed with v2.
-- 
Peter






Re: NFS problem

2003-07-17 Thread Sławek Żak
Terry Lambert <[EMAIL PROTECTED]> writes:
>> I guess there is something wrong with exporting iso9660 CD's over NFS. I've added
>> 
>>   /cdrom -ro -mapall=root
>> 
>> to /etc/exports, restarted mountd and after mounting the CD on Solaris 8. All the
>> files are 0-sized, dates are set back to the epoch and directories are seen as
>> files. Exporting ufs2 filesystems works as expected.
> [ ... ]
>> Any thoughts?

[...]

> You are certain you don't see these same attributes on /cdrom
> itself, from a shell when you cd to /cdrom? 

I'm positive. The actual listing of /cdrom is:

thirst(2057)# ls -la
total 82
dr-xr-xr-x    6 root  wheel   2048 Aug 28  2002 .
drwxr-xr-x   22 root  wheel    512 Jul 16 16:11 ..
dr-xr-xr-x    4 root  wheel   2048 Aug 28  2002 .install
dr-xr-xr-x    3 root  wheel   2048 Aug 28  2002 .jvm
lr-xr-xr-x    1 root  wheel     15 Aug 28  2002 Copyright -> image/Copyright
-r-xr-xr-x    1 root  wheel    263 Aug 28  2002 autorun
-r-xr-xr-x    1 root  wheel     92 Aug 28  2002 autorun.inf
-r--r--r--    1 root  wheel    133 Aug 28  2002 cd.info
dr-xr-xr-x    4 root  wheel   2048 Aug 28  2002 image
-r-xr-xr-x    1 root  wheel   4361 Aug 28  2002 installer
lr-xr-xr-x    1 root  wheel     20 Aug 28  2002 installing.pdf -> image/installing.pdf
lr-xr-xr-x    1 root  wheel     23 Aug 28  2002 release_notes.txt -> image/release_notes.txt
dr-xr-xr-x  310 root  wheel  38912 Jan  1  1970 rr_moved
-r-xr-xr-x    1 root  wheel  28672 Aug 28  2002 setup.exe
-r-xr-xr-x    1 root  wheel   1646 Aug 28  2002 volstart

If I mount it from another FreeBSD or Tru64 host, it's also seen properly. I guess
it's just a Solaris problem. I tried Solaris 7,8,9, Tru64 5.0,5.1,5.1a and FreeBSD
4.7,4.8 and 5.0.

> If your answer is "no", then it's definitely the externalization of the stat
> structure and things like struct direct.  Note that the NFS over-the-wire stat
> structure is *not* the same as the FFS version which it exports to the stat(2)
> and fstat(2) system calls.  Probably the thing to do is to look at the
> differences in the code, and not assume that the VFS client is always the
> system call layer.

Now I guess it's Solaris specific. If you want some more details, let me know.

/S


Re: NFS problem

2003-07-16 Thread Terry Lambert
Sławek Żak wrote:
> I guess there is something wrong with exporting iso9660 CD's over NFS. I've added
> 
>   /cdrom -ro -mapall=root
> 
> to /etc/exports, restarted mountd and after mounting the CD on Solaris 8. All the
> files are 0-sized, dates are set back to the epoch and directories are seen as
> files. Exporting ufs2 filesystems works as expected.
[ ... ]
> Any thoughts?

Yes.  Exporting FS's is not as simple as simply doing it.  The
API in FreeBSD needs changed somewhat to support this at a higher
layer.  Even so, the hacked ISO9660 code is actually OK for the
volume export.  The problem comes in that the consumer of the FS
is assumed to have the ability to translate certain attributes
when the struct direct and stat structures are externalized to
the NFS server code, which is a VFS client.

You are certain you don't see these same attributes on /cdrom
itself, from a shell when you cd to /cdrom?  If your answer is
"no", then it's definitely the externalization of the stat
structure and things like struct direct.  Note that the NFS
over-the-wire stat structure is *not* the same as the FFS
version which it exports to the stat(2) and fstat(2) system
calls.  Probably the thing to do is to look at the differences
in the code, and not assume that the VFS client is always the
system call layer.

-- Terry


NFS problem

2003-07-16 Thread Sławek Żak
I guess there is something wrong with exporting iso9660 CD's over NFS. I've added

  /cdrom -ro -mapall=root

to /etc/exports and restarted mountd. After mounting the CD on Solaris 8, all the
files are 0-sized, dates are set back to the epoch and directories are seen as
files. Exporting ufs2 filesystems works as expected.

Example listing looks like this:

rohan:root> ls -la
total 5
dr-xr-xr-x   6 root     root        2048 Aug 28  2002 .
dr-xr-xr-x   3 root     root           3 Jul 16 12:42 ..
-r-xr-xr-x   1 root     root           0 Jan  1  1970 .install
-r-xr-xr-x   1 root     root           0 Jan  1  1970 .jvm
-r-xr-xr-x   1 root     root           0 Jan  1  1970 Copyright
-r-xr-xr-x   1 root     root           0 Jan  1  1970 autorun
-r-xr-xr-x   1 root     root           0 Jan  1  1970 autorun.inf
-r-xr-xr-x   1 root     root           0 Jan  1  1970 cd.info
-r-xr-xr-x   1 root     root           0 Jan  1  1970 image
-r-xr-xr-x   1 root     root           0 Jan  1  1970 installer
-r-xr-xr-x   1 root     root           0 Jan  1  1970 installing.pdf
-r-xr-xr-x   1 root     root           0 Jan  1  1970 release_notes.txt
-r-xr-xr-x   1 root     root           0 Jan  1  1970 rr_moved
-r-xr-xr-x   1 root     root           0 Jan  1  1970 setup.exe
-r-xr-xr-x   1 root     root           0 Jan  1  1970 volstart

Any thoughts?

I attach output file from tcpdump generated with:

  tcpdump -w /tmp/tcpdump-s-1500.out -s 1500 -ln host rohan and not port 22

/S



tcpdump-s-1500.out
Description: Binary data


Re: Serious server-side NFS problem

1999-12-18 Thread Matthew Dillon

:Hmm, interesting.  Do you have a 3C905B kicking around there somewhere 
:that you could repeat the profiling run with?  I must admit I hadn't had 
:a chance to look at a profile dump using fxp, and this comes as a bit of 
:a surprise.

I have two but they are both on slow machines (read: can't saturate
the network) with old (pre signal changes) kernels and I don't have
time to upgrade them, so no profiling is possible for now.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Serious server-side NFS problem

1999-12-18 Thread Mike Smith

> :That's interesting then, since your results are somewhat at odds with 
> :what I've seen so far regarding interrupt load for network traffic.  Do 
> :you have any profiling results that point the finger more directly at 
> :anything?
> :
> :-- 
> 
> Ok, here is the kernel gprof output for one of my -current test
> boxes.  This one is a dual 450 MHz P-III but running a UP kernel,
> and a built-in intel ethernet.
...
> I've included the entire gprof output below, but the pertinent section
> is #8 and #9 indicating that 19.8% of the cpu is being eaten in the 
> fxp interrupt code.
> 
> The lion's share appears to be fxp_add_rfabuf(), which takes 10% of
> the cpu all by itself (see #11), and most of that appears to be in
> the splx() code, which seems bogus but that is what it says.  I 
> presume the splimp()/splx() calls it is making are coming from the
> MBUF macros.

Hmm, interesting.  Do you have a 3C905B kicking around there somewhere 
that you could repeat the profiling run with?  I must admit I hadn't had 
a chance to look at a profile dump using fxp, and this comes as a bit of 
a surprise.


-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]







Re: Serious server-side NFS problem

1999-12-17 Thread Mike Smith

> 
> :< said:
> :
> :> the IP and UDP checksum guessing, but more that I think you'll find that 
> :> a considerable amount of the inbound NFS traffic handling is actually 
> :> performed in the interrupt context
> :
> :If it is, then there is a serious bug.
> 
> No serious NFS traffic handling is done in the interrupt context.  The
> packets are essentially just queued up for nfsd to deal with.

That's interesting then, since your results are somewhat at odds with 
what I've seen so far regarding interrupt load for network traffic.  Do 
you have any profiling results that point the finger more directly at 
anything?

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]







Re: Serious server-side NFS problem

1999-12-17 Thread Andrew Gallatin


Kenneth D. Merry writes:
 > 
 > 
 > Another advantage with gigabit ethernet is that if you can do jumbo frames,
 > you can fit an entire 8K NFS packet in one frame.
 > 
 > I'd like to see NFS numbers from two 21264 Alphas with GigE cards, zero
 > copy, checksum offloading and a big striped array on one end at least.  I

Well.. maybe this will work for you ;-)

2 21264 alphas (500MHz XP1000S), 640MB RAM, Myrinet/Trapeze using
64-bit Myrinet cards, 8K cluster mbufs, UDP checksums disabled (we can
do checksum offloading at the receiver only).  We have a 56K MTU.
Using this setup, *without* zero copy, we get roughly 140MB/sec out of
TCP:

% netperf -Hbroil-my
TCP STREAM TEST to broil-my : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

524288 524288  524288   10.01    1135.20

And about 900Mb/sec (112MB/sec) out of UDP using an 8k message size:

% netperf -Hbroil-my -tUDP_STREAM -- -m 8192
UDP UNIDIRECTIONAL SEND TEST to broil-my : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

 57344    8192   10.00      165619      0    1084.94
 65535           10.00      137338            899.68
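
As a sanity check on the UDP figures, the throughput can be roughly recomputed from the message count, message size, and elapsed time. This is only a sketch: the helper names are made up for illustration, the second row is assumed to be the receiver's count (the usual layout of netperf UDP_STREAM output), and netperf itself uses the unrounded elapsed time, so small differences from the printed values are expected.

```python
def mbits_per_sec(messages: int, message_bytes: int, seconds: float) -> float:
    """Throughput in 10^6 bits/sec, the unit netperf prints."""
    return messages * message_bytes * 8 / seconds / 1e6


def mbytes_per_sec(mbits: float) -> float:
    """Convert 10^6 bits/sec to 10^6 bytes/sec."""
    return mbits / 8


# Message counts, message size, and elapsed time taken from the output above.
sent = mbits_per_sec(165619, 8192, 10.00)      # first row (assumed sender side)
received = mbits_per_sec(137338, 8192, 10.00)  # second row (assumed receiver side)

print(f"sent {sent:.1f} Mb/s, received {received:.1f} Mb/s "
      f"(about {mbytes_per_sec(received):.0f} MB/s)")
```

Both recomputed values land within about half a Mb/s of the printed 1084.94 and 899.68, which is consistent with the "about 900Mb/sec (112MB/sec)" summary.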


I have exported a local disk on broil-my and created a 512MB file
(zot).  Both machines have 640MB of ram and the test file is fully
cached on the server.  When reading the file from the client, I have
found the best I can do is roughly 57MB/sec:

# mount_nfs -a 3 -r 16384 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.658521 secs (55585209 bytes/sec)
# umount /mnt
# mount_nfs -a 3 -r 32768 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.513517 secs (56432433 bytes/sec)

Empirically, it seems that -a 3 performs better than -a 2 or -a 4.
Also, the bandwidth seems to max out with a 16k read size.  Increasing 
much beyond that doesn't seem to help.  Varying the number of nfsiods 
between 2, 4, and 20 doesn't seem to matter much.  

Running iprobe on the client (http://www.cs.duke.edu/ari/iprobe.html)
shows us that we are spending:

- 29.4% in bcopy -- this doesn't change a lot if I enable/disable
vfs_ioopt.  I suspect that this is from bcopy'ing data out of mbufs,
not crossing the user/kernel boundary.  In either case, there's not
much that can be done to reduce this in a generic manner.

-  5.5% tsleep (contention between nfsiods?)

The "top" functions/components are:

Name                    Count   Pct   Pct
----                    -----   ---   ---
kernel                   4128        90.0

bcopy_samealign_lp       1347  32.6  29.4
procrunnable              279   6.8   6.1
tsleep                    256   6.2   5.6
Lidle2                    195   4.7   4.3
m_freem                    89   2.2   1.9
soreceive                  73   1.8   1.6
lockmgr                    63   1.5   1.4
brelse                     60   1.5   1.3
vm_page_free_toq           55   1.3   1.2
ovbcopy                    51   1.2   1.1
wakeup                     43   1.0   0.9
acquire                    42   1.0   0.9
bcopy_da_lp                42   1.0   0.9
nfs_request                41   1.0   0.9
ip_input                   40   1.0   0.9
biodone                    39   0.9   0.9
nfs_readrpc                38   0.9   0.8
vm_page_alloc              36   0.9   0.8
<...>
----
/modules/tpz.ko           435         9.5

tpz.ko is the myrinet device driver.  This is saying that the system
spent 90% of its time in the static kernel, 9.5% in the device driver, 
and 0.5% in userland.
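
The two Pct columns appear to be the share of kernel samples and the share of all samples, respectively. A small sketch of that reading follows; the total sample count is not in the output and is inferred here from the "kernel = 90.0%" row, so treat it as an assumption.

```python
kernel_samples = 4128   # "kernel" row
driver_samples = 435    # "/modules/tpz.ko" row
bcopy_samples = 1347    # "bcopy_samealign_lp" row

# Assumption: the kernel row really is 90.0% of all samples taken.
total_samples = kernel_samples / 0.900


def pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


share_of_kernel = pct(bcopy_samples, kernel_samples)  # first Pct column
share_of_total = pct(bcopy_samples, total_samples)    # second Pct column
driver_share = pct(driver_samples, total_samples)

print(f"bcopy: {share_of_kernel:.1f}% of kernel, {share_of_total:.1f}% of total")
print(f"tpz.ko: {driver_share:.1f}% of total")
```

Under that assumption the numbers reproduce the 32.6 / 29.4 pair for bcopy_samealign_lp and the 9.5 for the driver.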

The server is also close to maxed-out.  I can provide an iprobe
breakdown for it as well, and/or complete breakdowns for the client
and server.  


Cheers,

Drew


--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590





Re: Serious server-side NFS problem

1999-12-17 Thread Matthew Dillon


:< said:
:
:> the IP and UDP checksum guessing, but more that I think you'll find that 
:> a considerable amount of the inbound NFS traffic handling is actually 
:> performed in the interrupt context
:
:If it is, then there is a serious bug.
:
:-GAWollman
:
:--
:Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same

No serious NFS traffic handling is done in the interrupt context.  The
packets are essentially just queued up for nfsd to deal with.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Serious server-side NFS problem

1999-12-17 Thread Rodney W. Grimes

...
> (200-300 MHz) clients.  That's *with* packet loss (for some reason when
> my fxp ethernets pump data out that quickly they tend to cause packet
> loss in other parts of my HUBed network, which I find quite annoying).

Interesting you should say that.  I've been playing with some Broadcom
based ASIC 100BaseTX full duplex switches and I actually lose more packets
due to overrunning the buffers in the switch than I do if I use a half duplex
standard hub.  :-(

Performance for most things overall on the network is better with the
switch, but direct high bandwidth traffic between 2 machines has suffered
due to the conversion to a fully switched network.

Seems FreeBSD (using dc21143 based cards) can pump data around so damn
fast that the switch can't keep up :-(.  I need to do some more testing
to find out if this occurs between ports on the same ASIC or only when
packets have to go out to the ASIC to ASIC bridge bus.

Also how do the fxp and dc based cards respond to flow control?
Do we obey it?  Do the cards even understand it?

-- 
Rod Grimes - KD7CAX @ CN85sl - (RWG25)   [EMAIL PROTECTED]





Re: Serious server-side NFS problem

1999-12-17 Thread Garrett Wollman

< said:

> the IP and UDP checksum guessing, but more that I think you'll find that 
> a considerable amount of the inbound NFS traffic handling is actually 
> performed in the interrupt context

If it is, then there is a serious bug.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick





Re: Serious server-side NFS problem

1999-12-17 Thread Doug Rabson

On Wed, 15 Dec 1999, Matthew Dillon wrote:

> Here's a general update on this bug report to -current.  It took all day
> but I was finally able to reproduce Andrew's bug.
> 
> You guys are going to *love* this.
> 
> NFS uses the kernel 'boottime' structure to generate its version id.
> Now normally you might believe that this structure, once set, will
> never change.  The authors of NFS certainly make that assumption!
> 
> No such luck.  If you happen to be running, oh, xntpd for example,
> the kernel adjusts the boottime structure to take into account time
> changes, including PLL changes so, in fact, the boottime structure
> can change quite often - once each tick, in fact.

Nice catch, Matt.
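
To make the failure mode concrete, here is a toy model (not the kernel code; the class and method names are hypothetical) of an identifier that is read from a live boot-time value, next to one that is captured once at startup. Capturing it once is just one possible remedy and is not necessarily what the patch did.

```python
class ToyServer:
    """Toy model only; field and method names are hypothetical."""

    def __init__(self, boottime_usec: int) -> None:
        self.boottime_usec = boottime_usec
        # Alternative: remember the value once, when the server starts.
        self.startup_id = boottime_usec

    def apply_time_correction(self, delta_usec: int) -> None:
        # Models the kernel shifting its notion of boot time on a clock adjustment.
        self.boottime_usec += delta_usec

    def live_id(self) -> int:
        # Identifier read from the live value: changes with every correction.
        return self.boottime_usec

    def stable_id(self) -> int:
        # Identifier fixed at startup: unaffected by later corrections.
        return self.startup_id


server = ToyServer(boottime_usec=945_388_800_000_000)
live_before, stable_before = server.live_id(), server.stable_id()

server.apply_time_correction(37)  # a tiny ntp-style adjustment

live_changed = server.live_id() != live_before
stable_changed = server.stable_id() != stable_before
print(live_changed, stable_changed)  # prints: True False
```

A client that compares the identifier across requests would see the live value change after every adjustment and could wrongly conclude that the server had restarted.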

--
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 181 442 9037







Re: Serious server-side NFS problem

1999-12-17 Thread Mike Smith

> 
> We've solved most of the performance issues, but NFS is still
> eating a little too much cpu for my tastes.  Unfortunately it is getting to
> the point where a significant portion of the performance loss is occurring
> in the network driver itself.  Some of my cards eat 25% of the cpu just
> in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
> TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
> packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
> cards are way ahead of NIC cards in regards to reducing interrupt 
> overhead (though gigabit NICs have caught up some).

Actually, I'm not sure I buy this at all.  Both the EtherExpress and 
3C905 families give less than one interrupt per datagram, and the 
other overheads on them are comparably small.

I think you'll want to do some profiling before getting too concerned
about the network drivers themselves; gigabit hardware isn't really any
lighter on the CPU than good 100Mbps hardware, and we can handle better
than 400MBps UDP inbound on a reasonable (400MHz) system right now.
(Lots better with jumbo frames.)

My guesses (based on some of the profiling that Bill Paul did) would be 
the IP and UDP checksum guessing, but more that I think you'll find that 
a considerable amount of the inbound NFS traffic handling is actually 
performed in the interrupt context (ie. I don't think that stuff is 
being handed off to a softnet handler), blowing out the numbers a bit.

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]







Re: Serious server-side NFS problem

1999-12-16 Thread Kenneth D. Merry

On Thu, Dec 16, 1999 at 19:28:34 -0800, Matthew Dillon wrote:
> :Matthew Dillon writes:
> :
> : > We're already testing a patch.
> :
> :Thanks again Matt!
> :
> :The latest rev of nfs_serv.c has fixed it.  
> :
> :I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec & write
> :bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
> :10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.

[ ... ]

> :Andrew Gallatin, Sr Systems Programmer   http://www.cs.duke.edu/~gallatin
> :Duke University  Email: [EMAIL PROTECTED]
> 
> Those are really quite excellent results!  Linux eat your heart out!  I
> get 9.5 to 10.5 MBytes/sec on my half-duplex network between two fast
> machines.  I tend to get between 7.5 and 9 MBytes/sec when using slower
> (200-300 MHz) clients.  That's *with* packet loss (for some reason when
> my fxp ethernets pump data out that quickly they tend to cause packet
> loss in other parts of my HUBed network, which I find quite annoying).
> 
> We've solved most of the performance issues, but NFS is still
> eating a little too much cpu for my tastes.  Unfortunately it is getting to
> the point where a significant portion of the performance loss is occurring
> in the network driver itself.  Some of my cards eat 25% of the cpu just
> in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
> TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
> packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
> cards are way ahead of NIC cards in regards to reducing interrupt 
> overhead (though gigabit NICs have caught up some).


Another advantage with gigabit ethernet is that if you can do jumbo frames,
you can fit an entire 8K NFS packet in one frame.
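
A rough sketch of the arithmetic behind that: counting the IPv4 fragments needed for one 8K NFS datagram at a standard and a jumbo MTU. The 200 bytes of RPC/NFS header overhead and the 9000-byte jumbo MTU are assumptions for illustration, and the function name is made up.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ip_fragments(udp_payload: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    if ip_payload + IPV4_HEADER <= mtu:
        return 1  # fits in a single unfragmented packet
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


# Assumption: 8192 bytes of file data plus roughly 200 bytes of RPC/NFS headers.
nfs_datagram = 8192 + 200

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {ip_fragments(nfs_datagram, mtu)} packet(s)")
```

With these assumptions the datagram takes six packets at a 1500-byte MTU and one at 9000 bytes.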

I'd like to see NFS numbers from two 21264 Alphas with GigE cards, zero
copy, checksum offloading and a big striped array on one end at least.  I
bet you could get pretty good performance with a setup like that.  (Now
don't anybody go out and do it unless you really want to spend the time.)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]





Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon


:Matthew Dillon writes:
:
: > We're already testing a patch.
:
:Thanks again Matt!
:
:The latest rev of nfs_serv.c has fixed it.  
:
:I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec & write
:bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
:10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.
:
:Awesome!
:
:Thanks,
:
:Drew
:
:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
:Duke UniversityEmail: [EMAIL PROTECTED]

Those are really quite excellent results!  Linux eat your heart out!  I
get 9.5 to 10.5 MBytes/sec on my half-duplex network between two fast
machines.  I tend to get between 7.5 and 9 MBytes/sec when using slower
(200-300 MHz) clients.  That's *with* packet loss (for some reason when
my fxp ethernets pump data out that quickly they tend to cause packet
loss in other parts of my HUBed network, which I find quite annoying).

We've solved most of the performance issues, but NFS is still
eating a little too much cpu for my tastes.  Unfortunately it is getting to
the point where a significant portion of the performance loss is occurring
in the network driver itself.  Some of my cards eat 25% of the cpu just
in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
cards are way ahead of NIC cards in regards to reducing interrupt 
overhead (though gigabit NICs have caught up some).

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Gallatin


Matthew Dillon writes:

 > We're already testing a patch.

Thanks again Matt!

The latest rev of nfs_serv.c has fixed it.  

I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec & write
bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.

Awesome!

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590






Re: Serious server-side NFS problem

1999-12-16 Thread Mike Smith

> 
>
> On Thu, 16 Dec 1999, Warner Losh wrote:
> 
> > In message <[EMAIL PROTECTED]> Poul-Henning Kamp writes:
> > : If people do a "settimeofday" we change the boot time since the
> > : amount of time we've been up *IS* known for sure, whereas the boottime
> > : is only an estimate.
> > 
> > There is one problem with this.  The amount of uptime isn't the same
> > as the amount of time since the machine booted.  How can this happen?
> > When a laptop suspends, it doesn't update the uptime while it is
> > asleep, nor does it update the uptime by the amount of time that has
> > been slept.  Is this a bug in the apm code?
> 
> IIRC it does update uptime properly after a suspend in 2.2.8 but does not
> do so in 3.X and -current on my ThinkPad 770.

Not updating uptime to account for time slept is the "correct" behaviour 
given the way the kernel currently thinks about things, where "correct" 
is defined as "most survivable".

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]







Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Kenneth Milton

+[ Warner Losh ]-
|
| There is one problem with this.  The amount of uptime isn't the same
| as the amount of time since the machine booted.  How can this happen?
| When a laptop suspends, it doesn't update the uptime while it is
| asleep, nor does it update the uptime by the amount of time that has
| been slept.  Is this a bug in the apm code?

The machine is neither up nor down until you collapse the event horizon and
unsuspend it to observe its state d8)

-- 
Totally Holistic Enterprises Internet|  P:+61 7 3870 0066   | Andrew Milton
The Internet (Aust) Pty Ltd  |  F:+61 7 3870 4477   | 
ACN: 082 081 472 |  M:+61 416 022 411   | Carpe Daemon
PO Box 837 Indooroopilly QLD 4068|[EMAIL PROTECTED]| 





Re: Serious server-side NFS problem

1999-12-16 Thread Tom Bartol



On Thu, 16 Dec 1999, Warner Losh wrote:

> In message <[EMAIL PROTECTED]> Tom Bartol 
>writes:
> : I tried 3.0-current after this merge, suspend and resume worked fine on my
> : 770 with the exception of uptime.
> 
> I can confirm that uptime, at least as reported by uptime(1), isn't
> increased in the latest -current.
> 
> Warner
> 

I confirm this as well.  Perhaps after suspend we need:

Allow ntp to update time and adjust boottime as necessary.
Then set uptime = time-boottime

Tom







Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh

In message <[EMAIL PROTECTED]> Tom Bartol 
writes:
: I tried 3.0-current after this merge, suspend and resume worked fine on my
: 770 with the exception of uptime.

I can confirm that uptime, at least as reported by uptime(1), isn't
increased in the latest -current.

Warner





Re: Serious server-side NFS problem

1999-12-16 Thread Tom Bartol



On Thu, 16 Dec 1999, Warner Losh wrote:

> In message <[EMAIL PROTECTED]> Tom Bartol 
>writes:
> : IIRC it does update uptime properly after a suspend in 2.2.8 but does not
> : do so in 3.X and -current on my ThinkPad 770.
> 
> Define "correctly".  E.g., if I suspend for an hour, does it add an hour?
> 
> Warner
> 

Yeah, that's what I meant by "correctly".  I don't recall seeing a
"thundering herd" effect afterwards.  Hmmm... which reminds me, I believe
this was not stock 2.2.8 but rather 2.2.8-PAO.  I had thought that the
lion's share of PAO code got merged into 3.0-current at some point.  When
I tried 3.0-current after this merge, suspend and resume worked fine on my
770 with the exception of uptime.

Tom







Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh

In message <[EMAIL PROTECTED]> Tom Bartol 
writes:
: IIRC it does update uptime properly after a suspend in 2.2.8 but does not
: do so in 3.X and -current on my ThinkPad 770.

Define "correctly".  E.g., if I suspend for an hour, does it add an hour?

Warner





Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh

In message <[EMAIL PROTECTED]> Poul-Henning Kamp writes:
: Well, I don't think anybody has seriously thought about what the right
: semantics for APM is, and consequently the code we have is rather evil.

Don't know if I'd go so far as to say evil, but there are some POLA
issues.

: What to do is a definition question more than anything, and I guess the
: answer to the question:
: 
:   if I call timeout(bla bla bla, 3600*hz) and suspend the machine
:   for half an hour, how long time after it resumes will I be
:   called ?
: 
: will point the direction.

It will be called 3600*hz softclock ticks after the original timeout
was called, which could be >> a wall time of 1 hour.  I'd guess that it
would be after 1 hour and 30 minutes.
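
The arithmetic behind that guess can be written down as a small Python model. This is only a sketch of the behaviour described here (softclock ticks are not delivered while the machine is suspended), not kernel code: the function name, the tick rate of 100 and the ten minutes of running time before the suspend are all assumptions made for illustration.

```python
HZ = 100  # assumed tick rate; the thread only refers to "hz"


def wall_seconds_until_fire(timeout_ticks, run_before_suspend_s, suspend_s, hz=HZ):
    """Wall-clock seconds between timeout() and its callback, assuming
    no softclock ticks are counted while the machine is suspended."""
    ticks_before_suspend = run_before_suspend_s * hz
    if ticks_before_suspend >= timeout_ticks:
        # The timeout already expired before the suspend began.
        return timeout_ticks / hz
    # The remaining ticks are only counted once the machine runs again,
    # so the whole suspend is added on top of the requested delay.
    return timeout_ticks / hz + suspend_s


# timeout(..., 3600*hz), ten minutes of running, then a half-hour suspend
delay = wall_seconds_until_fire(3600 * HZ, run_before_suspend_s=600, suspend_s=1800)
print(delay / 60)  # prints 90.0 (minutes of wall time)
```

Under these assumptions the result does not depend on when the suspend starts, as long as it starts before the timeout would have fired.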

: In other words:
:   Do routes expire while suspended ?  
:   Do TCP timers tick ?
:
: I would say "they sure should do, because they relate to external
: events" (if we accept that as the answer we need to call softclock
: a LOT of times when we come out of suspend).

Uggg.  That's right.  The current approach is to step the current time
by the amount of time we slept, based on rtc measurements (high
precision time keeping at its absolute worst).

: In reality we have no clear definition of "suspend" for a unix system,
: and the kernel may need to learn about "timeouts on the kernel conscious
: timescale" vs. "timeouts on the wallclock timescale" and similar hair.

Yes.  Some timeouts don't matter over a suspend (e.g. make sure that
this card isn't wedged) while some should take the suspend into
account (need to expire routes, ARP entries, etc.).
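
One way to picture the two timescales is to tag each pending timeout and charge the suspended interval only to the wall-clock ones at resume. The Python below is a toy model under that assumption; the `Timeout` class, its `wallclock` flag and the `resume` helper are hypothetical names, not anything in the kernel.

```python
from dataclasses import dataclass


@dataclass
class Timeout:
    name: str
    ticks_left: int
    wallclock: bool  # True if tied to external events (routes, ARP, TCP)


def resume(timeouts, slept_ticks):
    """Charge the suspended interval to wall-clock timeouts only.

    Returns (expired_names, still_pending)."""
    expired, pending = [], []
    for t in timeouts:
        left = t.ticks_left - slept_ticks if t.wallclock else t.ticks_left
        if left <= 0:
            expired.append(t.name)
        else:
            pending.append(Timeout(t.name, left, t.wallclock))
    return expired, pending


timeouts = [
    Timeout("arp-entry", 1000, True),
    Timeout("route", 500000, True),
    Timeout("card-watchdog", 200, False),
]
# Half an hour asleep at an assumed 100 ticks per second.
expired, pending = resume(timeouts, slept_ticks=1800 * 100)
print(expired, [(t.name, t.ticks_left) for t in pending])
```

Everything in `expired` would fire together at resume, which is the 'thundering herd' cost of honouring the wall-clock timescale.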

Warner






Re: Serious server-side NFS problem

1999-12-16 Thread Nate Williams

> : If people do a "settimeofday" we change the boot time since the
> : amount of time we've been up *IS* known for sure, whereas the boottime
> : is only an estimate.
> 
> There is one problem with this.  The amount of uptime isn't the same
> as the amount of time since the machine booted.  How can this happen?
> When a laptop suspends, it doesn't update the uptime while it is
> asleep, nor does it update the uptime by the amount of time that has
> been slept.

FWIW, we had code in the tree (just before the timeout_ch changes) that
did update all of the timeouts to 'fire' when the laptop was resumed.

This caused a 'thundering herd' problem at resume, but I don't see any
way around it...  However, it was lost when we changed to the different
timeout code.




Nate





Re: Serious server-side NFS problem

1999-12-16 Thread Tom Bartol



On Thu, 16 Dec 1999, Warner Losh wrote:

> In message <[EMAIL PROTECTED]> Poul-Henning Kamp writes:
> : If people do a "settimeofday" we change the boot time since the
> : amount of time we've been up *IS* known for sure, whereas the boottime
> : is only an estimate.
> 
> There is one problem with this.  The amount of uptime isn't the same
> as the amount of time since the machine booted.  How can this happen?
> When a laptop suspends, it doesn't update the uptime while it is
> asleep, nor does it update the uptime by the amount of time that has
> been slept.  Is this a bug in the apm code?
> 
> Warner
> 

IIRC it does update uptime properly after a suspend in 2.2.8 but does not
do so in 3.X and -current on my ThinkPad 770.

Tom









Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Warner Losh writes:
>In message <[EMAIL PROTECTED]> Poul-Henning Kamp writes:
>: If people do a "settimeofday" we change the boot time since the
>: amount of time we've been up *IS* known for sure, whereas the boottime
>: is only an estimate.
>
>There is one problem with this.  The amount of uptime isn't the same
>as the amount of time since the machine booted.  How can this happen?
>When a laptop suspends, it doesn't update the uptime while it is
>asleep, nor does it update the uptime by the amount of time that has
>been slept.  Is this a bug in the apm code?

Well, I don't think anybody has seriously thought about what the right
semantics for APM is, and consequently the code we have is rather evil.

What to do is a definition question more than anything, and I guess the
answer to the question:

if I call timeout(bla bla bla, 3600*hz) and suspend the machine
for half an hour, how long time after it resumes will I be
called ?

will point the direction.

In other words:
Do routes expire while suspended ?  
Do TCP timers tick ?

I would say "they sure should do, because they relate to external
events" (if we accept that as the answer we need to call softclock
a LOT of times when we come out of suspend).

In reality we have no clear definition of "suspend" for a unix system,
and the kernel may need to learn about "timeouts on the kernel conscious
timescale" vs. "timeouts on the wallclock timescale" and similar hair.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh

In message <[EMAIL PROTECTED]> Poul-Henning Kamp writes:
: If people do a "settimeofday" we change the boot time since the
: amount of time we've been up *IS* known for sure, whereas the boottime
: is only an estimate.

There is one problem with this.  The amount of uptime isn't the same
as the amount of time since the machine booted.  How can this happen?
When a laptop suspends, it doesn't update the uptime while it is
asleep, nor does it update the uptime by the amount of time that has
been slept.  Is this a bug in the apm code?

Warner





Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon


:
:
:>Yeah, uptime is moving which makes it difficult for me too. When new
:>machines enter the network, they need to announce a number which is used to
:>decide who will become the master if the current master disappears. I could
:>just announce currenttime-uptime, but that's got a slightly different
:>meaning that I'll have to consider.
:
:just announce uptime, the one with the largest number wins.
:
:--
:Poul-Henning Kamp FreeBSD coreteam member
:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!

Go outside, kill the circuit breaker, then turn it back on.

It's easier just to use the IP address.  Highest number wins.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp


>Yeah, uptime is moving which makes it difficult for me too. When new
>machines enter the network, they need to announce a number which is used to
>decide who will become the master if the current master disappears. I could
>just announce currenttime-uptime, but that's got a slightly different
>meaning that I'll have to consider.

just announce uptime, the one with the largest number wins.
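
As a sketch of that rule in Python: each peer announces its uptime and the largest value wins. The peer names, the dictionary layout and the tie-break on name are assumptions added for illustration; nothing here comes from Kevin's actual software.

```python
def elect_master(peers):
    """Pick the peer with the largest announced uptime.

    `peers` maps a (hypothetical) peer name to its uptime in seconds.
    Ties are broken on the name so that every node picks the same winner."""
    return max(peers, key=lambda name: (peers[name], name))


announced = {"alpha": 120, "bravo": 86400, "charlie": 3600}
print(elect_master(announced))  # prints bravo
```

Although every uptime keeps moving, two running machines advance at the same rate, so their relative order stays put as long as the values are compared at roughly the same moment and nobody suspends.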


--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-16 Thread Kevin Day

> 
> > In message <[EMAIL PROTECTED]>, Kevin Day writes:
> > 
> > >Ack, I was using this very same thing for several devices in an isolated
> > >peer-to-peer network to decide who the 'master' was. (Whoever had been up
> > >longest knew more about the state of the network) Having this change could
> > >cause weirdness for me too... I assumed (without checking *thwap*) that
> > >boottime was a constant.
> > >
> > >Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
> > >'boottime' gets initialized so that others can use it, not just NFS? :)
> > 
> > no, I think that is a bad idea.  In your case you want to use the
> > "uptime" which *is* a measure of how long the system has been
> > running.
> 
> Uptime is also a constantly changing number.  Forgive me for my
> ignorance, but why does boottime constantly change?  I would have thought
> it would be a constant?  I've got software that also uses this to
> determine when a new copy of it exists (although I do keep a local cache
> of the value in case my software crashes, since it can recover from a
> crash, but not a reboot).
> 
> I would think that boottime would be constant, since you didn't keep
> booting at a different time...
> 

Yeah, uptime is moving which makes it difficult for me too. When new
machines enter the network, they need to announce a number which is used to
decide who will become the master if the current master disappears. I could
just announce currenttime-uptime, but that's got a slightly different
meaning that I'll have to consider.

Anyway, enough of my proprietary mess, but... I do see a few uses for a
non-moving boottime, but won't argue here or now. :) This behaviour is
documented in time(9) though, so I really can't complain. :)

Kevin





Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Reimer

Matt, you are a tenacious, fearsome bug hunter!

Matt

Matthew Dillon wrote:
> 
> Here's a general update on this bug report to -current.  It took all day
> but I was finally able to reproduce Andrew's bug.
> 
> You guys are going to *love* this.
> 
> NFS uses the kernel 'boottime' structure to generate its version id.
> Now normally you might believe that this structure, once set, will
> never change.  The authors of NFS certainly make that assumption!
> 
> No such luck.  If you happen to be running, oh, xntpd for example,
> the kernel adjusts the boottime structure to take into account time
> changes, including PLL changes so, in fact, the boottime structure
> can change quite often - once each tick, in fact.
> 
> Now, the effect of boottime changing on NFS is rather drastic.  You
> see, the version id controls whether NFS clients must reset their
> state machines for NFS data writes.  If a client has done a stage 1
> write and is ready to do the stage 2 commit, and the version id
> changes, the client must revert back to stage 1.
> 
> And so Andrew's bug report comes into the light!   His poor client
> (and mine once I reproduced the bug) got into a state, due to the
> server returning a different version id for virtually every packet,
> where it resent the same write data over the network over and over
> and over and over and over again.
> 
> I think recent changes to the way the kernel clocks work in -current
> brought the bug out into the open, but it's definitely a problem in
> both -stable and -current.
> 
> Doh!  I gotta say that if I hadn't happened to have been running xntpd
> on my test box I would have *never* figured it out.
> 
> -Matt
> 





Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Nate Williams writes:
>> In message <[EMAIL PROTECTED]>, Kevin Day writes:
>> 
>> >Ack, I was using this very same thing for several devices in an isolated
>> >peer-to-peer network to decide who the 'master' was. (Whoever had been up
>> >longest knew more about the state of the network) Having this change could
>> >cause weirdness for me too... I assumed (without checking *thwap*) that
>> >boottime was a constant.
>> >
>> >Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
>> >'boottime' gets initialized so that others can use it, not just NFS? :)
>> 
>> no, I think that is a bad idea.  In your case you want to use the
>> "uptime" which *is* a measure of how long the system has been
>> running.
>
>Uptime is also a constantly changing number.  Forgive me for my
>ignorance, but why does boottime constantly change?  I would have thought
>it would be a constant?  I've got software that also uses this to
>determine when a new copy of it exists (although I do keep a local cache
>of the value in case my software crashes, since it can recover from a
>crash, but not a reboot).
>
>I would think that boottime would be constant, since you didn't keep
>booting at a different time...

Well, our timekeeping is done by having two estimates:  The amount of
time on the UTC timescale since we booted and the time we did so in
UTC.

If people do a "settimeofday" we change the boot time since the
amount of time we've been up *IS* known for sure, whereas the boottime
is only an estimate.

The ntp pll adjusts the frequency of our clock, since that is a
frequency adjustment.

The sticky bit is the gross "adjtime" call.  Since the primary user
of this is the timed daemon, which issues phase adjustments, not
frequency adjustments, it fiddles boottime, but still using the
"slowly converge" method so we don't step the clock more than we
need to.

The reason why xntpd used tickadj was that enabling the kernel
pll for xntpd was rather obscure and people never did.  This problem
is now gone with NTPv4 (Thanks Roberto!)

So after today, the problem is gone, unless you use timed/tickadj
or other broken clock synchronizers.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-16 Thread Peter Wemm

Nate Williams wrote:
> > In message <[EMAIL PROTECTED]>, Kevin Day writes:
> > 
> > >Ack, I was using this very same thing for several devices in an isolated
> > >peer-to-peer network to decide who the 'master' was. (Whoever had been up
> > >longest knew more about the state of the network) Having this change could
> > >cause weirdness for me too... I assumed (without checking *thwap*) that
> > >boottime was a constant.
> > >
> > >Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
> > >'boottime' gets initialized so that others can use it, not just NFS? :)
> > 
> > no, I think that is a bad idea.  In your case you want to use the
> > "uptime" which *is* a measure of how long the system has been
> > running.
> 
> Uptime is also a constantly changing number.  Forgive me for my
> ignorance, but why does boottime constantly change?  I would have thought
> it would be a constant?  I've got software that also uses this to
> determine when a new copy of it exists (although I do keep a local cache
> of the value in case my software crashes, since it can recover from a
> crash, but not a reboot).
> 
> I would think that boottime would be constant, since you didn't keep
> booting at a different time...

Uptime is a monotonically increasing time starting at zero.  Whenever the
time-of-day adjusts to add or remove time, rather than changing the
"uptime", we change the "origin" of timeofday and boottime.  This means that
we don't have to walk the entire process list and intercept all the timers and
adjust them for the changing number of ticks in uptime etc.
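
A minimal Python model of that arrangement, for readers who want to see it move: time of day is derived as origin plus uptime, and stepping the clock shifts the origin. The `Clock` class and its method names are hypothetical stand-ins, not the kernel's data structures.

```python
class Clock:
    """Toy model: time of day = boottime (the origin) + monotonic uptime."""

    def __init__(self, boottime, uptime=0.0):
        self.boottime = boottime  # estimate of the UTC time at boot
        self.uptime = uptime      # seconds since boot, never stepped

    def now(self):
        return self.boottime + self.uptime

    def tick(self, seconds):
        self.uptime += seconds

    def settimeofday(self, new_now):
        # Step the clock by moving the origin; uptime is left untouched,
        # so timers expressed in uptime need no adjustment.
        self.boottime = new_now - self.uptime


clock = Clock(boottime=1000.0)
clock.tick(50)
clock.settimeofday(2000.0)
print(clock.uptime, clock.boottime, clock.now())  # prints 50.0 1950.0 2000.0
```

The side effect is the one that bit NFS: anything that treats `boottime` as a constant sees it change whenever the time of day is corrected.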

Cheers,
-Peter







Re: Serious server-side NFS problem

1999-12-16 Thread Nate Williams

> In message <[EMAIL PROTECTED]>, Kevin Day writes:
> 
> >Ack, I was using this very same thing for several devices in an isolated
> >peer-to-peer network to decide who the 'master' was. (Whoever had been up
> >longest knew more about the state of the network) Having this change could
> >cause weirdness for me too... I assumed (without checking *thwap*) that
> >boottime was a constant.
> >
> >Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
> >'boottime' gets initialized so that others can use it, not just NFS? :)
> 
> no, I think that is a bad idea.  In your case you want to use the
> "uptime" which *is* a measure of how long the system has been
> running.

Uptime is also a constantly changing number.  Forgive me for my
ignorance, but why does boottime constantly change?  I would have thought
it would be a constant?  I've got software that also uses this to
determine when a new copy of it exists (although I do keep a local cache
of the value in case my software crashes, since it can recover from a
crash, but not a reboot).

I would think that boottime would be constant, since you didn't keep
booting at a different time...



Nate





Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Gallatin


Matthew Dillon writes:
 > 
 > And so Andrew's bug report comes into the light!   His poor client
 > (and mine once I reproduced the bug) got into a state, due to the
 > server returning a different version id for virtually every packet,
 > where it resent the same write data over the network over and over
 > and over and over and over again.

Very nice catch!  

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590





Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Kevin Day writes:

>Ack, I was using this very same thing for several devices in an isolated
>peer-to-peer network to decide who the 'master' was. (Whoever had been up
>longest knew more about the state of the network) Having this change could
>cause weirdness for me too... I assumed (without checking *thwap*) that
>boottime was a constant.
>
>Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
>'boottime' gets initialized so that others can use it, not just NFS? :)

no, I think that is a bad idea.  In your case you want to use the
"uptime" which *is* a measure of how long the system has been
running.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon


:
:> 
:> In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
:> >
:> >:>NFS uses the kernel 'boottime' structure to generate its version id.
:> >:>Now normally you might believe that this structure, once set, will
:> >:>never change.  The authors of NFS certainly make that assumption!
:> >:
:> >:Is this another case of "let's assume the time of day is a random number" or
:> >:is there any underlying assumption about time in this?
:> >:
:> >:--
:> >:Poul-Henning Kamp FreeBSD coreteam member
:> >:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
:> >
:> >It basically needs to be unique for each server reboot in order
:> >to allow clients to resynchronize.
:> 
:> Ok, then I suggest that you cache a copy of the boottime in the NFS
:> code for this purpose.
:> 
:
:Ack, I was using this very same thing for several devices in an isolated
:peer-to-peer network to decide who the 'master' was. (Whoever had been up
:longest knew more about the state of the network) Having this change could
:cause weirdness for me too... I assumed (without checking *thwap*) that
:boottime was a constant.
:
:Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
:'boottime' gets initialized so that others can use it, not just NFS? :)
:
:
:Kevin

We're already testing a patch.

For the moment it is going to be NFS specific, because there's
no time right now to do it right.

Hopefully I can get this in tomorrow and be done with NFS for the
release.  Then I can spend a little time figuring out what's
wrong with VN (which doesn't work in current at the moment).  Again.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Serious server-side NFS problem

1999-12-15 Thread Kevin Day

> 
> In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
> >
> >:>NFS uses the kernel 'boottime' structure to generate its version id.
> >:>Now normally you might believe that this structure, once set, will
> >:>never change.  The authors of NFS certainly make that assumption!
> >:
> >:Is this another case of "let's assume the time of day is a random number" or
> >:is there any underlying assumption about time in this?
> >:
> >:--
> >:Poul-Henning Kamp FreeBSD coreteam member
> >:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
> >
> >It basically needs to be unique for each server reboot in order
> >to allow clients to resynchronize.
> 
> Ok, then I suggest that you cache a copy of the boottime in the NFS
> code for this purpose.
> 

Ack, I was using this very same thing for several devices in an isolated
peer-to-peer network to decide who the 'master' was. (Whoever had been up
longest knew more about the state of the network) Having this change could
cause weirdness for me too... I assumed (without checking *thwap*) that
boottime was a constant.

Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
'boottime' gets initialized so that others can use it, not just NFS? :)


Kevin





Re: Serious server-side NFS problem

1999-12-15 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>
>:>NFS uses the kernel 'boottime' structure to generate its version id.
>:>Now normally you might believe that this structure, once set, will
>:>never change.  The authors of NFS certainly make that assumption!
>:
>:Is this another case of "let's assume the time of day is a random number" or
>:is there any underlying assumption about time in this?
>:
>:--
>:Poul-Henning Kamp FreeBSD coreteam member
>:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
>
>It basically needs to be unique for each server reboot in order
>to allow clients to resynchronize.

Ok, then I suggest that you cache a copy of the boottime in the NFS
code for this purpose.
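
A minimal sketch of that suggestion in Python, with hypothetical `Kernel` and `NfsServer` classes standing in for the real structures, and boottime held as integer microseconds purely for illustration. The point is only that a snapshot taken once stays fixed while the live value drifts, yet still differs after a reboot.

```python
class Kernel:
    """Stand-in for the kernel's adjustable boottime (microseconds)."""

    def __init__(self, boottime_us):
        self.boottime_us = boottime_us

    def adjust_time(self, delta_us):
        # Time corrections move the boottime origin.
        self.boottime_us += delta_us


class NfsServer:
    def __init__(self, kernel):
        self.kernel = kernel
        # Snapshot taken once, when the (hypothetical) NFS code starts up.
        self.cached_boottime_us = kernel.boottime_us

    def verifier_live(self):
        return self.kernel.boottime_us   # the buggy choice: follows adjustments

    def verifier_cached(self):
        return self.cached_boottime_us   # fixed until the next reboot


kernel = Kernel(945_302_400_000_000)
server = NfsServer(kernel)
before = (server.verifier_live(), server.verifier_cached())
kernel.adjust_time(123)  # a small clock correction
after = (server.verifier_live(), server.verifier_cached())
print(before[0] != after[0], before[1] == after[1])  # prints True True
```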

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon


:>NFS uses the kernel 'boottime' structure to generate its version id.
:>Now normally you might believe that this structure, once set, will
:>never change.  The authors of NFS certainly make that assumption!
:
:Is this another case of "let's assume the time of day is a random number" or
:is there any underlying assumption about time in this?
:
:--
:Poul-Henning Kamp FreeBSD coreteam member
:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."

It basically needs to be unique for each server reboot in order
to allow clients to resynchronize.  The time has historically been
used for this purpose since NFS networks tend to require ntp
synchronization anyway.  The time was used even before systems
had realtime clocks -- the kernel would load its initial
time from the last access time stamp in the root filesystem (or
superblock, I forget which it was).

Under NFSv2 it wasn't as critical.  Under NFSv3 the protocol will
break badly if the number stays the same across a reboot - there
would be a massive loss of data.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Serious server-side NFS problem

1999-12-15 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>Here's a general update on this bug report to -current.  It took all day
>but I was finally able to reproduce Andrew's bug.
>
>You guys are going to *love* this.
>
>NFS uses the kernel 'boottime' structure to generate its version id.
>Now normally you might believe that this structure, once set, will
>never change.  The authors of NFS certainly make that assumption!

Is this another case of "let's assume the time of day is a random number" or
is there any underlying assumption about time in this?

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon

Here's a general update on this bug report to -current.  It took all day
but I was finally able to reproduce Andrew's bug.

You guys are going to *love* this.

NFS uses the kernel 'boottime' structure to generate its version id.
Now normally you might believe that this structure, once set, will
never change.  The authors of NFS certainly make that assumption!

No such luck.  If you happen to be running, oh, xntpd for example,
the kernel adjusts the boottime structure to take into account time
changes, including PLL changes so, in fact, the boottime structure
can change quite often - once each tick, in fact.

Now, the effect of boottime changing on NFS is rather drastic.  You
see, the version id controls whether NFS clients must reset their
state machines for NFS data writes.  If a client has done a stage 1
write and is ready to do the stage 2 commit, and the version id
changes, the client must revert back to stage 1.

And so Andrew's bug report comes into the light!   His poor client
(and mine once I reproduced the bug) got into a state, due to the
server returning a different version id for virtually every packet,
where it resent the same write data over the network over and over
and over and over and over again.

I think recent changes to the way the kernel clocks work in -current
brought the bug out into the open, but it's definitely a problem in
both -stable and -current.

Doh!  I gotta say that if I hadn't happened to have been running xntpd
on my test box I would have *never* figured it out.
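
The resend loop can be modelled in a few lines of Python. This is a toy sketch, not the NFS client code: `commit_attempts` and `verifier_source` are hypothetical names, each `next()` stands in for one server reply carrying the version id (the write verifier, in NFSv3 terms), and the retry cap is an assumption added so the example terminates.

```python
import itertools


def commit_attempts(verifier_source, max_attempts=10):
    """Count how many times the data is sent before a commit succeeds.

    Returns the attempt number on which the commit reply carried the same
    id as the preceding write reply, or None if the cap was reached."""
    for attempt in range(1, max_attempts + 1):
        write_id = next(verifier_source)    # stage 1: write, note the id
        commit_id = next(verifier_source)   # stage 2: commit, compare the id
        if commit_id == write_id:
            return attempt
        # The id changed: the client must go back to stage 1 and resend.
    return None


stable = itertools.repeat(945302400)    # id fixed since boot
drifting = itertools.count(945302400)   # id differs on every reply

print(commit_attempts(stable))    # prints 1
print(commit_attempts(drifting))  # prints None
```

With an id that never repeats, the model never gets past stage 1, which matches the endless rewriting of the same data seen on the wire.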

-Matt






Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin


Matthew Dillon writes:
 > 
 > :
 > :
 > :Matthew Dillon writes:
 > : > This is very odd.  Does it lockup with UDP or only with TCP?   And only
 > : > with a solaris client?
 > :
 > :This appears to be solaris only.  I just tried a UDP mount & I see the
 > :same problem.   Is there anything else I can do?
 > 
 > Yes, see if you can repeat the problem with a shorter dd count -- see
 > how small a count you can achieve and still produce the problem, then
 > do a nice long protocol trace.
 > 
 >  -Matt

I've tried.  I can get the file to take a hell of a long time to close
with a count as short as 512 64k chunks.  But it eventually completes.
Perhaps the large files would eventually complete too, if I were
patient enough..

I've left you a full trace on freefall in ~gallatin of a 384MB test
(client has 320MB, server has 384MB).

large.gz:        trace from file creation to server reboot.  Server is
                 running kernel from today.

large_reboot.gz: trace after server is rebooted into ~July kernel
                 and the writes complete.

Drew





Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin


Matthew Dillon writes:
 > 
 > :Also, while read performance has improved by 44%, write performance
 > :has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
 > :quick benchmarks.  Note that the file size of 512MB is larger than
 > :memory on both the server and client.  Also note that the disk array
 > :on the server will read at 50MB/sec and write at 40MB/sec, so we are
 > :not disk bound ;-)
 > :
 > :- UDP NFS write performance from a FreeBSD client:
 > :
 > :July's kernel:  
 > :% dd if=/dev/zero of=zot bs=1024k count=512
 > :512+0 records in
 > :512+0 records out
 > :536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)
 > :
 > :Today's kernel:
 > :% dd if=/dev/zero of=zot bs=1024k count=512
 > :512+0 records in
 > :512+0 records out
 > :536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)
 > 
 > Question on these:  Is it the client you are running the old and new
 > kernels on or the server?  Also, make sure you are running the same
 > number of nfsiod's on each (and also try running a different number
 > of nfsiod's).
 > 
 > At the moment I am assuming you ran these tests with the client running
 > the old and new kernel.  I have some ideas there in regards to inefficient
 > context switching when the nfsiod's are saturated that I am testing.
 > 
 >  -Matt

Sorry I wasn't clear.  The variable was the server's kernel version.
The client's kernel remained constant.  It was from sources built late
last week.

When you are measuring server write performance, are you looking at
the numbers from the client's perspective, or are you looking at the
numbers from the point of view of the server's disks?

I ask because if I watch the server via systat, I see a steady
10-11MB/sec hitting the disk.  However, if I time the process, or look 
at the output from dd, I see the poor (< 4MB/sec) numbers I reported.  
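
For reference, the client-side figures can be re-derived from the two dd runs quoted above. The short Python check below uses only the byte count and elapsed times from those runs; the helper name is made up for the example.

```python
def throughput(nbytes, seconds):
    """Average transfer rate in bytes per second."""
    return nbytes / seconds


NBYTES = 536870912  # 512 blocks of 1024k, as reported by dd

july = throughput(NBYTES, 52.780773)     # server running July's kernel
today = throughput(NBYTES, 141.593458)   # server running today's kernel
drop = 1 - today / july

print(f"{july / 1e6:.2f} MB/s -> {today / 1e6:.2f} MB/s, {drop:.0%} slower")
```

That works out to a drop of roughly 63%, inside the 50 - 70% range reported for FreeBSD clients.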


Drew






Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon


:Also, while read performance has improved by 44%, write performance
:has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
:quick benchmarks.  Note that the file size of 512MB is larger than
:memory on both the server and client.  Also note that the disk array
:on the server will read at 50MB/sec and write at 40MB/sec, so we are
:not disk bound ;-)
:
:- UDP NFS write performance from a FreeBSD client:
:
:July's kernel: 
:% dd if=/dev/zero of=zot bs=1024k count=512
:512+0 records in
:512+0 records out
:536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)
:
:Today's kernel:
:% dd if=/dev/zero of=zot bs=1024k count=512
:512+0 records in
:512+0 records out
:536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)

Question on these:  Is it the client you are running the old and new
kernels on or the server?  Also, make sure you are running the same
number of nfsiod's on each (and also try running a different number
of nfsiod's).

At the moment I am assuming you ran these tests with the client running
the old and new kernel.  I have some ideas there in regards to inefficient
context switching when the nfsiod's are saturated that I am testing.

-Matt





Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon


:
:
:Matthew Dillon writes:
: > This is very odd.  Does it lockup with UDP or only with TCP?   And only
: > with a solaris client?
:
:This appears to be solaris only.  I just tried a UDP mount & I see the
:same problem.   Is there anything else I can do?

Yes, see if you can repeat the problem with a shorter dd count -- see
how small a count you can achieve and still produce the problem, then
do a nice long protocol trace.
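
Finding the smallest dd count that still reproduces the hang can be treated as a binary search. The following is only a sketch: `fails` is a hypothetical callable that runs the dd test with the given count and reports whether the client wedged, and the search assumes that any count above a failing one also fails. The upper bound of 8192 is the count from the original report.

```python
def smallest_failing_count(fails, lo=1, hi=8192):
    """Binary-search the smallest count in [lo, hi] for which fails(count) is True.

    Assumes fails(hi) is True and that failure is monotonic in count
    (every count above a failing count also fails).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid        # mid still reproduces the problem; try smaller
        else:
            lo = mid + 1    # mid is too small to trigger it
    return lo


if __name__ == "__main__":
    # Stand-in predicate for illustration only: pretend counts >= 300 wedge.
    print(smallest_failing_count(lambda count: count >= 300))
```

With 8192 as the starting point this needs at most 13 test runs, which matters when each failing run requires rebooting the server.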

-Matt





Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin


Matthew Dillon writes:
 > This is very odd.  Does it lockup with UDP or only with TCP?   And only
 > with a solaris client?

This appears to be solaris only.  I just tried a UDP mount & I see the
same problem.   Is there anything else I can do?


<...>
 > :
 > :- UDP NFS write performance from a FreeBSD client:
 > 
 > Ok, I'll take a look at it.  I get 9 MBytes/sec writing over UDP links
 > but only with fast (400MHz class) clients.  With a 200 MHz class client
 > I am only getting 4-5 MBytes/sec.  Frankly, I should be getting 9 MB/sec
 > even with the slower client!

The client and server here are both 450MHz machines.  The server
and client are identical (450MHz PII (or PIII), etherexpress pro nics) 
except that the server has 384MB of ram and the client has
64MB.  The client was running a kernel from last week.  My desktop
(196 MB 300MHZ PII) shows similar behaviour -- a drop from 7MB/s to
3MB/s.  It is running a kernel from Monday.

<...>

 > Yes, read performance has been improved in just the last few days
 > simply by adding a read heuristic to the server side for transfers off
 > the physical media.

Nice!

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590





Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon

:However, I'm seeing a showstopping problem when running newer kernels:
:When writing a large file via TCP, a Solaris 2.7 client pauses when
:closing the file, and appears to become stuck in an infinite loop.
:Eg:
:
:dd if=/dev/zero of=zot bs=64k count=8192
:8192+0 records in
:8192+0 records out
:^C <- wedge
:
:The process does not exit, and there is a flurry of activity between
:the client & server:
:
:solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
:
:freebsd -> solaris TCP D=843 S=2049 Ack=1906313520 Seq=94979688 Len=164 Win=33176
:freebsd -> solaris RPC R (#5146) XID=299504169 Success
:freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)
:

This is very odd.  Does it lockup with UDP or only with TCP?   And only
with a solaris client?

It's possible that the problem may be related to changes in the TCP
stack rather than related to NFS.  Looking at your trace output
I don't see anything wrong beyond the solaris client apparently repeating
write requests.  Another possibility is that we somehow nixed the
commit rpc.  All the repeated writes occur after the commit rpc.  In
the protocol trace, though, the commit rpc succeeds.  Very odd!

:I would think that this is not our fault, except things work just dandy
:with a kernel from July.  In fact, the only way out of this situation
:is to reboot the FreeBSD NFS server into an older kernel.
:
:In the trace (about 30 seconds or so of the activity after dd
:finished, but before it exited) there are ~21,000 packets.  There is a
:grand total of:
:
:NFS C WRITE3:  11024
:NFS R WRITE3 OK:   10499
:NFS C COMMIT3: 1
:NFS R COMMIT3 OK:  1
:
:In case more details are needed, I've left the complete trace in
:~gallatin/nfs-trace.gz on freefall.

I've looked at it -- it looks normal except for the repeated writes.

:Also, while read performance has improved by 44%, write performance
:has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
:quick benchmarks.  Note that the file size of 512MB is larger than
:memory on both the server and client.  Also note that the disk array
:on the server will read at 50MB/sec and write at 40MB/sec, so we are
:not disk bound ;-)
:
:- UDP NFS write performance from a FreeBSD client:

Ok, I'll take a look at it.  I get 9 MBytes/sec writing over UDP links
but only with fast (400MHz class) clients.  With a 200 MHz class client
I am only getting 4-5 MBytes/sec.  Frankly, I should be getting 9 MB/sec
even with the slower client!

:UDP NFS Read performance has gotten better:
:
:July's kernel:
:% dd if=zot of=/dev/null bs=64k
:8192+0 records in
:8192+0 records out
:536870912 bytes transferred in 84.621477 secs (6344381 bytes/sec)
:
:Today's kernel:
:dd if=zot of=/dev/null bs=64k
:8192+0 records in
:8192+0 records out
:536870912 bytes transferred in 58.544409 secs (9170319 bytes/sec)
:
:Cheers,
:
:Drew

Yes, read performance has been improved in just the last few days
simply by adding a read heuristic to the server side for transfers off
the physical media.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>

:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
:Duke University Email: [EMAIL PROTECTED]
:Department of Computer Science Phone: (919) 660-6590






Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin


I have a few "scratch" servers which are running -current from early
July.  They serve large, fast scratch filesystems striped over 4 large
IDE drives.   With the recent improvements to the NFS code & the ATA
code, I was hoping to get them running a more recent -current.

However, I'm seeing a showstopping problem when running newer kernels:
When writing a large file via TCP, a Solaris 2.7 client pauses when
closing the file, and appears to become stuck in an infinite loop.
Eg:

dd if=/dev/zero of=zot bs=64k count=8192
8192+0 records in
8192+0 records out
^C  <- wedge

The process does not exit, and there is a flurry of activity between
the client & server:

solaris -> freebsd ETHER Type=0800 (IP), size = 1514 bytes
solaris -> freebsd IP  D=152.3.X.Z S=152.3.X.Y LEN=1500, ID=16922
solaris -> freebsd TCP D=2049 S=843 Ack=94978376 Seq=1906025252 Len=1460 Win=8760
solaris -> freebsd RPC C XID=299504169 PROG=13 (NFS) VERS=3 PROC=7
solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)

<...>
freebsd -> solaris ETHER Type=0800 (IP), size = 218 bytes
freebsd -> solaris IP  D=152.3.X.Y S=152.3.X.Z LEN=204, ID=34565
freebsd -> solaris TCP D=843 S=2049 Ack=1906313520 Seq=94979688 Len=164 Win=33176
freebsd -> solaris RPC R (#5146) XID=299504169 Success
freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)

<...>

solaris -> freebsd ETHER Type=0800 (IP), size = 1514 bytes
solaris -> freebsd IP  D=152.3.X.Z S=152.3.X.Y LEN=1500, ID=49895
solaris -> freebsd TCP D=2049 S=843 Ack=96156528 Seq=2140928624 Len=1460 Win=8760
solaris -> freebsd RPC C XID=299511401 PROG=13 (NFS) VERS=3 PROC=7
solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
<...>

freebsd -> solaris ETHER Type=0800 (IP), size = 218 bytes
freebsd -> solaris IP  D=152.3.X.Y S=152.3.X.Z LEN=204, ID=51011
freebsd -> solaris TCP D=843 S=2049 Ack=2140968864 Seq=96156856 Len=164 Win=33176
freebsd -> solaris RPC R (#18995) XID=299511401 Success
freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)

As you can see, the client seems to write the same block multiple
times. 
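
One way to quantify the repeated writes from a saved trace is to count how often each (file handle, offset, length) triple appears in client WRITE3 calls. This is a minimal sketch that assumes summary lines formatted like the ones above; the names `WRITE3_CALL`, `repeated_writes` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches client WRITE3 call lines in the summary format shown in the trace above.
WRITE3_CALL = re.compile(r"NFS C WRITE3 FH=(\w+) at (\d+) for (\d+)")


def repeated_writes(lines):
    """Return {(fh, offset, length): count} for WRITE3 calls seen more than once."""
    seen = Counter()
    for line in lines:
        match = WRITE3_CALL.search(line)
        if match:
            key = (match.group(1), int(match.group(2)), int(match.group(3)))
            seen[key] += 1
    return {key: count for key, count in seen.items() if count > 1}


# Two calls and one reply taken from the trace excerpt above.
SAMPLE = [
    "solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)",
    "freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)",
    "solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)",
]

if __name__ == "__main__":
    for (fh, offset, length), count in repeated_writes(SAMPLE).items():
        print(f"FH={fh} offset={offset} len={length} sent {count} times")
```

Reply lines are ignored because they carry no offset; only the calls identify which block is being rewritten.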

I would think that this is not our fault, except things work just dandy
with a kernel from July.  In fact, the only way out of this situation
is to reboot the FreeBSD NFS server into an older kernel.

In the trace (about 30 seconds or so of the activity after dd
finished, but before it exited) there are ~21,000 packets.  There is a
grand total of:

NFS C WRITE3:   11024
NFS R WRITE3 OK:10499
NFS C COMMIT3:  1
NFS R COMMIT3 OK:   1

In case more details are needed, I've left the complete trace in
~gallatin/nfs-trace.gz on freefall.


Also, while read performance has improved by 44%, write performance
has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
quick benchmarks.  Note that the file size of 512MB is larger than
memory on both the server and client.  Also note that the disk array
on the server will read at 50MB/sec and write at 40MB/sec, so we are
not disk bound ;-)


- UDP NFS write performance from a FreeBSD client:

July's kernel:  
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)

Today's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)


- TCP NFS write performance from a FreeBSD client:

July's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 69.935044 secs (7676708 bytes/sec)

Today's kernel:
% dd if=/dev/zero of=zot bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 162.074402 secs (3312497 bytes/sec)


UDP NFS Read performance has gotten better:

July's kernel:
% dd if=zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 84.621477 secs (6344381 bytes/sec)

Today's kernel:
% dd if=zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 58.544409 secs (9170319 bytes/sec)
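
The percentages quoted earlier can be rechecked from the bytes/sec figures that dd printed. A minimal sketch, with the throughput numbers copied from the runs above; the names `RESULTS`, `pct_change` and `summarize` are made up for illustration.

```python
# (July's kernel, today's kernel) throughput in bytes/sec, from the dd output above.
RESULTS = {
    "UDP write": (10171714, 3791636),
    "TCP write": (7676708, 3312497),
    "UDP read": (6344381, 9170319),
}


def pct_change(old, new):
    """Percent change going from old to new; negative means slower."""
    return (new - old) / old * 100.0


def summarize(results):
    """Map each test name to its percent change, rounded to one decimal place."""
    return {name: round(pct_change(old, new), 1)
            for name, (old, new) in results.items()}


if __name__ == "__main__":
    for name, change in summarize(RESULTS).items():
        print(f"{name}: {change:+.1f}%")
```

Both write results fall inside the 50 - 70% degradation range, and the read result matches the roughly 44% improvement.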

Cheers,

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590






NFS problem

1999-03-04 Thread Frank Bonnet

I run a mailhub on 3.1.

The problem is that the /var/mail directory is NFS mounted
by HPUX 10.20 clients.

There is a file locking problem because the HPUX /bin/mail
command tries to create a user.lock file in the NFS-mounted
/var/mail directory ...

Is there an NFS workaround, or do I have to give up now?

TIA
--
Frank Bonnet
Groupe ESIEE Paris
http://www.esiee.fr/~bonnetf/


To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message



Re: NFS problem found - please try this patch.

1999-01-19 Thread Bjoern Fischer
On Tue, Jan 19, 1999 at 01:01:34PM +0200, Sheldon Hearn wrote:
[...]
> > But there's still something wrong: When shutting down the server
> > it still sometimes panics in vinvalbuf() complaining 'bout dirty
> > pages.
> 
> I'm not sure this has anything to do with NFS. I got this after last
> night's fresh world and kernel install. The vinvalbuf message occurred
> after the ``syncing disks ...done'' message but before the ``Rebooting''
> message.

Then why is the panic on the server triggered by the vi SEGV?
No vi SEGV -> server goes down normally;
vi SEGV -> server panics on shutdown.

> However, I cannot reproduce the message since that reboot, with or
> without NFS activity.

It is somewhat tricky. There's actually only one file I've got that
causes vi to SEGV (and the server to panic on shutdown). I'll
have to look into vi source to find out how the vi.recover file
is created. Maybe some locking is involved, too.

  Bjoern

-- 
(sig_t*)NULL



Re: NFS problem found - please try this patch.

1999-01-19 Thread D. Rock
This patch seems to fix my NFS problems. I started a make release yesterday
and it is still running (It's a slow machine). No problems so far.
The chroot dir is NFSv2/UDP mounted.

Thanks,

Daniel

Luoqi Chen schrieb:
> 
> The check is correct and should be there, the B_CACHE bit was cleared because
> I made a mistake when setting the valid bit in the vm page.
> 
> Index: vfs_bio.c
> ===
> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.192
> diff -u -r1.192 vfs_bio.c
> --- vfs_bio.c   1999/01/12 11:59:34 1.192
> +++ vfs_bio.c   1999/01/18 14:45:33
> @@ -2171,7 +2171,7 @@
> (vm_offset_t) (soff & PAGE_MASK),
> (vm_offset_t) (eoff - soff));
> sv = (bp->b_offset + bp->b_validoff + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
> -   ev = (bp->b_offset + bp->b_validend) & ~(DEV_BSIZE - 1);
> +   ev = (bp->b_offset + bp->b_validend + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
> soff = qmax(sv, soff);
> eoff = qmin(ev, eoff);
> }
> 
> Note the calculation of ev, the original code was a round-up and I changed it
> to round-down in my -r1.188 commit (I thought it was a bug in the original
> code, but it was actually me who didn't understand the nfs code well enough).
> 
> -lq
> 


Re: NFS problem found - please try this patch.

1999-01-19 Thread Sheldon Hearn


On Tue, 19 Jan 1999 11:38:50 +0100, Bjoern Fischer wrote:

> But there's still something wrong: When shutting down the server
> it still sometimes panics in vinvalbuf() complaining 'bout dirty
> pages.

I'm not sure this has anything to do with NFS. I got this after last
night's fresh world and kernel install. The vinvalbuf message occurred
after the ``syncing disks ...done'' message but before the ``Rebooting''
message.

However, I cannot reproduce the message since that reboot, with or
without NFS activity.

Ciao,
Sheldon.



Re: NFS problem found - please try this patch.

1999-01-19 Thread Bjoern Fischer
On Mon, Jan 18, 1999 at 10:05:50AM -0500, Luoqi Chen wrote:
> The check is correct and should be there, the B_CACHE bit was cleared because
> I made a mistake when setting the valid bit in the vm page.
[...]
> Note the calculation of ev, the original code was a round-up and I changed it
> to round-down in my -r1.188 commit (I thought it was a bug in the original
> code, but it was actually me who didn't understand the nfs code well enough).

The patch seems to solve the problem as nfs behaves as it did
prior to -r1.188. Thanks.

But there's still something wrong: When shutting down the server
it still sometimes panics in vinvalbuf() complaining 'bout dirty
pages. On the client side vi dies of SEGV (edited file and
/var/tmp/vi.recover on nfs fs) generating a wrong sized recover
file. After that the server panics on shutdown. Without triggering
the bug it shuts down gracefully.

I'll try to work out a recipe for reproducing this easily.

  Bjoern

-- 
-BEGIN GEEK CODE BLOCK-
GCS d--(+) s++: a- C+++(-) UBLOSI$ P+++(-) L+++(-) !E W- N+ o>+
K- !w !O !M !V  PS++  PE-  PGP++  t+++  !5 X++ tv- b+++ D++ G e+ h-- y+ 
--END GEEK CODE BLOCK--



Re: NFS problem found - please try this patch.

1999-01-18 Thread Matthew Dillon
A.  Yes, I see.  I will unapply my patch and apply this one and test
it.

I'm not sure what the use of having m->valid and m->clean bits is at all
if we have to munge them like this.  Perhaps we should change these
vm_page_t bits to a byte range in -4.0.

I think we also need to redefine the way dirty bp's are handled, though,
and at least panic if it tries to clear B_CACHE on something that B_CACHE
should not be cleared on.

-Matt
Matthew Dillon 


:The check is correct and should be there, the B_CACHE bit was cleared because
:I made a mistake when setting the valid bit in the vm page.
:
:Index: vfs_bio.c
:===
:RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
:retrieving revision 1.192
:diff -u -r1.192 vfs_bio.c
:--- vfs_bio.c  1999/01/12 11:59:34 1.192
:+++ vfs_bio.c  1999/01/18 14:45:33
:@@ -2171,7 +2171,7 @@
:   (vm_offset_t) (soff & PAGE_MASK),
:   (vm_offset_t) (eoff - soff));
:   sv = (bp->b_offset + bp->b_validoff + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
:-  ev = (bp->b_offset + bp->b_validend) & ~(DEV_BSIZE - 1);
:+  ev = (bp->b_offset + bp->b_validend + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
:   soff = qmax(sv, soff);
:   eoff = qmin(ev, eoff);
:   }
:
:Note the calculation of ev, the original code was a round-up and I changed it
:to round-down in my -r1.188 commit (I thought it was a bug in the original
:code, but it was actually me who didn't understand the nfs code well enough).
:
:-lq




Re: NFS problem found - please try this patch.

1999-01-18 Thread Luoqi Chen
The check is correct and should be there, the B_CACHE bit was cleared because
I made a mistake when setting the valid bit in the vm page.

Index: vfs_bio.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.192
diff -u -r1.192 vfs_bio.c
--- vfs_bio.c   1999/01/12 11:59:34 1.192
+++ vfs_bio.c   1999/01/18 14:45:33
@@ -2171,7 +2171,7 @@
    (vm_offset_t) (soff & PAGE_MASK),
    (vm_offset_t) (eoff - soff));
    sv = (bp->b_offset + bp->b_validoff + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
-   ev = (bp->b_offset + bp->b_validend) & ~(DEV_BSIZE - 1);
+   ev = (bp->b_offset + bp->b_validend + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
    soff = qmax(sv, soff);
    eoff = qmin(ev, eoff);
    }

Note the calculation of ev, the original code was a round-up and I changed it
to round-down in my -r1.188 commit (I thought it was a bug in the original
code, but it was actually me who didn't understand the nfs code well enough).
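
The difference between the two forms of the `ev` expression can be seen with a small worked example. This is a sketch in Python rather than kernel C; `DEV_BSIZE = 512` and the sample offsets are assumptions chosen for illustration, and the helper names `round_down` and `round_up` are not from the kernel source.

```python
DEV_BSIZE = 512  # assumed block size, for illustration only


def round_down(x, bsize=DEV_BSIZE):
    """Mirrors the r1.188 form: x & ~(DEV_BSIZE - 1)."""
    return x & ~(bsize - 1)


def round_up(x, bsize=DEV_BSIZE):
    """Mirrors the patched form: (x + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1)."""
    return (x + bsize - 1) & ~(bsize - 1)


if __name__ == "__main__":
    # Hypothetical buffer: valid data ends 700 bytes in, which is not on
    # a DEV_BSIZE boundary.
    b_offset, b_validend = 0, 700
    end = b_offset + b_validend
    print("round-down ev:", round_down(end))  # stops short of the valid data
    print("round-up ev:  ", round_up(end))    # covers the partial block
```

With round-down, the bytes between 512 and 700 fall outside the range treated as valid even though they were written; round-up keeps the whole partially filled block inside it. Values already on a boundary are unchanged by either form.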

-lq



Re: NFS problem found - please try this patch.

1999-01-18 Thread Chris Timmons

Good work!  I have to run at the moment but it looks like you nailed this
one.  Your explanation coincides perfectly with the symptoms.

Thanks!

-Chris


On Mon, 18 Jan 1999, Matthew Dillon wrote:

> Ok, I believe I have found the bug.  Please test the patch included below.
> I was able to make /usr/ports/x11/XFree86-contrib after applying this 
> patch ( and it was screwing up prior to that ).
> 
> The problem is in getblk() - code was added to validate the buffer and
> to clear B_CACHE if the bp was not entirely valid.  The problem is 
> that NFS uses B_CACHE to flag a dirty buffer that needs to be written out!
> Additionally, a write() to an NFS based file may write data that is not
> on a DEV_BSIZE'd boundary which causes a subsequent read() to improperly
> clear B_CACHE.
> 
> There are almost certainly more problems like this -- using B_CACHE to
> mark a buffer dirty is just plain dumb, it's no wonder NFS is so screwed 
> up!
> 
>   -Matt
> 
>   Matthew Dillon 
>   
> 
> Index: kern/vfs_bio.c
> ===
> RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.192
> diff -u -r1.192 vfs_bio.c
> --- vfs_bio.c 1999/01/12 11:59:34 1.192
> +++ vfs_bio.c 1999/01/18 13:25:27
> @@ -1364,6 +1364,7 @@
>   break;
>   }
>   }
> +
>   boffset = (i << PAGE_SHIFT) - (bp->b_offset & PAGE_MASK);
>   if (boffset < bp->b_dirtyoff) {
>   bp->b_dirtyoff = max(boffset, 0);
> @@ -1457,7 +1458,14 @@
>   }
>   KASSERT(bp->b_offset != NOOFFSET, 
>   ("getblk: no buffer offset"));
> +#if 0
>   /*
> +  * XXX REMOVED XXX - this is bogus.  It will cause the
> +  * B_CACHE flag to be cleared for a partially constituted
> +  * dirty buffer (NFS) that happens to have a write that is
> +  * not on a DEV_BSIZE boundry!!  XXX REMOVED 
> +  */
> + /*
>* Check that the constituted buffer really deserves for the
>* B_CACHE bit to be set.  B_VMIO type buffers might not
>* contain fully valid pages.  Normal (old-style) buffers
> @@ -1478,6 +1486,7 @@
>   poffset = 0;
>   }
>   }
> +#endif
>  
>   if (bp->b_usecount < BUF_MAXUSE)
>   ++bp->b_usecount;
> 




NFS problem found - please try this patch.

1999-01-18 Thread Matthew Dillon
::On the server I downgraded vfs_bio.c to rev 1.187 & rebooted; no luck.  I
::then installed the same kernel (with the downgraded vfs_bio.c) to the
::client.  Bingo.  With both NFS client & server machine running rev 1.187,
::...
::-Chris
:
:Hmm.  r1.188 are Luoqi's fixes to the handling of misaligned buffers.  It is
:quite possible that there is a bug in there or with assumptions made in
:the NFS code in regards to how buffers are handled, but most of those
:...
:   -Matt

Ok, I believe I have found the bug.  Please test the patch included below.
I was able to make /usr/ports/x11/XFree86-contrib after applying this 
patch ( and it was screwing up prior to that ).

The problem is in getblk() - code was added to validate the buffer and
to clear B_CACHE if the bp was not entirely valid.  The problem is 
that NFS uses B_CACHE to flag a dirty buffer that needs to be written out!
Additionally, a write() to an NFS based file may write data that is not
on a DEV_BSIZE'd boundary which causes a subsequent read() to improperly
clear B_CACHE.

There are almost certainly more problems like this -- using B_CACHE to
mark a buffer dirty is just plain dumb, it's no wonder NFS is so screwed 
up!

-Matt

Matthew Dillon 


Index: kern/vfs_bio.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.192
diff -u -r1.192 vfs_bio.c
--- vfs_bio.c   1999/01/12 11:59:34 1.192
+++ vfs_bio.c   1999/01/18 13:25:27
@@ -1364,6 +1364,7 @@
break;
}
}
+
boffset = (i << PAGE_SHIFT) - (bp->b_offset & PAGE_MASK);
if (boffset < bp->b_dirtyoff) {
bp->b_dirtyoff = max(boffset, 0);
@@ -1457,7 +1458,14 @@
}
KASSERT(bp->b_offset != NOOFFSET, 
("getblk: no buffer offset"));
+#if 0
/*
+* XXX REMOVED XXX - this is bogus.  It will cause the
+* B_CACHE flag to be cleared for a partially constituted
+* dirty buffer (NFS) that happens to have a write that is
+* not on a DEV_BSIZE boundry!!  XXX REMOVED 
+*/
+   /*
 * Check that the constituted buffer really deserves for the
 * B_CACHE bit to be set.  B_VMIO type buffers might not
 * contain fully valid pages.  Normal (old-style) buffers
@@ -1478,6 +1486,7 @@
poffset = 0;
}
}
+#endif
 
if (bp->b_usecount < BUF_MAXUSE)
++bp->b_usecount;
