Re: Serious server-side NFS problem

1999-12-17 Thread Mike Smith


 
 We've solved most of the performance issues, but NFS is still
 eating a little too much cpu for my tastes.  Unfortunately it is getting to
 the point where a significant portion of the performance loss is occurring
 in the network driver itself.  Some of my cards eat 25% of the cpu just
 in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
 TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
 packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
 cards are way ahead of NIC cards in regards to reducing interrupt 
 overhead (though gigabit NICs have caught up some).

Actually, I'm not sure I buy this at all.  Both the EtherExpress and 
3C905 families give less than one interrupt per datagram, and the 
other overheads on them are comparably small.
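To make the interrupts-per-datagram argument concrete: if the driver drains several received frames each time it is interrupted, the average can fall below one interrupt per datagram even when each datagram spans several frames. A back-of-the-envelope sketch; the frame and batch counts below are illustrative assumptions, not measurements of the EtherExpress or 3C905.

```python
def interrupts_per_datagram(frames_per_datagram, frames_per_interrupt):
    """Average interrupts per datagram when each interrupt services a batch
    of `frames_per_interrupt` received frames."""
    if frames_per_datagram <= 0 or frames_per_interrupt <= 0:
        raise ValueError("counts must be positive")
    return frames_per_datagram / frames_per_interrupt


def interrupts_per_second(bytes_per_sec, bytes_per_frame, frames_per_interrupt):
    """Interrupt rate for a given receive throughput and batch size."""
    return bytes_per_sec / bytes_per_frame / frames_per_interrupt


print(interrupts_per_datagram(6, 1))                      # 6.0
print(interrupts_per_datagram(6, 8))                      # 0.75
print(round(interrupts_per_second(10_000_000, 1500, 1)))  # 6667
```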

I think you'll want to do some profiling before getting too concerned
about the network drivers themselves; gigabit hardware isn't really any
lighter on the CPU than good 100Mbps hardware, and we can handle better
than 400MBps UDP inbound on a reasonable (400MHz) system right now.
(Lots better with jumbo frames.)

My guesses (based on some of the profiling that Bill Paul did) would be 
the IP and UDP checksum guessing, but more that I think you'll find that 
a considerable amount of the inbound NFS traffic handling is actually 
performed in the interrupt context (ie. I don't think that stuff is 
being handed off to a softnet handler), blowing out the numbers a bit.

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: Serious server-side NFS problem

1999-12-17 Thread Doug Rabson


On Wed, 15 Dec 1999, Matthew Dillon wrote:

 Here's a general update on this bug report to -current.  It took all day
 but I was finally able to reproduce Andrew's bug.
 
 You guys are going to *love* this.
 
 NFS uses the kernel 'boottime' structure to generate its version id.
 Now normally you might believe that this structure, once set, will
 never change.  The authors of NFS certainly make that assumption!
 
 No such luck.  If you happen to be running, oh, xntpd for example,
 the kernel adjusts the boottime structure to take into account time
 changes, including PLL changes so, in fact, the boottime structure
 can change quite often - once each tick, in fact.

Nice catch, Matt.

--
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: Serious server-side NFS problem

1999-12-17 Thread Garrett Wollman


On Fri, 17 Dec 1999 00:55:26 -0800, Mike Smith [EMAIL PROTECTED] said:

 the IP and UDP checksum guessing, but more that I think you'll find that 
 a considerable amount of the inbound NFS traffic handling is actually 
 performed in the interrupt context

If it is, then there is a serious bug.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick



Re: Serious server-side NFS problem

1999-12-17 Thread Rodney W. Grimes


...
 (200-300 MHz) clients.  That's *with* packet loss (for some reason when
 my fxp ethernets pump data out that quickly they tend to cause packet
 loss in other parts of my HUBed network, which I find quite annoying).

Interesting you should say that.  I've been playing with some Broadcom
based ASIC 100BaseTX full duplex switches and I actually lose more packets
due to overrunning the buffers in the switch than I do if I used a half duplex
standard hub.  :-(

Performance for most things overall on the network is better with the
switch, but direct high bandwidth traffic between 2 machines has suffered
due to the conversion to a fully switched network.

Seems FreeBSD (using dc21143 based cards) can pump data around so damn
fast that the switch can't keep up :-(.  I need to do some more testing
to find out if this occurs between ports on the same ASIC or only when
packets have to go out to the ASIC to ASIC bridge bus.

Also how do the fxp and dc based cards respond to flow control?
Do we obey it?  Do the cards even understand it?
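For reference, IEEE 802.3x flow control works by the congested station sending a MAC Control PAUSE frame that asks its link partner to stop transmitting for a number of 512-bit-time quanta. Whether the fxp and dc drivers (or the switch) honor it is Rod's open question and is not settled here; this sketch only shows the frame layout, using a made-up, locally administered source MAC.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast address for MAC Control
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60  # 64-byte minimum Ethernet frame minus the 4-byte FCS


def build_pause_frame(src_mac, quanta):
    """Build an 802.3x PAUSE frame (FCS not included)."""
    if len(src_mac) != 6 or not 0 <= quanta <= 0xFFFF:
        raise ValueError("need a 6-byte MAC and a 16-bit pause time")
    header = PAUSE_DST + src_mac
    body = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    return (header + body).ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_seconds(quanta, link_bps):
    """One pause quantum is 512 bit times at the link speed."""
    return quanta * 512 / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), frame[12:14].hex())      # 60 8808
print(pause_seconds(0xFFFF, 100_000_000))  # 0.3355392
```

At 100 Mbps the longest possible pause is therefore about a third of a second.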

-- 
Rod Grimes - KD7CAX @ CN85sl - (RWG25)   [EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-17 Thread Matthew Dillon



:On Fri, 17 Dec 1999 00:55:26 -0800, Mike Smith [EMAIL PROTECTED] said:
:
: the IP and UDP checksum guessing, but more that I think you'll find that 
: a considerable amount of the inbound NFS traffic handling is actually 
: performed in the interrupt context
:
:If it is, then there is a serious bug.
:
:-GAWollman
:
:--
:Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same

No serious NFS traffic handling is done in the interrupt context.  The
packets are essentially just queued up for nfsd to deal with.
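The split Matt describes (the receive path only queues the request; an nfsd process does the real work) is the usual pattern of a minimal interrupt-time handler feeding a deferred worker. A user-space sketch of that pattern, assuming a thread stands in for nfsd and a plain function call stands in for the interrupt; none of these names are FreeBSD kernel interfaces.

```python
import queue
import threading

requests = queue.Queue()
handled = []


def rx_interrupt(packet):
    # "Interrupt context": do as little as possible, just queue the packet.
    requests.put(packet)


def nfsd_worker():
    # Deferred context: service each queued request; None means stop.
    while True:
        packet = requests.get()
        if packet is None:
            break
        handled.append(packet.upper())  # placeholder for real RPC processing


worker = threading.Thread(target=nfsd_worker)
worker.start()
for pkt in ["read", "write", "commit"]:
    rx_interrupt(pkt)
rx_interrupt(None)  # sentinel to shut the worker down
worker.join()
print(handled)  # ['READ', 'WRITE', 'COMMIT']
```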

-Matt
Matthew Dillon 
[EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-17 Thread Andrew Gallatin



Kenneth D. Merry writes:
  
  
  Another advantage with gigabit ethernet is that if you can do jumbo frames,
  you can fit an entire 8K NFS packet in one frame.
  
  I'd like to see NFS numbers from two 21264 Alphas with GigE cards, zero
  copy, checksum offloading and a big striped array on one end at least.  I

Well.. maybe this will work for you ;-)

2 21264 alphas (500MHz XP1000S), 640MB RAM, Myrinet/Trapeze using
64-bit Myrinet cards, 8K cluster mbufs, UDP checksums disabled (we can
do checksum offloading at the receiver only).  We have a 56K MTU.
Using this setup, *without* zero copy, we get roughly 140MB/sec out of
TCP:

% netperf -Hbroil-my
TCP STREAM TEST to broil-my : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

524288 524288  524288   10.01    1135.20

And about 900Mb/sec (112MB/sec) out of UDP using an 8k message size:

% netperf -Hbroil-my -tUDP_STREAM -- -m 8192
UDP UNIDIRECTIONAL SEND TEST to broil-my : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

 57344    8192   10.00      165619      0    1084.94
 65535           10.00      137338            899.68

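netperf reports throughput in 10^6 bits/sec, so the MB/sec figures quoted in the text are those numbers divided by 8. A trivial check against the two runs above:

```python
def mbit_to_mbyte(mbit_per_sec):
    """Convert netperf's 10^6 bits/sec to 10^6 bytes/sec."""
    return mbit_per_sec / 8


print("TCP", round(mbit_to_mbyte(1135.20), 1))      # TCP 141.9
print("UDP send", round(mbit_to_mbyte(1084.94), 1))  # UDP send 135.6
print("UDP recv", round(mbit_to_mbyte(899.68), 1))   # UDP recv 112.5
```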

I have exported a local disk on broil-my and created a 512MB file
(zot).  Both machines have 640MB of ram and the test file is fully
cached on the server.  When reading the file from the client, I have
found the best I can do is roughly 57MB/sec:

# mount_nfs -a 3 -r 16384 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.658521 secs (55585209 bytes/sec)
# umount /mnt
# mount_nfs -a 3 -r 32768 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.513517 secs (56432433 bytes/sec)
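When comparing rsize and readahead settings like this, a small helper that pulls the rate out of dd's summary line makes the runs easy to tabulate. A sketch assuming the BSD dd summary format shown above; it reports both 10^6-byte MB/s and MiB/s, since "MB/sec" can mean either.

```python
import re

DD_SUMMARY = re.compile(
    r"(\d+) bytes transferred in ([\d.]+) secs \((\d+) bytes/sec\)")


def dd_rate_mb(line):
    """Return (MB/s with MB = 10^6 bytes, MiB/s) from a dd summary line."""
    m = DD_SUMMARY.search(line)
    if m is None:
        raise ValueError("not a dd summary line")
    nbytes, secs = int(m.group(1)), float(m.group(2))
    rate = nbytes / secs
    return rate / 1e6, rate / 2**20


line = "536870912 bytes transferred in 9.658521 secs (55585209 bytes/sec)"
mb, mib = dd_rate_mb(line)
print(f"{mb:.1f} MB/s, {mib:.1f} MiB/s")  # 55.6 MB/s, 53.0 MiB/s
```

The byte count itself is just records times block size: 8192 x 65536 = 536870912.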

Empirically, it seems that -a 3 performs better than -a 2 or -a 4.
Also, the bandwidth seems to max out with a 16k read size.  Increasing
much beyond that doesn't seem to help.  Varying the number of nfsiods
between 2, 4 and 20 doesn't seem to matter much.

Running iprobe on the client (http://www.cs.duke.edu/ari/iprobe.html)
shows us that we are spending:

- 29.4% in bcopy -- this doesn't change a lot if I enable/disable
vfs_ioopt.  I suspect that this is from bcopy'ing data out of mbufs,
not crossing the user/kernel boundary.  In either case, there's not
much that can be done to reduce this in a generic manner.

-  5.5% tsleep (contention between nfsiods?)

The "top" functions/components are:

Name                  Count   Pct   Pct
----                  -----  ----  ----
kernel                 4128         90.0

bcopy_samealign_lp     1347  32.6  29.4
procrunnable            279   6.8   6.1
tsleep                  256   6.2   5.6
Lidle2                  195   4.7   4.3
m_freem                  89   2.2   1.9
soreceive                73   1.8   1.6
lockmgr                  63   1.5   1.4
brelse                   60   1.5   1.3
vm_page_free_toq         55   1.3   1.2
ovbcopy                  51   1.2   1.1
wakeup                   43   1.0   0.9
acquire                  42   1.0   0.9
bcopy_da_lp              42   1.0   0.9
nfs_request              41   1.0   0.9
ip_input                 40   1.0   0.9
biodone                  39   0.9   0.9
nfs_readrpc              38   0.9   0.8
vm_page_alloc            36   0.9   0.8
...
--
/modules/tpz.ko         435          9.5

tpz.ko is the Myrinet device driver.  This is saying that the system
spent 90% of its time in the static kernel, 9.5% in the device driver, 
and 0.5% in userland.
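The two Pct columns appear to be related: the first is each function's share of the 4128 kernel samples, and the second scales that by the kernel's 90.0% share of all samples. This reading is inferred from the numbers rather than from iprobe's documentation; a quick check against a few rows:

```python
KERNEL_COUNT = 4128   # samples attributed to the static kernel
KERNEL_SHARE = 90.0   # kernel's share of all samples, in percent


def pct_columns(count):
    """Return (percent of kernel samples, percent of all samples)."""
    within = 100.0 * count / KERNEL_COUNT
    return round(within, 1), round(within * KERNEL_SHARE / 100.0, 1)


for name, count in [("bcopy_samealign_lp", 1347), ("procrunnable", 279), ("tsleep", 256)]:
    print(name, *pct_columns(count))
# bcopy_samealign_lp 32.6 29.4
# procrunnable 6.8 6.1
# tsleep 6.2 5.6
```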

The server is also close to maxed-out.  I can provide an iprobe
breakdown for it as well, and/or complete breakdowns for the client
and server.  


Cheers,

Drew


--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590



Re: Serious server-side NFS problem

1999-12-17 Thread Mike Smith


 
 :On Fri, 17 Dec 1999 00:55:26 -0800, Mike Smith [EMAIL PROTECTED] said:
 :
 : the IP and UDP checksum guessing, but more that I think you'll find that 
 : a considerable amount of the inbound NFS traffic handling is actually 
 : performed in the interrupt context
 :
 :If it is, then there is a serious bug.
 
 No serious NFS traffic handling is done in the interrupt context.  The
 packets are essentially just queued up for nfsd to deal with.

That's interesting then, since your results are somewhat at odds with 
what I've seen so far regarding interrupt load for network traffic.  Do 
you have any profiling results that point the finger more directly at 
anything?

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]





Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon



:
: 
: In message [EMAIL PROTECTED], Matthew Dillon writes:
: 
: :NFS uses the kernel 'boottime' structure to generate its version id.
: :Now normally you might believe that this structure, once set, will
: :never change.  The authors of NFS certainly make that assumption!
: :
: :Is this another case of "let's assume the time of day is a random number" or
: :is there any underlying assumption about time in this?
: :
: :--
: :Poul-Henning Kamp FreeBSD coreteam member
: :[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
: 
: It basically needs to be unique for each server reboot in order
: to allow clients to resynchronize.
: 
: Ok, then I suggest that you cache a copy of the boottime in the NFS
: code for this purpose.
: 
:
:Ack, I was using this very same thing for several devices in an isolated
:peer-to-peer network to decide who the 'master' was. (Whoever had been up
:longest knew more about the state of the network) Having this change could
:cause weirdness for me too... I assumed (without checking *thwap*) that
:boottime was a constant.
:
:Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
:'boottime' gets initialized so that others can use it, not just NFS? :)
:
:
:Kevin

We're already testing a patch.

For the moment it is going to be NFS specific, because there's
no time right now to do it right.

Hopefully I can get this in tomorrow and be done with NFS for the
release.  Then I can spend a little time figuring out what's
wrong with VN (which doesn't work in current at the moment).  Again.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp


In message [EMAIL PROTECTED], Kevin Day writes:

Ack, I was using this very same thing for several devices in an isolated
peer-to-peer network to decide who the 'master' was. (Whoever had been up
longest knew more about the state of the network) Having this change could
cause weirdness for me too... I assumed (without checking *thwap*) that
boottime was a constant.

Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
'boottime' gets initialized so that others can use it, not just NFS? :)

no, I think that is a bad idea.  In your case you want to use the
"uptime" which *is* a measure of how long the system has been
running.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Gallatin



Matthew Dillon writes:
  
  And so Andrew's bug report comes into the light!   His poor client
  (and mine once I reproduced the bug) got into a state, due to the
  server returning a different version id for virtually every packet,
  where it resent the same write data over the network over and over
  and over and over and over again.

Very nice catch!  

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590



Re: Serious server-side NFS problem

1999-12-16 Thread Nate Williams


 In message [EMAIL PROTECTED], Kevin Day writes:
 
 Ack, I was using this very same thing for several devices in an isolated
 peer-to-peer network to decide who the 'master' was. (Whoever had been up
 longest knew more about the state of the network) Having this change could
 cause weirdness for me too... I assumed (without checking *thwap*) that
 boottime was a constant.
 
 Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
 'boottime' gets initialized so that others can use it, not just NFS? :)
 
 no, I think that is a bad idea.  In your case you want to use the
 "uptime" which *is* a measure of how long the system has been
 running.

Uptime is also a constantly changing number.  Forgive me for my
ignorance, but why does boottime constantly change?  I would have thought
it would be a constant?  I've got software that also uses this to
determine when a new copy of it exists (although I do keep a local cache
of the value in case my software crashes, since it can recover from a
crash, but not a reboot).

I would think that boottime would be constant, since you didn't keep
booting at a different time...



Nate



Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Reimer


Matt, you are a tenacious, fearsome bug hunter!

Matt

Matthew Dillon wrote:
 
 Here's a general update on this bug report to -current.  It took all day
 but I was finally able to reproduce Andrew's bug.
 
 You guys are going to *love* this.
 
 NFS uses the kernel 'boottime' structure to generate its version id.
 Now normally you might believe that this structure, once set, will
 never change.  The authors of NFS certainly make that assumption!
 
 No such luck.  If you happen to be running, oh, xntpd for example,
 the kernel adjusts the boottime structure to take into account time
 changes, including PLL changes so, in fact, the boottime structure
 can change quite often - once each tick, in fact.
 
 Now, the effect of boottime changing on NFS is rather drastic.  You
 see, the version id controls whether NFS clients must reset their
 state machines for NFS data writes.  If a client has done a stage 1
 write and is ready to do the stage 2 commit, and the version id
 changes, the client must revert back to stage 1.
 
 And so Andrew's bug report comes into the light!   His poor client
 (and mine once I reproduced the bug) got into a state, due to the
 server returning a different version id for virtually every packet,
 where it resent the same write data over the network over and over
 and over and over and over again.
 
 I think recent changes to the way the kernel clocks work in -current
 brought the bug out into the open, but it's definitely a problem in
 both -stable and -current.
 
 Doh!  I gotta say that if I hadn't happened to have been running xntpd
 on my test box I would have *never* figured it out.
 
 -Matt
 

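Matt's description can be modelled in a few lines. This is a toy sketch, not the nfs_serv.c code or its fix: `Server`, `write_verifier()` and the one-microsecond drift per RPC are hypothetical, and the verifier is reduced to an integer. The client does an unstable WRITE, then a COMMIT, and must resend the data whenever the verifier in the COMMIT reply differs from the one in the WRITE reply. Deriving the verifier from a live, drifting boottime makes every COMMIT look like a server reboot; taking one snapshot, as PHK suggests, keeps it stable.

```python
class Server:
    """Toy NFSv3 server whose write verifier is derived from boottime."""

    def __init__(self, boottime_usec, cache_verifier):
        self.boottime_usec = boottime_usec
        self.cache_verifier = cache_verifier
        self._cached = boottime_usec  # snapshot taken once, "at boot"

    def tick(self, adjust_usec):
        # Models xntpd/PLL adjustments nudging the kernel's boottime.
        self.boottime_usec += adjust_usec

    def write_verifier(self):
        if self.cache_verifier:
            return self._cached
        return self.boottime_usec  # buggy: re-reads the drifting value


def commit_with_retries(server, max_tries=10):
    """Unstable WRITE then COMMIT, resending while the verifiers differ.

    Returns the number of WRITE attempts, or None if it never converged.
    """
    for attempt in range(1, max_tries + 1):
        write_verf = server.write_verifier()   # from the WRITE reply
        server.tick(1)                         # clock adjusted between RPCs
        commit_verf = server.write_verifier()  # from the COMMIT reply
        if commit_verf == write_verf:
            return attempt
        # Verifier changed: client must assume a reboot and resend the data.
    return None


print(commit_with_retries(Server(945_000_000_000_000, cache_verifier=False)))  # None
print(commit_with_retries(Server(945_000_000_000_000, cache_verifier=True)))   # 1
```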


Re: Serious server-side NFS problem

1999-12-16 Thread Kevin Day


 
  In message [EMAIL PROTECTED], Kevin Day writes:
  
  Ack, I was using this very same thing for several devices in an isolated
  peer-to-peer network to decide who the 'master' was. (Whoever had been up
  longest knew more about the state of the network) Having this change could
  cause weirdness for me too... I assumed (without checking *thwap*) that
  boottime was a constant.
  
  Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
  'boottime' gets initialized so that others can use it, not just NFS? :)
  
  no, I think that is a bad idea.  In your case you want to use the
  "uptime" which *is* a measure of how long the system has been
  running.
 
 Uptime is also a constantly changing number.  Forgive me for my
 ignorance, but why does boottime constantly change?  I would have thought
 it would be a constant?  I've got software that also uses this to
 determine when a new copy of it exists (although I do keep a local cache
 of the value in case my software crashes, since it can recover from a
 crash, but not a reboot).
 
 I would think that boottime would be constant, since you didn't keep
 booting at a different time...
 

Yeah, uptime is moving which makes it difficult for me too. When new
machines enter the network, they need to announce a number which is used to
decide who will become the master if the current master disappears. I could
just announce currenttime-uptime, but that's got a slightly different
meaning that I'll have to consider.
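Kevin's currenttime-uptime is, in effect, how a boot time gets derived, and it shows why the value moves: if uptime is treated as authoritative, any step or slew of the wall clock shifts the derived boot time by the same amount. A toy model; the `Clock` class and its methods are hypothetical, not kernel interfaces.

```python
class Clock:
    """Toy clock: uptime is authoritative, wall time = offset + uptime."""

    def __init__(self, boot_wall):
        self.uptime = 0.0
        self.offset = boot_wall  # wall-clock time corresponding to uptime 0

    def run(self, seconds):
        self.uptime += seconds

    def wall(self):
        return self.offset + self.uptime

    def set_wall(self, new_wall):
        # settimeofday()/ntp step: keep uptime, move the offset instead.
        self.offset = new_wall - self.uptime

    def boottime(self):
        return self.wall() - self.uptime


c = Clock(boot_wall=1000.0)
c.run(60.0)
print(c.boottime())          # 1000.0
c.set_wall(c.wall() + 2.5)   # clock found to be 2.5 s slow
print(c.boottime())          # 1002.5
```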

Anyway, enough of my proprietary mess, but... I do see a few uses for a
non-moving boottime, but won't argue here or now. :) This behaviour is
documented in time(9) though, so I really can't complain. :)

Kevin



Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp



Yeah, uptime is moving which makes it difficult for me too. When new
machines enter the network, they need to announce a number which is used to
decide who will become the master if the current master disappears. I could
just announce currenttime-uptime, but that's got a slightly different
meaning that I'll have to consider.

just announce uptime, the one with the largest number wins.
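One way to follow this suggestion while coping with the "uptime is moving" concern is to age each announcement by the local time elapsed since it was received before comparing. A minimal sketch; the peer table layout, the aging step and the name-based tie-break are assumptions, not part of any existing protocol.

```python
def current_uptime(announced, received_at, now):
    # Uptime keeps moving, so age the announcement by the time elapsed
    # since it was received (both measured on the local monotonic clock).
    return announced + (now - received_at)


def elect_by_uptime(peers, now):
    """peers: {name: (announced_uptime, received_at)}. Largest current
    uptime wins; the name breaks exact ties so all nodes agree."""
    return max(peers, key=lambda n: (current_uptime(*peers[n], now), n))


peers = {"a": (3000.0, 10.0), "b": (3500.0, 700.0), "c": (120.0, 20.0)}
print(elect_by_uptime(peers, now=800.0))  # a
```

Comparing the raw announced values would pick "b", even though "a" has been up longer.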


--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon



:
:
:Yeah, uptime is moving which makes it difficult for me too. When new
:machines enter the network, they need to announce a number which is used to
:decide who will become the master if the current master disappears. I could
:just announce currenttime-uptime, but that's got a slightly different
:meaning that I'll have to consider.
:
:just announce uptime, the one with the largest number wins.
:
:--
:Poul-Henning Kamp FreeBSD coreteam member
:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!

Go outside, kill the circuit breaker, then turn it back on.

It's easier just to use the IP address.  Highest number wins.
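This alternative needs no clocks at all. A sketch using Python's standard `ipaddress` module; comparing parsed addresses rather than strings matters, because as strings "10.0.0.9" sorts above "10.0.0.10". It assumes every peer uses the same address family (mixing IPv4 and IPv6 objects in `max()` raises `TypeError`).

```python
import ipaddress


def elect_by_address(addrs):
    """Highest address wins; deterministic and unaffected by reboots."""
    return str(max(ipaddress.ip_address(a) for a in addrs))


print(elect_by_address(["10.0.0.9", "10.0.0.10", "10.0.0.2"]))  # 10.0.0.10
```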

-Matt
Matthew Dillon 
[EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh


In message [EMAIL PROTECTED] Poul-Henning Kamp writes:
: If people do a "settimeofday" we change the boot time since the
: amount of time we've been up *IS* known for sure, whereas the boottime
: is only an estimate.

There is one problem with this.  The amount of uptime isn't the same
as the amount of time since the machine booted.  How can this happen?
When a laptop suspends, it doesn't update the update while it is
asleep, nor does it update the uptime by the amount of time that has
been slept.  Is this a bug in the apm code?

Warner



Re: Serious server-side NFS problem

1999-12-16 Thread Poul-Henning Kamp


In message [EMAIL PROTECTED], Warner Losh writes:
In message [EMAIL PROTECTED] Poul-Henning Kamp writes:
: If people do a "settimeofday" we change the boot time since the
: amount of time we've been up *IS* known for sure, whereas the boottime
: is only an estimate.

There is one problem with this.  The amount of uptime isn't the same
as the amount of time since the machine booted.  How can this happen?
When a laptop suspends, it doesn't update the update while it is
asleep, nor does it update the uptime by the amount of time that has
been slept.  Is this a bug in the apm code?

Well, I don't think anybody has seriously thought about what the right
semantics for APM is, and consequently the code we have is rather evil.

What to do is a definition question more than anything, and I guess the
answer to the question:

if I call timeout(bla bla bla, 3600*hz) and suspend the machine
for half an hour, how long after it resumes will I be
called?

will point the direction.

In other words:
Do routes expire while suspended ?  
Do TCP timers tick ?

I would say "they sure should do, because they relate to external
events" (if we accept that as the answer we need to call softclock
a LOT of times when we come out of suspend).

In reality we have no clear definition of "suspend" for a unix system,
and the kernel may need to learn about "timeouts on the kernel conscious
timescale" vs. "timeouts on the wallclock timescale" and similar hair.
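The question has two defensible answers depending on which timescale the timeout is measured on. A toy calculation in seconds, assuming ticks stop completely during suspend and that a wall-clock timer whose deadline passed during the suspend fires at resume; the function names are hypothetical.

```python
def fire_time_ticks(delay, suspend_start, suspend_len):
    """Tick-based callout: ticks stop while suspended, so expiry slips."""
    if suspend_start >= delay:
        return delay
    return delay + suspend_len


def fire_time_wallclock(delay, suspend_start, suspend_len):
    """Wall-clock deadline: if it passed while asleep, fire at resume."""
    if suspend_start >= delay:
        return delay
    resume = suspend_start + suspend_len
    return max(delay, resume)


HOUR, HALF = 3600, 1800
# Timeout armed at t=0 for one hour; 30-minute suspend starting at t=600.
print(fire_time_ticks(HOUR, 600, HALF) / 60)      # 90.0
print(fire_time_wallclock(HOUR, 600, HALF) / 60)  # 60.0
```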

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



Re: Serious server-side NFS problem

1999-12-16 Thread Nate Williams


 : If people do a "settimeofday" we change the boot time since the
 : amount of time we've been up *IS* known for sure, whereas the boottime
 : is only an estimate.
 
 There is one problem with this.  The amount of uptime isn't the same
 as the amount of time since the machine booted.  How can this happen?
 When a laptop suspends, it doesn't update the update while it is
 asleep, nor does it update the uptime by the amount of time that has
 been slept.

FWIW, we had code in the tree (just before the timeout_ch changes) that
did update all of the timeouts to 'fire' when the laptop was resumed.

This caused a 'thundering herd' problem at resume, but I don't see any
way around it...  However, it was lost when we changed to the different
timeout code.
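A sketch of the catch-up approach described above: on resume, advance the tick counter by the time slept and run every callout whose deadline has passed. A heap of (deadline_tick, name) pairs stands in for the kernel's callout structures, and the hz value and callout names are made up for illustration. Everything that expired during the suspend fires in a single burst, which is the thundering herd.

```python
import heapq


def resume_catch_up(callouts, now_ticks, slept_ticks):
    """Advance the tick counter by the time slept and fire every expired
    callout. Returns (new_now, names fired in one burst)."""
    now = now_ticks + slept_ticks
    fired = []
    while callouts and callouts[0][0] <= now:
        _, name = heapq.heappop(callouts)
        fired.append(name)
    return now, fired


hz = 100
pending = [(5 * hz, "arp"), (30 * hz, "route"), (2 * hz, "tcp_rexmt"), (7200 * hz, "dhcp")]
heapq.heapify(pending)
now, fired = resume_catch_up(pending, now_ticks=0, slept_ticks=1800 * hz)
print(now, fired)  # 180000 ['tcp_rexmt', 'arp', 'route']
```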




Nate



Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh


In message [EMAIL PROTECTED] Poul-Henning Kamp writes:
: Well, I don't think anybody has seriously thought about what the right
: semantics for APM is, and consequently the code we have is rather evil.

Don't know if I'd go so far as to say evil, but there are some POLA
issues.

: What to do is a definition question more than anything, and I guess the
: answer to the question:
: 
:   if I call timeout(bla bla bla, 3600*hz) and suspend the machine
:   for half an hour, how long after it resumes will I be
:   called?
: 
: will point the direction.

It will be called 3600*hz softclock ticks after the original timeout
was called, which could be more than a wall time of 1 hour.  I'd guess that it
would be after 1 hour and 30 minutes.

: In other words:
:   Do routes expire while suspended ?  
:   Do TCP timers tick ?
:
: I would say "they sure should do, because they relate to external
: events" (if we accept that as the answer we need to call softclock
: a LOT of times when we come out of suspend).

Uggg.  That's right.  The current approach is to step the current time
by the amount of time we slept, based on rtc measurements (high
precision time keeping at its absolute worst).

: In reality we have no clear definition of "suspend" for a unix system,
: and the kernel may need to learn about "timeouts on the kernel conscious
: timescale" vs. "timeouts on the wallclock timescale" and similar hair.

Yes.  Some timeouts don't matter over a suspend (eg make sure that
this card isn't wedged) while some should take the suspend into
account (need to expire routes, arp entries, etc)

Warner




Re: Serious server-side NFS problem

1999-12-16 Thread Tom Bartol




On Thu, 16 Dec 1999, Warner Losh wrote:

 In message [EMAIL PROTECTED] Tom Bartol writes:
 : IIRC it does update uptime properly after a suspend in 2.2.8 but does not
 : do so in 3.X and -current on my ThinkPad 770.
 
 define correctly.  Eg, if I suspend for an hour it adds an hour?
 
 Warner
 

Yeah, that's what I meant by "correctly".  I don't recall seeing a
"thundering herd" effect afterwards.  Hmmm... which reminds me, I believe
this was not stock 2.2.8 but rather 2.2.8-PAO.  I had thought that the
lion's share of PAO code got merged into 3.0-current at some point.  When
I tried 3.0-current after this merge, suspend and resume worked fine on my
770 with the exception of uptime.

Tom





Re: Serious server-side NFS problem

1999-12-16 Thread Warner Losh


In message [EMAIL PROTECTED] Tom Bartol writes:
: I tried 3.0-current after this merge, suspend and resume worked fine on my
: 770 with the exception of uptime.

I can confirm that uptime, at least as reported by uptime(1), isn't
increased in the latest -current.

Warner



Re: Serious server-side NFS problem

1999-12-16 Thread Tom Bartol




On Thu, 16 Dec 1999, Warner Losh wrote:

 In message [EMAIL PROTECTED] Tom Bartol writes:
 : I tried 3.0-current after this merge, suspend and resume worked fine on my
 : 770 with the exception of uptime.
 
 I can confirm that uptime, at least as reported by uptime(1), isn't
 increased in the latest -current.
 
 Warner
 

I confirm this as well.  Perhaps after suspend we need:

Allow ntp to update time and adjust boottime as necessary.
Then set uptime = time-boottime

Tom





Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Kenneth Milton


+[ Warner Losh ]-
|
| There is one problem with this.  The amount of uptime isn't the same
| as the amount of time since the machine booted.  How can this happen?
| When a laptop suspends, it doesn't update the update while it is
| asleep, nor does it update the uptime by the amount of time that has
| been slept.  Is this a bug in the apm code?

The machine is neither up nor down until you collapse the event horizon and
unsuspend it to observe its state d8)

-- 
Totally Holistic Enterprises Internet|  P:+61 7 3870 0066   | Andrew Milton
The Internet (Aust) Pty Ltd  |  F:+61 7 3870 4477   | 
ACN: 082 081 472 |  M:+61 416 022 411   | Carpe Daemon
PO Box 837 Indooroopilly QLD 4068|[EMAIL PROTECTED]| 



Re: Serious server-side NFS problem

1999-12-16 Thread Mike Smith


 

 On Thu, 16 Dec 1999, Warner Losh wrote:
 
  In message [EMAIL PROTECTED] Poul-Henning Kamp writes:
  : If people do a "settimeofday" we change the boot time since the
  : amount of time we've been up *IS* known for sure, whereas the boottime
  : is only an estimate.
  
  There is one problem with this.  The amount of uptime isn't the same
  as the amount of time since the machine booted.  How can this happen?
  When a laptop suspends, it doesn't update the update while it is
  asleep, nor does it update the uptime by the amount of time that has
  been slept.  Is this a bug in the apm code?
 
 IIRC it does update uptime properly after a suspend in 2.2.8 but does not
 do so in 3.X and -current on my ThinkPad 770.

Not updating uptime to account for time slept is the "correct" behaviour 
given the way the kernel currently thinks about things, where "correct" 
is defined as "most survivable".

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]





Re: Serious server-side NFS problem

1999-12-16 Thread Andrew Gallatin



Matthew Dillon writes:

  We're already testing a patch.

Thanks again Matt!

The latest rev of nfs_serv.c has fixed it.  

I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec and write
bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.

Awesome!

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: Serious server-side NFS problem

1999-12-16 Thread Matthew Dillon



:Matthew Dillon writes:
:
:  We're already testing a patch.
:
:Thanks again Matt!
:
:The latest rev of nfs_serv.c has fixed it.  
:
:I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec and write
:bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
:10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.
:
:Awesome!
:
:Thanks,
:
:Drew
:
:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
:Duke UniversityEmail: [EMAIL PROTECTED]

Those are really quite excellent results!  Linux eat your heart out!  I
get 9.5 to 10.5 MBytes/sec on my half-duplex network between two fast
machines.  I tend to get between 7.5 and 9 MBytes/sec when using slower
(200-300 MHz) clients.  That's *with* packet loss (for some reason when
my fxp ethernets pump data out that quickly they tend to cause packet
loss in other parts of my HUBed network, which I find quite annoying).

We've solved most of the performance issues, but NFS is still
eating a little too much cpu for my tastes.  Unfortunately it is getting to
the point where a significant portion of the performance loss is occurring
in the network driver itself.  Some of my cards eat 25% of the cpu just
in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
cards are way ahead of NIC cards in regards to reducing interrupt 
overhead (though gigabit NICs have caught up some).

-Matt
Matthew Dillon 
[EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-16 Thread Kenneth D. Merry


On Thu, Dec 16, 1999 at 19:28:34 -0800, Matthew Dillon wrote:
 :Matthew Dillon writes:
 :
 :  We're already testing a patch.
 :
 :Thanks again Matt!
 :
 :The latest rev of nfs_serv.c has fixed it.  
 :
 :I'm now seeing FreeBSD UDP client read bandwidth of 9.2MB/sec and write
 :bandwidth of 10.9MB/sec.  Solaris clients are writing over TCP at
 :10.1MB/sec (and that is across a router!) and are reading at 7MB/sec.

[ ... ]

 :Andrew Gallatin, Sr Systems Programmer   http://www.cs.duke.edu/~gallatin
 :Duke University  Email: [EMAIL PROTECTED]
 
 Those are really quite excellent results!  Linux eat your heart out!  I
 get 9.5 to 10.5 MBytes/sec on my half-duplex network between two fast
 machines.  I tend to get between 7.5 and 9 MBytes/sec when using slower
 (200-300 MHz) clients.  That's *with* packet loss (for some reason when
 my fxp ethernets pump data out that quickly they tend to cause packet
 loss in other parts of my HUBed network, which I find quite annoying).
 
 We've solved most of the performance issues, but NFS is still
 eating a little too much cpu for my tastes.  Unfortunately it is getting to
 the point where a significant portion of the performance loss is occurring
 in the network driver itself.  Some of my cards eat 25% of the cpu just
 in 'interrupt' (at 10 MBytes/sec half duplex), not even counting the
 TCP or UDP stacks.  This is mainly due to the MTU being too small (i.e.
 packet fragmentation takes its toll on the interrupt subsystem).  SCSI 
 cards are way ahead of NIC cards in regards to reducing interrupt 
 overhead (though gigabit NICs have caught up some).


Another advantage with gigabit ethernet is that if you can do jumbo frames,
you can fit an entire 8K NFS packet in one frame.
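Ken's point about jumbo frames can be checked with a little arithmetic. The sketch below counts the IPv4 packets needed for one UDP datagram carrying 8192 bytes of NFS data, assuming a 20-byte IPv4 header with no options and ignoring the RPC and NFS headers (any extra header overhead under about 680 bytes leaves both answers unchanged). The function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20 /* IPv4 header without options */
#define UDP_HDR_LEN   8

/*
 * Number of IPv4 packets needed to send one UDP datagram carrying
 * `udp_payload` bytes over a link with the given MTU.
 */
static size_t ipv4_fragments(size_t udp_payload, size_t mtu)
{
    assert(mtu > IPV4_HDR_LEN + UDP_HDR_LEN);

    /* The UDP header travels as IP payload in the first fragment. */
    size_t ip_payload = UDP_HDR_LEN + udp_payload;
    size_t max_payload = mtu - IPV4_HDR_LEN;

    if (ip_payload <= max_payload)
        return 1; /* fits in one frame, no fragmentation */

    /* Every fragment but the last carries a multiple of 8 bytes. */
    size_t per_frag = max_payload & ~(size_t)7;

    /* Smallest n with (n - 1) * per_frag + max_payload >= ip_payload. */
    size_t rest = ip_payload - max_payload;
    return 1 + (rest + per_frag - 1) / per_frag;
}
```

With a 1500-byte MTU, `ipv4_fragments(8192, 1500)` gives 6 frames on the wire, all of which the receiver has to take in and reassemble; with a 9000-byte jumbo MTU, `ipv4_fragments(8192, 9000)` gives 1.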

I'd like to see NFS numbers from two 21264 Alphas with GigE cards, zero
copy, checksum offloading and a big striped array on one end at least.  I
bet you could get pretty good performance with a setup like that.  (Now
don't anybody go out and do it unless you really want to spend the time.)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]



Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon


:However, I'm seeing a showstopping problem when running newer kernels:
:When writing a large file via TCP, a Solaris 2.7 client pauses when
:closing the file, and appears to become stuck in an infinite loop.
:Eg:
:
:dd if=/dev/zero of=zot bs=64k count=8192
:8192+0 records in
:8192+0 records out
:^C <- wedge
:
:The process does not exit, and there is a flurry of activity between
:the client and server:
:
:solaris -> freebsd NFS C WRITE3 FH=F5CB at 369655808 for 32768 (ASYNC)
:
:freebsd -> solaris TCP D=843 S=2049 Ack=1906313520 Seq=94979688 Len=164 Win=33176
:freebsd -> solaris RPC R (#5146) XID=299504169 Success
:freebsd -> solaris NFS R WRITE3 OK 32768 (ASYNC)
:

This is very odd.  Does it lockup with UDP or only with TCP?   And only
with a solaris client?

It's possible that the problem may be related to changes in the TCP
stack rather than related to NFS.  Looking at your trace output
I don't see anything wrong beyond the solaris client apparently repeating
write requests.  Another possibility is that we somehow nixed the
commit rpc.  All the repeated writes occur after the commit rpc.  In
the protocol trace, though, the commit rpc succeeds.  Very odd!

:I would think that this is not our fault, except things work just dandy
:with a kernel from July.  In fact, the only way out of this situation
:is to reboot the FreeBSD NFS server into an older kernel.
:
:In the trace (about 30 seconds or so of the activity after dd
:finished, but before it exited) there are ~21,000 packets.  There is a
:grand total of:
:
:NFS C WRITE3:  11024
:NFS R WRITE3 OK:   10499
:NFS C COMMIT3: 1
:NFS R COMMIT3 OK:  1
:
:In case more details are needed, I've left the complete trace in
:~gallatin/nfs-trace.gz on freefall.

I've looked at it -- it looks normal except for the repeated writes.
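For scale, here is the arithmetic on Drew's trace counts, assuming every one of the 11,024 WRITE3 calls carried 32768 bytes like the one shown and taking his "about 30 seconds" at face value. The helper names are illustrative only.

```c
#include <stdint.h>

/* Total bytes carried by `calls` WRITE requests of `len` bytes each. */
static uint64_t total_write_bytes(uint64_t calls, uint64_t len)
{
    return calls * len;
}

/* Average rate in bytes per second over an interval of `secs` seconds. */
static double avg_rate(uint64_t bytes, double secs)
{
    return (double)bytes / secs;
}
```

That comes to 361,234,432 bytes, or roughly 12 MB/sec over 30 seconds. On 100 Mbit/s Ethernet (12.5 MB/sec raw) that is close to the whole link, spent re-sending data the file already contains.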

:Also, while read performance has improved by 44%, write performance
:has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
:quick benchmarks.  Note that the file size of 512MB is larger than
:memory on both the server and client.  Also note that the disk array
:on the server will read at 50MB/sec and write at 40MB/sec, so we are
:not disk bound ;-)
:
:- UDP NFS write performance from a FreeBSD client:

Ok, I'll take a look at it.  I get 9 MBytes/sec writing over UDP links
but only with fast (400MHz class) clients.  With a 200 MHz class client
I am only getting 4-5 MBytes/sec.  Frankly, I should be getting 9 MB/sec
even with the slower client!

:UDP NFS Read performance has gotten better:
:
:July's kernel:
:% dd if=zot of=/dev/null bs=64k
:8192+0 records in
:8192+0 records out
:536870912 bytes transferred in 84.621477 secs (6344381 bytes/sec)
:
:Today's kernel:
:dd if=zot of=/dev/null bs=64k
:8192+0 records in
:8192+0 records out
:536870912 bytes transferred in 58.544409 secs (9170319 bytes/sec)
:
:Cheers,
:
:Drew

Yes, read performance has been improved in just the last few days
simply by adding a read heuristic to the server side for transfers off
the physical media.
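The message doesn't say what the new read heuristic looks like. As an assumption, a common shape for such a heuristic is a per-file sequential-access score: a read that starts where the previous one ended pushes the score up, anything else resets it, and the server scales read-ahead from the disk with the score. The sketch below shows only that scoring step; the names and the cap are hypothetical and not taken from nfs_serv.c.

```c
#include <stdint.h>

/* Hypothetical per-file state kept by the server between READ requests. */
struct seq_state {
    uint64_t next_off; /* offset just past the previous read */
    int      seqcount; /* consecutive sequential reads seen so far */
};

#define SEQ_MAX 127 /* assumed cap on the score */

/*
 * Update the score for a read of `len` bytes at `off` and return it.
 * A caller could size its read-ahead in proportion to the result.
 */
static int seq_update(struct seq_state *st, uint64_t off, uint64_t len)
{
    if (off == st->next_off) {
        if (st->seqcount < SEQ_MAX)
            st->seqcount++;
    } else {
        st->seqcount = 1; /* non-sequential access: start over */
    }
    st->next_off = off + len;
    return st->seqcount;
}
```

One wrinkle specific to NFS: with several nfsiod's issuing reads at once, requests can reach the server slightly out of order, so a strict "offset equals previous end" test like this one may reset more often than the real access pattern deserves.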

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
:Duke University  Email: [EMAIL PROTECTED]
:Department of Computer Science Phone: (919) 660-6590

Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin



Matthew Dillon writes:
  This is very odd.  Does it lockup with UDP or only with TCP?   And only
  with a solaris client?

This appears to be Solaris only.  I just tried a UDP mount and I see the
same problem.   Is there anything else I can do?


...
  :
  :- UDP NFS write performance from a FreeBSD client:
  
  Ok, I'll take a look at it.  I get 9 MBytes/sec writing over UDP links
  but only with fast (400MHz class) clients.  With a 200 MHz class client
  I am only getting 4-5 MBytes/sec.  Frankly, I should be getting 9 MB/sec
  even with the slower client!

The client and server here are both 450MHz machines.  The server
and client are identical (450MHz PII (or PIII), EtherExpress Pro NICs) 
except that the server has 384MB of ram and the client has
64MB.  The client was running a kernel from last week.  My desktop
(196 MB 300MHz PII) shows similar behaviour: a drop from 7MB/s to
3MB/s.  It is running a kernel from Monday.

...

  Yes, read performance has been improved in just the last few days
  simply by adding a read heuristic to the server side for transfers off
  the physical media.

Nice!

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590



Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon



:
:
:Matthew Dillon writes:
:  This is very odd.  Does it lockup with UDP or only with TCP?   And only
:  with a solaris client?
:
:This appears to be Solaris only.  I just tried a UDP mount and I see the
:same problem.   Is there anything else I can do?

Yes, see if you can repeat the problem with a shorter dd count -- see
how small a count you can achieve and still produce the problem, then
do a nice long protocol trace.

-Matt



Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon



:Also, while read performance has improved by 44%, write performance
:has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
:quick benchmarks.  Note that the file size of 512MB is larger than
:memory on both the server and client.  Also note that the disk array
:on the server will read at 50MB/sec and write at 40MB/sec, so we are
:not disk bound ;-)
:
:- UDP NFS write performance from a FreeBSD client:
:
:July's kernel: 
:% dd if=/dev/zero of=zot bs=1024k count=512
:512+0 records in
:512+0 records out
:536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)
:
:Today's kernel:
:% dd if=/dev/zero of=zot bs=1024k count=512
:512+0 records in
:512+0 records out
:536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)

Question on these:  Is it the client you are running the old and new
kernels on or the server?  Also, make sure you are running the same
number of nfsiod's on each (and also try running a different number
of nfsiod's).

At the moment I am assuming you ran these tests with the client running
the old and new kernel.  I have some ideas there in regards to inefficient
context switching when the nfsiod's are saturated that I am testing.

-Matt
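The percentages quoted above can be checked from the dd output. dd's rate is simply bytes divided by elapsed seconds; the sketch below (helper names are illustrative) computes that and the relative change between two rates.

```c
/* dd's reported rate: bytes transferred divided by elapsed seconds. */
static double dd_rate(double bytes, double secs)
{
    return bytes / secs;
}

/* Fractional change from `before` to `after` (+0.10 means 10% faster). */
static double rel_change(double before, double after)
{
    return (after - before) / before;
}
```

Reads go from 6344381 to 9170319 bytes/sec, a gain of about 44.5%. Writes go from 10171714 to 3791636 bytes/sec, a drop of about 62.7%, inside the 50 to 70% range Drew reported.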



Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin



Matthew Dillon writes:
  
  :Also, while read performance has improved by 44%, write performance
  :has degraded by between 50 - 70% (FreeBSD clients)!  Here are some
  :quick benchmarks.  Note that the file size of 512MB is larger than
  :memory on both the server and client.  Also note that the disk array
  :on the server will read at 50MB/sec and write at 40MB/sec, so we are
  :not disk bound ;-)
  :
  :- UDP NFS write performance from a FreeBSD client:
  :
  :July's kernel:  
  :% dd if=/dev/zero of=zot bs=1024k count=512
  :512+0 records in
  :512+0 records out
  :536870912 bytes transferred in 52.780773 secs (10171714 bytes/sec)
  :
  :Today's kernel:
  :% dd if=/dev/zero of=zot bs=1024k count=512
  :512+0 records in
  :512+0 records out
  :536870912 bytes transferred in 141.593458 secs (3791636 bytes/sec)
  
  Question on these:  Is it the client you are running the old and new
  kernels on or the server?  Also, make sure you are running the same
  number of nfsiod's on each (and also try running a different number
  of nfsiod's).
  
  At the moment I am assuming you ran these tests with the client running
  the old and new kernel.  I have some ideas there in regards to inefficient
  context switching when the nfsiod's are saturated that I am testing.
  
   -Matt

Sorry I wasn't clear.  The variable was the server's kernel version.
The client's kernel remained constant.  It was from sources built late
last week.

When you are measuring server write performance, are you looking at
the numbers from the client's perspective, or are you looking at the
numbers from the point of view of the server's disks?

I ask because if I watch the server via systat, I see a steady
10-11MB/sec hitting the disk.  However, if I time the process, or look 
at the output from dd, I see the poor (under 4MB/sec) numbers I reported.  


Drew




Re: Serious server-side NFS problem

1999-12-15 Thread Andrew Gallatin



Matthew Dillon writes:
  
  :
  :
  :Matthew Dillon writes:
  :  This is very odd.  Does it lockup with UDP or only with TCP?   And only
  :  with a solaris client?
  :
  :This appears to be Solaris only.  I just tried a UDP mount and I see the
  :same problem.   Is there anything else I can do?
  
  Yes, see if you can repeat the problem with a shorter dd count -- see
  how small a count you can achieve and still produce the problem, then
  do a nice long protocol trace.
  
   -Matt

I've tried.  I can get the file to take a hell of a long time to close
with a count as short as 512 64k chunks.  But it eventually completes.
Perhaps the large files would eventually complete too, if I were
patient enough.

I've left you a full trace on freefall in ~gallatin of a 384MB test
(client has 320MB, server has 384MB).

large.gz:        trace from file creation to server reboot.  Server is
                 running a kernel from today.

large_reboot.gz: trace after server is rebooted into a ~July kernel
                 and the writes complete.

Drew



Re: Serious server-side NFS problem

1999-12-15 Thread Matthew Dillon


Here's a general update on this bug report to -current.  It took all day
but I was finally able to reproduce Andrew's bug.

You guys are going to *love* this.

NFS uses the kernel 'boottime' structure to generate its version id.
Now normally you might believe that this structure, once set, will
never change.  The authors of NFS certainly make that assumption!

No such luck.  If you happen to be running, oh, xntpd for example,
the kernel adjusts the boottime structure to take into account time
changes, including PLL adjustments, so the boottime structure can in
fact change quite often: as often as once each tick.

Now, the effect of boottime changing on NFS is rather drastic.  You
see, the version id controls whether NFS clients must reset their
state machines for NFS data writes.  If a client has done a stage 1
write and is ready to do the stage 2 commit, and the version id
changes, the client must revert back to stage 1.

And so Andrew's bug report comes into the light!   His poor client
(and mine once I reproduced the bug) got into a state, due to the
server returning a different version id for virtually every packet,
where it resent the same write data over the network over and over
and over and over and over again.

I think recent changes to the way the kernel clocks work in -current
brought the bug out into the open, but it's definitely a problem in
both -stable and -current.

Doh!  I gotta say that if I hadn't happened to have been running xntpd
on my test box I would have *never* figured it out.

-Matt
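The stage 1 write and stage 2 commit Matt describes correspond to NFSv3's UNSTABLE WRITE followed by COMMIT, where the replies carry an 8-byte write verifier (RFC 1813). Below is a simplified model of the client-side check, not FreeBSD's actual client code; the struct and function names are hypothetical. If the server returns a different verifier on nearly every reply, almost every COMMIT mismatches and the same ranges go back out, which is the loop seen in Andrew's trace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An NFSv3 write verifier: 8 opaque bytes (writeverf3 in RFC 1813). */
typedef struct { unsigned char b[8]; } writeverf3;

/* Hypothetical client record of one range written with UNSTABLE. */
struct dirty_range {
    uint64_t   off, len;
    writeverf3 verf;       /* verifier from that range's WRITE reply */
    bool       committed;
};

/*
 * Process a COMMIT reply for r[0..n). A range counts as safely on disk only
 * if the COMMIT verifier equals the one its WRITE returned; otherwise the
 * server may have lost the data, so the range stays dirty and has to be
 * written again (back to stage 1). Returns how many ranges need resending.
 */
static size_t handle_commit(struct dirty_range *r, size_t n,
                            const writeverf3 *commit_verf)
{
    size_t resend = 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i].committed)
            continue;
        if (memcmp(r[i].verf.b, commit_verf->b, sizeof commit_verf->b) == 0)
            r[i].committed = true;
        else
            resend++;
    }
    return resend;
}
```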




Re: Serious server-side NFS problem

1999-12-15 Thread Poul-Henning Kamp


In message [EMAIL PROTECTED], Matthew Dillon writes:
> Here's a general update on this bug report to -current.  It took all day
> but I was finally able to reproduce Andrew's bug.
>
> You guys are going to *love* this.
>
> NFS uses the kernel 'boottime' structure to generate its version id.
> Now normally you might believe that this structure, once set, will
> never change.  The authors of NFS certainly make that assumption!

Is this another case of "let's assume the time of day is a random number" or
is there any underlying assumption about time in this?

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



Re: Serious server-side NFS problem

1999-12-15 Thread Poul-Henning Kamp


In message [EMAIL PROTECTED], Matthew Dillon writes:

>:NFS uses the kernel 'boottime' structure to generate its version id.
>:Now normally you might believe that this structure, once set, will
>:never change.  The authors of NFS certainly make that assumption!
>:
>:Is this another case of "let's assume the time of day is a random number" or
>:is there any underlying assumption about time in this?
>:
>:--
>:Poul-Henning Kamp FreeBSD coreteam member
>:[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
>
> It basically needs to be unique for each server reboot in order
> to allow clients to resynchronize.

Ok, then I suggest that you cache a copy of the boottime in the NFS
code for this purpose.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!
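One way to act on Poul-Henning's suggestion is to snapshot boottime once when the NFS server starts and build the write verifier from the snapshot. The sketch below runs in userland, so `boottime` is an ordinary global standing in for the kernel variable; every name here is hypothetical, and this is not necessarily the change that went into the tree.

```c
#include <stdint.h>
#include <sys/time.h>

/* Stand-in for the kernel's boottime, which time adjustments may move. */
static struct timeval boottime;

/* Hypothetical NFS-private copy, taken once when the server starts. */
static struct timeval nfs_boot_snapshot;

static void nfsrv_snapshot_boottime(void)
{
    nfs_boot_snapshot = boottime;
}

/* Pack seconds and microseconds into a 64-bit verifier value. */
static uint64_t verf_from(struct timeval tv)
{
    return ((uint64_t)(uint32_t)tv.tv_sec << 32) | (uint32_t)tv.tv_usec;
}

/* Buggy variant: changes every time boottime is adjusted. */
static uint64_t write_verf_live(void)
{
    return verf_from(boottime);
}

/* Fixed variant: stable for the lifetime of this boot. */
static uint64_t write_verf_cached(void)
{
    return verf_from(nfs_boot_snapshot);
}
```

Any value that is fixed at startup and differs across reboots would serve; the snapshot just keeps the existing "derived from boot time" meaning.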



Re: Serious server-side NFS problem

1999-12-15 Thread Kevin Day


 
 In message [EMAIL PROTECTED], Matthew Dillon writes:
 
 :NFS uses the kernel 'boottime' structure to generate its version id.
 :Now normally you might believe that this structure, once set, will
 :never change.  The authors of NFS certainly make that assumption!
 :
 :Is this another case of "let's assume the time of day is a random number" or
 :is there any underlying assumption about time in this?
 :
 :--
 :Poul-Henning Kamp FreeBSD coreteam member
 :[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
 
 It basically needs to be unique for each server reboot in order
 to allow clients to resynchronize.
 
 Ok, then I suggest that you cache a copy of the boottime in the NFS
 code for this purpose.
 

Ack, I was using this very same thing for several devices in an isolated
peer-to-peer network to decide who the 'master' was. (Whoever had been up
longest knew more about the state of the network) Having this change could
cause weirdness for me too... I assumed (without checking *thwap*) that
boottime was a constant.

Perhaps a 'real_boottime' or 'unadjusted_boottime' that gets copied after
'boottime' gets initialized so that others can use it, not just NFS? :)


Kevin
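Kevin's election rule (the node that has been up longest is master) can avoid boottime entirely if each node advertises its uptime instead. The sketch below assumes each peer reports an uptime read from a clock that is not stepped when the time of day is set, such as a monotonic clock; the peer record and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* What each peer is assumed to advertise. */
struct peer {
    uint32_t id;
    uint64_t uptime_sec; /* from a clock unaffected by time-of-day steps */
};

/*
 * Pick the peer that has been up longest; ties go to the lower id so every
 * node computes the same answer. Returns an index, or -1 if n == 0.
 */
static long elect_master(const struct peer *p, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            p[i].uptime_sec > p[best].uptime_sec ||
            (p[i].uptime_sec == p[best].uptime_sec && p[i].id < p[best].id))
            best = (long)i;
    }
    return best;
}
```

Comparing uptimes also sidesteps clock skew between hosts, which comparing absolute boot times across machines would not, provided the uptime reports are reasonably fresh when compared.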


