Re: 2xPIIIx450 results NFS results

1999-09-20 Thread Bruce Evans

 I remember this one too.  I think the problem is that we fail to 
 service the RTC intr for some reason.  This patch was only a 
 workaround, and received a verbal broadside from Bruce if I
 remember right.
 
 Maybe it should be added under a sysctl until a better solution
 is known.
 
 --- clock.c   Sat Sep 18 22:41:40 1999
 +++ clock.c.new   Sun Sep  5 13:21:35 1999
 @@ -203,4 +203,6 @@
  clkintr(struct clockframe frame)
  {
 +        while (rtcin(RTC_INTR) & RTCIR_PERIOD)
 +                statclock(&frame);
          if (timecounter->tc_get_timecount == i8254_get_timecount) {
                  disable_intr();

Use a watchdog timeout like you should for any device that may hang.
Don't waste time running it every clock tick.
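
A minimal sketch of the watchdog idea suggested here, run against a simulated RTC and not against real hardware. The names `sim_rtc`, `sim_rtc_ack` and `rtc_watchdog_check` are hypothetical and do not exist in clock.c; `sim_rtc_ack` stands in for reading the RTC interrupt status register, which acknowledges a latched periodic interrupt. The check is meant to be called at a low rate (say, once a second), not on every clock tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated RTC state (hypothetical stand-in for the real chip). */
struct sim_rtc {
    bool     pending;     /* periodic interrupt latched but not acknowledged */
    uint64_t stat_ticks;  /* number of statclock-style ticks delivered so far */
};

/* Acknowledge the interrupt; returns true if one was latched. */
static bool
sim_rtc_ack(struct sim_rtc *rtc)
{
    bool was_pending = rtc->pending;

    rtc->pending = false;
    return was_pending;
}

/* Watchdog bookkeeping kept between checks. */
struct rtc_watchdog {
    uint64_t last_seen;   /* stat_ticks observed at the previous check */
    unsigned rearms;      /* how many times a stall was detected */
};

/*
 * Low-rate check: if the tick counter has not advanced since the last
 * call, assume the RTC interrupt was lost, drain any latched interrupt
 * and account for it.  Returns true when a stall was detected.
 */
static bool
rtc_watchdog_check(struct rtc_watchdog *wd, struct sim_rtc *rtc)
{
    assert(wd != NULL && rtc != NULL);

    if (rtc->stat_ticks != wd->last_seen) {
        wd->last_seen = rtc->stat_ticks;   /* clock is alive */
        return false;
    }
    while (sim_rtc_ack(rtc))               /* same drain as the patch */
        rtc->stat_ticks++;
    wd->last_seen = rtc->stat_ticks;
    wd->rearms++;
    return true;
}
```

Compared with the patch, the drain loop only runs when the watchdog notices that no ticks arrived, so the common path costs one comparison per check instead of one RTC register read per clock tick.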

ISTR that we thought that the bug might be caused by a bug in unwanted
SMI interrupt handling.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: 2xPIIIx450 results NFS results

1999-09-20 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Bruce Evans writes:
 I remember this one too.  I think the problem is that we fail to 
 service the RTC intr for some reason.  This patch was only a 
 workaround, and received a verbal broadside from Bruce if I
 remember right.
 
 Maybe it should be added under a sysctl until a better solution
 is known.
 
 --- clock.c  Sat Sep 18 22:41:40 1999
 +++ clock.c.new  Sun Sep  5 13:21:35 1999
 @@ -203,4 +203,6 @@
  clkintr(struct clockframe frame)
  {
 +        while (rtcin(RTC_INTR) & RTCIR_PERIOD)
 +                statclock(&frame);
          if (timecounter->tc_get_timecount == i8254_get_timecount) {
                  disable_intr();

Use a watchdog timeout like you should for any device that may hang.
Don't waste time running it every clock tick.

ISTR that we thought that the bug might be caused by a bug in unwanted
SMI interrupt handling.

If anybody can reproduce this reliably on a *BX chipset I have
code that will block SMI interrupts we can test with...

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: 2xPIIIx450 results NFS results

1999-09-20 Thread Mike Smith

 Use a watchdog timeout like you should for any device that may hang.
 Don't waste time running it every clock tick.
 
 ISTR that we thought that the bug might be caused by a bug in unwanted
 SMI interrupt handling.
 
 If anybody can reproduce this reliably on a *BX chipset I have
 code that will block SMI interrupts we can test with...

I'm not sure I follow what the alleged problem is here; who is handling 
the "unwanted" SMIs?  If it's the BIOS, the last thing you want to do 
is block all SMIs. 

-- 
\\  The mind's the standard   \\  Mike Smith
\\  of the man.   \\  [EMAIL PROTECTED]
\\-- Joseph Merrick   \\  [EMAIL PROTECTED]







Re: 2xPIIIx450 results NFS results

1999-09-20 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Mike Smith writes:
 Use a watchdog timeout like you should for any device that may hang.
 Don't waste time running it every clock tick.
 
 ISTR that we thought that the bug might be caused by a bug in unwanted
 SMI interrupt handling.
 
 If anybody can reproduce this reliably on a *BX chipset I have
 code that will block SMI interrupts we can test with...

I'm not sure I follow what the alleged problem is here; who is handling 
the "unwanted" SMIs?  If it's the BIOS, the last thing you want to do 
is block all SMIs. 

The problem is that the RTC stalls.  It is suspected that long SMI
durations could be responsible for this.  To confirm the diagnosis,
disabling SMI is feasible.

Your system will not blow up if you disable SMI; I have a system
happily chugging away here with SMI disabled for a month.

Doing so even idiot-proofs the soft-off button :-)

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: 2xPIIIx450 results NFS results

1999-09-20 Thread Michael Reifenberger

Hi,
 Use a watchdog timeout like you should for any device that may hang.
 Don't waste time running it every clock tick.
 
 ISTR that we thought that the bug might be caused by a bug in unwanted
 SMI interrupt handling.
Upgrading the BIOS to Rev. 1010 does solve the problem for me without the patch
to clock.c.

Bye!

Michael Reifenberger
Plaut Software GmbH, R/3 Basis






Re: 2xPIIIx450 results NFS results

1999-09-19 Thread Andrzej Bialecki

On Sun, 19 Sep 1999, Adam Strohl wrote:

 OK, Upgraded my Asus P2B-D machine from BIOS version 1008 to 1010, the
 problem disappeared.  Popped back to my old 1008 BIOS, problem came back.
 
 Looks like there was some weird issue that got resolved in 1009 or 1010.

Good! Thanks for the info. Now we have a solution.

Andrzej Bialecki

//  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org 
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ 






Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Andrzej Bialecki

On Fri, 17 Sep 1999, Rodney W. Grimes wrote:

  : I/O, and then closing it.
  :
  :4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
  :1008.A, running `systat -vm 1' gives the normal display but without any
  :numbers filled in, then switches over to an empty screen that says:
  :...
  
  Whenever systat or top do weird things it probably means you
  need to recompile libkvm.
 
 This is not a libkvm problem on my box, these are fresh make worlds
 on 3.3-RC as of 2 days ago.  It only appears to occur when running SMP,

The problem seems to occur reliably on ASUS boards - perhaps a
coincidence, but I have several machines here which behave this way. And
yes, libkvm is in perfect sync with the rest of the system (3.3-RC)

Andrzej Bialecki

//  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org 
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ 






Re: 2xPIIIx450 results NFS results

1999-09-18 Thread N

Matthew Dillon wrote:

 4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
 1008.A, running `systat -vm 1' gives the normal display but without any
 numbers filled in, then switches over to an empty screen that says:
[..]
 Whenever systat or top do weird things it probably means you
 need to recompile libkvm.

No.  This is the `broken statclock' thingy that has to do with APM in ways
I cannot fathom.  ASUS broke this in P2B-DS BIOS revision 1008 or
thereabouts.  Patches have been posted to several mailing lists, I was
wondering whether they've been committed somewhere along the line, and
whether APM was safe for inclusion into a 4.0-CURRENT SMP kernel again.

(Really, world and kernel are in sync, speed and duplex settings on the
 Ethernet card matches the switch, etc.)


-- Niels.






Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Jaye Mathisen


Yes, it definitely seems ASUS related...  I dropped back to uni processor
ASUS boards, and it's fine.  didn't need the SMP anyway, just wanted to
play with it some more.

On Sat, 18 Sep 1999, Andrzej Bialecki wrote:

 On Fri, 17 Sep 1999, Rodney W. Grimes wrote:
 
   : I/O, and then closing it.
   :
   :4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
   :1008.A, running `systat -vm 1' gives the normal display but without any
   :numbers filled in, then switches over to an empty screen that says:
   :...
   
   Whenever systat or top do weird things it probably means you
   need to recompile libkvm.
  
  This is not a libkvm problem on my box, these are fresh make worlds
  on 3.3-RC as of 2 days ago.  It only appears to occur when running SMP,
 
 The problem seems to occur reliably on ASUS boards - perhaps a
 coincidence, but I have several machines here which behave this way. And
 yes, libkvm is in perfect sync with the rest of the system (3.3-RC)
 
 Andrzej Bialecki
 
 //  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
 // ---
 // -- FreeBSD: The Power to Serve. http://www.freebsd.org 
 // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ 
 
 
 



Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Garrett Wollman

On Sat, 18 Sep 1999 01:16:52 + (GMT), Adam Strohl [EMAIL PROTECTED] said:

 I've been getting this too on 4.0-C, just rebuild last night, still there.
 top displays:
 CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
 idle

On my dual-PPro Intel BB440FX system I am not seeing this.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Andrzej Bialecki

On Sat, 18 Sep 1999, Jaye Mathisen wrote:

 
 Yes, it definitely seems ASUS related...  I dropped back to uni processor
 ASUS boards, and it's fine.  didn't need the SMP anyway, just wanted to
 play with it some more.

Now, here's the difference - I don't play, I NEED SMP, and the machines in
question are going to production soon... So, for me this is a real
problem.

Andrzej Bialecki

//  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org 
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ 






Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Rodney W. Grimes

 On Sat, 18 Sep 1999 01:16:52 + (GMT), Adam Strohl [EMAIL PROTECTED] 
said:
 
  I've been getting this too on 4.0-C, just rebuild last night, still there.
  top displays:
  CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
  idle
 
 On my dual-PPro Intel BB440FX system I am not seeing this.

Do you have apm compiled in or not?  Is it enabled or disabled?  I'm
trying to track this down this afternoon and can use all the data I
can.

-- 
Rod Grimes - KD7CAX - (RWG25)[EMAIL PROTECTED]





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Warner Losh

In message [EMAIL PROTECTED] Andrzej 
Bialecki writes:
: The problem seems to occur reliably on ASUS boards - perhaps a
: coincidence, but I have several machines here which behave this way. And
: yes, libkvm is in perfect sync with the rest of the system (3.3-RC)

We're seeing it too in the 19990815ish time frame.  This is both with
the 3.2R binaries AND the ones rebuilt and reinstalled.

Warner





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Michael Reifenberger writes:

Hi,
 We're seeing it too in the 19990815ish time frame.  This is both with
 the 3.2R binaries AND the ones rebuilt and reinstalled.
Saw it too on my ASUS P2B-DS (F.Rev.1008) 
Solved by a patch floating around for /sys/i386/isa/clock.c.  But why?
The patch is attached.

I remember this one too.  I think the problem is that we fail to 
service the RTC intr for some reason.  This patch was only a 
workaround, and received a verbal broadside from Bruce if I
remember right.

Maybe it should be added under a sysctl until a better solution
is known.

--- clock.c Sat Sep 18 22:41:40 1999
+++ clock.c.new Sun Sep  5 13:21:35 1999
@@ -203,4 +203,6 @@
 clkintr(struct clockframe frame)
 {
+        while (rtcin(RTC_INTR) & RTCIR_PERIOD)
+                statclock(&frame);
         if (timecounter->tc_get_timecount == i8254_get_timecount) {
                 disable_intr();

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Michael Reifenberger

Hi,
 We're seeing it too in the 19990815ish time frame.  This is both with
 the 3.2R binaries AND the ones rebuilt and reinstalled.
Saw it too on my ASUS P2B-DS (F.Rev.1008) 
Solved by a patch floating around for /sys/i386/isa/clock.c.  But why?
The patch is attached.

Bye!

Michael Reifenberger
Plaut Software GmbH, R/3 Basis


--- clock.c Sat Sep 18 22:41:40 1999
+++ clock.c.new Sun Sep  5 13:21:35 1999
@@ -203,4 +203,6 @@
 clkintr(struct clockframe frame)
 {
+        while (rtcin(RTC_INTR) & RTCIR_PERIOD)
+                statclock(&frame);
         if (timecounter->tc_get_timecount == i8254_get_timecount) {
                 disable_intr();



Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Peter Wemm

Mike Smith wrote:
  Matthew Dillon wrote: thereabouts.  Patches have been posted to several
  mailing lists, I was
  wondering whether they've been committed somewhere along the line, and
  whether APM was safe for inclusion into a 4.0-CURRENT SMP kernel again.
 
 APM and SMP are not functional in -current; this broke with the new 
 BIOS call mechanism and I have not yet been able to fathom the nature 
 of the problem.

According to Alan Cox (the Linux one), most APM BIOS implementations are
fundamentally incompatible with SMP.  (He said "all", not "most", actually.)

Using APM for anything more than "turn the box off" after shutting down all
the AP's is going to be trouble, assuming he's right.

Cheers,
-Peter







Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Mike Smith

 Matthew Dillon wrote: thereabouts.  Patches have been posted to several mailing 
lists, I was
 wondering whether they've been committed somewhere along the line, and
 whether APM was safe for inclusion into a 4.0-CURRENT SMP kernel again.

APM and SMP are not functional in -current; this broke with the new 
BIOS call mechanism and I have not yet been able to fathom the nature 
of the problem.
-- 
\\  The mind's the standard   \\  Mike Smith
\\  of the man.   \\  [EMAIL PROTECTED]
\\-- Joseph Merrick   \\  [EMAIL PROTECTED]







Re: 2xPIIIx450 results NFS results (was More benchmarking stuff...)

1999-09-18 Thread Greg Lehey

On Friday, 17 September 1999 at 11:17:48 -0700, Matthew Dillon wrote:
 : Might I then request that you help rewrite it so that it performs
 :a much more comprehensive testing of OS/filesystem throughput?
 :Myself, I'd really love to see something that lets you seriously
 :stress your system along the lines of Greg Lehey's rawio, but instead
 :at a higher level.  IMO, bonnie sucks worse than postmark, although
 :they're measuring different things.
 :
 : Although it should certainly be forking, whether forking or not I
 :can tell you that creating huge directories is not necessarily a bad
 :simulation of a heavily-used mail server.  I've seen mail servers
 :with over 100,000 files in /var/spool/mqueue, both at former
 :employers (like AOL), and at former customer sites (such as some of
 :the largest freemail providers in the world).

 What we really need is something that generates a performance
 curve based on several variables, including block size, locality of
 reference (seek randomosity), amount of parallelism, locality of
 parallelism (i.e. operating on same files vs different files), size of
 dataset in bytes, and size of dataset in files.

 The program should dynamically mess with all the variables until it
 gets a statistically relevant curve.

 I don't have the time to do it.  Sniff!

Sounds like rawio, sort of.  It doesn't do files, though.
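
As a rough illustration of the multi-variable sweep described in the quoted message, the sketch below enumerates a fixed grid over the six variables named there. It is a simplification: the message asks for a tool that adjusts the variables dynamically until the curve is statistically relevant, while this only shows how one run number maps to one combination of settings. All identifiers (`AXIS_*`, `struct sweep`, `sweep_total`, `sweep_decode`) are hypothetical and do not come from rawio, bonnie or postmark.

```c
#include <stddef.h>

/* Sweep axes, named after the variables listed in the message. */
enum {
    AXIS_BLOCK_SIZE,
    AXIS_SEEK_RANDOMNESS,
    AXIS_PARALLELISM,
    AXIS_SHARED_FILES,    /* same files vs. different files */
    AXIS_DATASET_BYTES,
    AXIS_DATASET_FILES,
    AXIS_COUNT
};

struct sweep {
    size_t levels[AXIS_COUNT];  /* number of settings to try on each axis */
};

/* Total number of benchmark runs in the full grid. */
static size_t
sweep_total(const struct sweep *s)
{
    size_t n = 1;

    for (size_t i = 0; i < AXIS_COUNT; i++)
        n *= s->levels[i];
    return n;
}

/*
 * Decode a run number into one setting index per axis (mixed-radix
 * decomposition).  Returns 0 on success, -1 if the run is out of range.
 */
static int
sweep_decode(const struct sweep *s, size_t run, size_t idx[AXIS_COUNT])
{
    if (run >= sweep_total(s))
        return -1;
    for (size_t i = 0; i < AXIS_COUNT; i++) {
        idx[i] = run % s->levels[i];
        run /= s->levels[i];
    }
    return 0;
}
```

Even modest level counts multiply quickly (4 x 3 x 2 x 2 x 3 x 2 is already 288 runs), which is one reason an adaptive search over the variables would be preferable to a full grid.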

Greg
--
See complete headers for address, home page and phone numbers
finger [EMAIL PROTECTED] for PGP public key





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Adam Strohl

OK, Upgraded my Asus P2B-D machine from BIOS version 1008 to 1010, the
problem disappeared.  Popped back to my old 1008 BIOS, problem came back.

Looks like there was some weird issue that got resolved in 1009 or 1010.

- ( Adam Strohl ) -
-  UNIX Operations/Systems   http://www.digitalspark.net  -
-  adams (at) digitalspark.netxxx.xxx. x  -
- ( DigitalSpark.NET )--- -

On Sat, 18 Sep 1999, Garrett Wollman wrote:

 On Sat, 18 Sep 1999 01:16:52 + (GMT), Adam Strohl [EMAIL PROTECTED] 
said:
 
  I've been getting this too on 4.0-C, just rebuild last night, still there.
  top displays:
  CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
  idle
 
 On my dual-PPro Intel BB440FX system I am not seeing this.
 
 -GAWollman
 
 --
 Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
 [EMAIL PROTECTED]  | O Siem / The fires of freedom 
 Opinions not those of| Dance in the burning flame
 MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick
 
 



Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Rodney W. Grimes

 OK, Upgraded my Asus P2B-D machine from BIOS version 1008 to 1010, the
 problem disappeared.  Popped back to my old 1008 BIOS, problem came back.

So far about all I have confirmed is that the problem does not exist
with BIOS 1009 when the apm code is not compiled into the kernel.  I'll
have a full matrix of with/without apm 1008/1009/1010 some time tomorrow,
as the machines are building their system disks now.

 Looks like there was some weird issue that got resolved in 1009 or 1010.
 
 On Sat, 18 Sep 1999, Garrett Wollman wrote:
 
   I've been getting this too on 4.0-C, just rebuild last night, still there.
   top displays:
   CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
   idle
  
  On my dual-PPro Intel BB440FX system I am not seeing this.
  

-- 
Rod Grimes - KD7CAX - (RWG25)[EMAIL PROTECTED]





Re: 2xPIIIx450 results NFS results

1999-09-18 Thread Adam Strohl

On Sat, 18 Sep 1999, Rodney W. Grimes wrote:

 So far about all I have confirmed is that the problem does not exist
 with BIOS 1009 when the apm code is not compiled into the kernel.  I'll
 have a full matrix of with/without apm 1008/1009/1010 some time tomorrow,
 as the machines are building their system disks now.

Sounds good.  Is there a way to detect the BIOS revision on boot or
something and either warn (i.e., say "Update yer BIOS!") or install a
workaround?

- ( Adam Strohl ) -
-  UNIX Operations/Systems   http://www.digitalspark.net  -
-  adams (at) digitalspark.netxxx.xxx. x  -
- ( DigitalSpark.NET )--- -







Re: 2xPIIIx450 results NFS results (was More benchmarking stuff...)

1999-09-17 Thread Brad Knowles

At 11:56 AM -0700 1999/9/17, Matthew Dillon wrote:

 In real-life... for example, with a mail or web server, the namecache
 tends to be somewhat more effective than 50%.  The web servers at BEST
 generally had a 95%+ name cache hit rate.  The name cache misses are
 what are causing the lion's share of the directory inefficiencies.

Note that with a mail server, this is precisely the sort of thing 
that happens with /var/spool/mqueue.  In particular, with sendmail, a 
qf/df pair of files get created, the message is received, the sender 
is told "250 Ok", then sendmail goes to deliver the message in the 
background, which 95-99% of the time happens on the first attempt, 
and then the qf/df pair of files get deleted.

So, again, we see that they've actually done a decent first-pass 
attempt at simulating the load a mail server would place on the 
filesystem.  All that we need to add now are a few more features.  ;-)


With a news server using traditional spool, the file would tend 
to get created, live for a relatively significant period of time 
(days or even weeks before it gets expired), and only then would it 
get removed.

An absolutely full newsfeed these days is running somewhere 
around 1.1 million files comprising some 55GB of data (see 
http://transit.us-va.remarq.com/feed-size/), or an average of 
52,608.71 bytes per article.  A very busy mail server might do a 
million messages per day (or more), but the average message size 
would be much closer to 2-5KB.
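
A quick check of the averaging step, using only the rounded figures given above (1.1 million files, 55GB). The helper `average_bytes` is a hypothetical name introduced for this illustration. With decimal gigabytes the rounded figures give 50,000 bytes per article, and with binary gigabytes about 53,687; the 52,608.71 figure quoted above presumably comes from the unrounded statistics on the feed-size page, which is an assumption and cannot be confirmed from this thread.

```c
#include <stdint.h>

/* Average object size in bytes; returns 0.0 for an empty set. */
static double
average_bytes(uint64_t total_bytes, uint64_t count)
{
    if (count == 0)
        return 0.0;
    return (double)total_bytes / (double)count;
}
```

For example, `average_bytes(55000000000ULL, 1100000ULL)` covers the decimal case and `average_bytes(55ULL << 30, 1100000ULL)` the binary one. Either way the result is roughly ten to twenty-five times the 2-5KB typical of a mail message.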

The primary difference between mail and news is that news 
articles tend to live a lot longer (mail messages might live for 
months or years in a user's mailbox, but that's not /var/spool/mqueue, 
and has a different pattern of access), and they tend to be a lot 
larger.

However, the number of articles/messages input per day is still of 
the same order as the number deleted per day (assuming you've got a 
handle on your spool and it's not growing out of control on you); it 
just typically takes news servers a few days to delete something 
once it comes in.


Thus, the name cache misses might tend to be lower on news 
servers, but I wouldn't be too willing to bet on it -- an article 
that got created seven days ago and not touched since probably won't 
be in the name cache any more than a file that is just now being 
created.

Of course, once you get into CNFS or timehash sorts of news spool 
storage mechanisms, you've traded in the name cache and directory 
update problems for an entirely different set of problems, and the 
filesystem may or may not be particularly good at optimizing for 
them, as softupdates is for lots of smaller individual files.

-- 
   These are my opinions -- not to be taken as official Skynet policy
  
|o| Brad Knowles, [EMAIL PROTECTED]Belgacom Skynet NV/SA |o|
|o| Systems Architect, News & FTP Admin Rue Col. Bourg, 124   |o|
|o| Phone/Fax: +32-2-706.11.11/12.49 B-1140 Brussels   |o|
|o| http://www.skynet.be Belgium   |o|
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
  Unix is like a wigwam -- no Gates, no Windows, and an Apache inside.
   Unix is very user-friendly.  It's just picky who its friends are.





Re: 2xPIIIx450 results NFS results (was More benchmarking stuff...)

1999-09-17 Thread Brad Knowles

At 1:02 PM -0700 1999/9/17, Matthew Dillon wrote:

 Sendmail does not get into trouble with queue files it is able to retire
 quickly.  Where sendmail gets into trouble is with queue files it ISN'T
 able to retire quickly.  This is why you *see* 10,000+ files in mqueue
 at times.  These files build up because a small percentage of mail
 destinations cannot be delivered to immediately.

In my experience, sendmail almost always accepts messages a lot 
faster than it can process them, regardless of the mode in which 
sendmail is running.  Foreground, queue-only, background, in my 
experience it doesn't make a whole lot of difference.

So the input number of files can grow very quickly, with 
deliveries (and mqueue file removals) suffering.

 The reason sendmail tends to break down with large queue directories has
 little to do with directory overhead and a lot to do with sendmail's own
 algorithms.  If you have 50 sendmails running a 10,000 file queue, each
 of those sendmail processes is essentially scanning the entire queue.

Yup, that's another very real problem with sendmail, and another 
reason why I really like postfix.
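
To put a number on the quoted scenario, here is a deliberately simplified model that assumes every queue runner examines every queue file once per pass. The function name `queue_entries_scanned` is hypothetical, and the model ignores caching, locking and files retired mid-pass.

```c
#include <stdint.h>

/*
 * Queue entries examined in one pass when each runner scans the
 * whole queue independently (simplified model, see above).
 */
static uint64_t
queue_entries_scanned(uint64_t runners, uint64_t queued_files)
{
    return runners * queued_files;
}
```

With the figures from the quoted paragraph, `queue_entries_scanned(50, 10000)` is 500,000 entries examined per pass for a queue that holds only 10,000 files, so the work grows with the product of the two numbers and not with the queue size alone.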

 If not controlled, this eventually leads to a cascade failure.  The
 potential for a cascade failure is, in fact, the number one reason for
 *NOT* running sendmail with background queueing mode turned on.  The
 best way to avoid a cascade failure is to run the sendmail daemon in
 queue-only mode with a set fork limit:

   sendmail -bd -OMaxDaemonChildren=X -ODeliveryMode=q

 And run the sendmail queue runner separately:

   sendmail -q1m -OMaxDaemonChildren=Y -OMinQueueAge=1h

In my experience, you still get messages being accepted a lot 
faster than they're being pushed out.  This is usually made even 
worse when you're delivering to mbox-style mailboxes, where a single 
large message may come in addressed to dozens of recipients, and now 
you might have megabytes of data being read but gigabytes of data 
being written.

This is another of my pet peeves, where I believe that a 
database-oriented solution that writes just one copy of the message 
and then gives all recipients pointers to it, would help a great deal.


Of course, there's not anyone like Eric or Wietse to pick on when 
it comes to writing database-oriented local delivery agents.

  The key issue with any mail server is that bandwidth and transaction
  usage tends to be low relatively speaking.  A USENET news system
  almost always has much higher transactional overhead, especially if it
  is taking several feeds.  A million news messages a day translates to
  around 10 million protocol transactions for a news box taking 4 feeds.

However, most of those transactions should be looking up 
message-ids in history or precommit cache databases and then refusing 
the article without it actually being transmitted.

High transactional rates, yes.  But the actual number of articles 
being received would be on par with a very busy mail server.  If 
you've got a lot of outbound feeds, of course the outbound data rate 
could be a very real problem, but that's a separate issue.

   What you cannot afford to spend time
  doing in a mail server is scanning the same queue file over and over
  again, so what you want to optimize for are the 5% of email messages
  that wind up stuck in the queue for more than a few minutes but usually
  less than an hour, and then make sure the 1% that stick around past
  that do not interfere with the processing of those that stick around
  less.

You can't really fix sendmail in this regard, although you could 
replace it with a different MTA.


I guess you could change the implementation methods of the 
underlying filesystem so as to speed up those constant linear sweeps 
of the entire mqueue directory by the queue runners (and by every 
sendmail process that goes to create a file in the mqueue, since they 
have to guarantee that the filename they're creating is unique).

How you would actually do this is totally beyond me, however.

-- 
   These are my opinions -- not to be taken as official Skynet policy
  
|o| Brad Knowles, [EMAIL PROTECTED]Belgacom Skynet NV/SA |o|
|o| Systems Architect, News & FTP Admin Rue Col. Bourg, 124   |o|
|o| Phone/Fax: +32-2-706.11.11/12.49 B-1140 Brussels   |o|
|o| http://www.skynet.be Belgium   |o|
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
  Unix is like a wigwam -- no Gates, no Windows, and an Apache inside.
   Unix is very user-friendly.  It's just picky who its friends are.



Re: 2xPIIIx450 results NFS results

1999-09-17 Thread N

Matthew Dillon wrote:

[..]
 One thing of interest to note, especially as it relates to the
 performance degradation with a larger number of files, is that
 'systat -vm 1' reports an approximately 50% name-cache hit rate no
 matter what postmark is doing.  In other words, postmark is creating
 a new file (namecache miss), opening it (namecache hit), doing some
 I/O, and then closing it.

4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
1008.A), running `systat -vm 1' gives the normal display but without any
numbers filled in, then switches over to an empty screen that says:

  The alternate system clock has died!
  Reverting to ``pigs'' display.

Which also doesn't work (I'm sure innd would be considered a CPU and
memory hog but nothing is displayed).  top is also broken (0% everywhere).
Apparently this can be fixed by adding `device apm0 at nexus? flags
0x0020' to the kernel config file, but the last time I tried that the
machine would panic while booting.  Has this been fixed since?


 In real-life... for example, with a mail or web server, the namecache
 tends to be somewhat more effective than 50%.  The web servers at BEST
 generally had a 95%+ name cache hit rate.  The name cache misses are
 what are causing the lion's share of the directory inefficiencies.

100% on another news server (3.2-STABLE, INN 2.2 with CNFS) :-) (only
watched it for a few moments though, lowest was 97.)

Thanks,


-- Niels.






Re: 2xPIIIx450 results NFS results

1999-09-17 Thread Matthew Dillon

: I/O, and then closing it.
:
:4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
:1008.A, running `systat -vm 1' gives the normal display but without any
:numbers filled in, then switches over to an empty screen that says:
:...

Whenever systat or top do weird things it probably means you
need to recompile libkvm.

-Matt






Re: 2xPIIIx450 results NFS results

1999-09-17 Thread Adam Strohl

I've been getting this too on 4.0-C, just rebuilt last night, still there.

top displays:
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
idle

AND loads > 2 make the machine very unresponsive; it's like SMP was before
that pci_support.c patch a month or two ago.

- ( Adam Strohl ) -
-  UNIX Operations/Systems   http://www.digitalspark.net  -
-  adams (at) digitalspark.netxxx.xxx. x  -
- ( DigitalSpark.NET )--- -

On Fri, 17 Sep 1999, Rodney W. Grimes wrote:

  : I/O, and then closing it.
  :
  :4.0-CURRENT (SMP on an ASUS P2B-DS with two CPU's installed; BIOS revision
  :1008.A, running `systat -vm 1' gives the normal display but without any
  :numbers filled in, then switches over to an empty screen that says:
  :...
  
  Whenever systat or top do weird things it probably means you
  need to recompile libkvm.
 
 This is not a libkvm problem on my box, these are fresh make worlds
 on 3.3-RC as of 2 days ago.  It only appears to occur when running SMP,
 and has been a problem in the past if you look at the cvs log for
 systat/vmstat.c.  Search for the specific message given by this
 user in the log, you'll see it has come and gone at least once.
 
 I already sent one message out about this, in response to Jordans
 ``release tag going down''.
 
 -- 
 Rod Grimes - KD7CAX - (RWG25)[EMAIL PROTECTED]
 
 