Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-24 Thread Vallo Kallaste

On Tue, Apr 23, 2002 at 09:40:11PM -0700, David Schultz
[EMAIL PROTECTED] wrote:

  Userspace processes will allocate memory from UVA space and can
  grow over 1GB of size if needed by swapping.  You can certainly
  have more than one over-1GB process going on at the same time,
  but swapping will constrain your performance.
 
 It isn't a performance constraint.  32-bit architectures have
 32-bit pointers, so in the absence of segmentation tricks, a
 virtual address space can only contain 2^32 = 4G locations.  If
 the kernel gets 3 GB of that, the maximum amount of memory that
 any individual user process can use is 1 GB.  If you had, say, 4
 GB of physical memory, a single user process could not use it all.
 Swap increases the total amount of memory that *all* processes can
 allocate by pushing some of the pages out of RAM and onto the
 disk, but it doesn't increase the total amount of memory that a
 single process can address.

Thank you, Terry and David, now I grasp how it should work (I hope).
I really miss some education, but that's life.
-- 

Vallo Kallaste
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-24 Thread Terry Lambert

David Schultz wrote:
 Thus spake Terry Lambert [EMAIL PROTECTED]:
  Writing a useful (non-fluff) technical book, optimistically,
  takes 2080 hours ... or 40 hours per week for 52 weeks... a man
  year.
 
  By the time you are done, the book is a year out of date, and
  even if you worked really hard and kept it up to date (e.g. you
  had 4 authors and spent only 6 months of wall time on the book),
  the shelf life on the book is still pretty short.
 
 Although it would be unreasonable to comprehensively document the
 kernel internals and expect the details to remain valid for a year,
 there is a great deal of lasting information that could be conveyed.
 For example, Kirk's 4.[34]BSD books cover obsolete systems, and yet
 much of what they say applies equally well to recent versions of
 FreeBSD.

These are general OS architecture books by a noted authority on
OS architecture.  That's a barrier to entry for other authors,
as the intrinsic value in the information is not constrained to
the direct subject of the work.  8-).

Kirk is supposedly working on a similar book for FreeBSD, release
date indeterminate.

In any case, this doesn't resolve the issue of "Where do I go to
do XXX to version YYY, without having to learn everything there is
to know about YYY?"


 It's true that the specific question ``How do I change my KVA size?''
 might have different answers at different times, but I doubt that the
 ideas behind an answer have all been invented in the last few months.
 Even things like PAE, used by the Linux 2.4 kernel, remind me of how
 DOS dealt with the 1 MB memory limit.

The PAE is the thing that Peter was reportedly working on in order
to break the 4G barrier on machines capable of accessing up to 16G
of RAM using bank selection.  I didn't mention it by name, since
the general principle is also applicable to the Alpha, which has a
current limit of 2G because of DMA barrier and other constraints.


While it's true that the ideas behind the answer remain the same...
the ideas behind the answer are already published in the books I've
already referenced in the context of this thread.

If people were content to discover implementation details based on
a working knowledge of general principles, then this thread would
never have occurred in the first place.


It's my opinion that people want to do more in-depth things
to the operating system, and that there is a latency barrier in the
way of them doing this.  My participation in this discussion, and in
particular, with regard to the publication of thorough and useful
documentation, has really been around this point.


-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-24 Thread Steve O'Hara-Smith

On Mon, 22 Apr 2002 06:04:34 -0700
Terry Lambert [EMAIL PROTECTED] wrote:

TL FreeBSD doesn't currently support bank selection.  Peter was
TL working on it, last time I heard.  Linux supports it, at an
TL incredible performance penalty.

This inspired an off-the-wall thought that may be insane. Would
it be possible (on a >4Gb system) to address 4Gb of RAM and write a driver
to make the rest appear as a device, which could then be used for a
preferred or (even neater) first-level swap?

-- 
C:WIN  | Directable Mirrors
The computer obeys and wins.|A Better Way To Focus The Sun
You lose and Bill collects. |  licenses available - see:
|   http://www.sohara.org/




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-24 Thread Terry Lambert

Steve O'Hara-Smith wrote:
 On Mon, 22 Apr 2002 06:04:34 -0700
 Terry Lambert [EMAIL PROTECTED] wrote:
 
 TL FreeBSD doesn't currently support bank selection.  Peter was
 TL working on it, last time I heard.  Linux supports it, at an
 TL incredible performance penalty.
 
 This inspired an off-the-wall thought that may be insane. Would
 it be possible (on a >4Gb system) to address 4Gb of RAM and write a driver
 to make the rest appear as a device, which could then be used for a
 preferred or (even neater) first-level swap?

Only if you reserved a window for it.  Say 1G of KVA, though last
I checked the bank selection granularity wasn't fine enough for
that.  Memory in the window can *never* be a target for DMA, and
should *probably* never be used for kernel structures.

If you ever programmed graphics on a TI 99/4A, which has a 4k
visible window onto screen memory, or programmed code on the
Commodore C64 to use the 32K of RAM-under-ROM, or programmed
in DOS or CP/M using Overlays, then you'll be familiar with the
problem.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-24 Thread Steve O'Hara-Smith

On Wed, 24 Apr 2002 16:38:08 -0700
Terry Lambert [EMAIL PROTECTED] wrote:

TL Only if you reserved a window for it.  Say 1G of KVA, though last

I was thinking more like 1M, or even a few K; it sounds like that's
not possible.

TL I checked the bank selection granularity wasn't fine enough for

That would be a problem.

TL If you ever programmed graphics on a TI 99/4A, which has a 4k
TL visible window onto screen memory, or programmed code on the

I've done a number of similar things (Newbrain, Lynx, bank
switched CP/M ...).

-- 
C:WIN  | Directable Mirrors
The computer obeys and wins.|A Better Way To Focus The Sun
You lose and Bill collects. |  licenses available - see:
|   http://www.sohara.org/




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread David Schultz

Thus spake Terry Lambert [EMAIL PROTECTED]:
 I'm pretty sure Solaris also used 4K pages for swappable memory
 in the kernel, as well: 4M pages don't make much sense, since
 you could, for example, exhaust KVA space with 250 kernel modules
 (250 X (1 data + 1 code) * 4M = 2G).

It doesn't use 4M pages for all kernel memory---just the first 4M of
code and the first 4M of data.  Supposedly it also allows applications
to take advantage of 4M pages, though I'm not sure how that works.  At
the very least I'd suppose that those pages are locked into memory.

 I don't know where the Linux limitation comes from; it's really
 hard for me to believe ~3G, since it's not an even power of 2,
 so I don't really credit this limitation.

I don't really understand it either.  I could try to find the link to
where I found this information if you're interested, but I wouldn't be
surprised if it is simply wrong.  The 2.4 kernel docs seem to imply
that 2.4 can use 4 GB of RAM without PAE.

 No problem; I think you will have to, if you are planning on
 mucking about with more than 4G of physical memory.

I have no such plans in the immediate future; at this point the
discussion is a curiosity.  But with any luck I will already know
what's going on by the time I need to worry about tweaking the kernel
in bizarre ways.




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Marc G. Fournier

On Mon, 22 Apr 2002, Terry Lambert wrote:

 Marc G. Fournier wrote:
 First, a lot of this stuff is slowly sinking in ... after repeatedly
 reading it and waiting for the headache to dissipate :)
 
  But, one thing that I'm still not clear on ...
 
  If I have 4Gig of RAM in a server, does it make any sense to have swap
  space on that server also?

 Yes.  But it (mostly) does not apply to KVA, only to UVA data,
 and there are much larger KVA requirements, so the more RAM you
 have, the bigger the bottleneck to user space for anything you
 swap.

Okay ... to me, a bottleneck generally means slowdown ... so the more RAM
I have, the slower the system will perform?

  Again, from what I'm reading, I have a total of 4Gig *aggregate* to
  work with, between RAM and swap, but it's right here that I'm confused
  right now ... basically, the closer to 4Gig of RAM you get, the closer
  to 0 of swap you can have?

 No.

 I think you are getting confused on cardinality.  You get one KVA, but
 you get an arbitrary number of UVA's, until you run out of physical RAM
 to make new ones.

 You have 4G aggregate KVA + UVA.

 So if your KVA is 3G, and your UVA is 1G, then you can have 1 3G
 KVA, and 1000 1G UVA's.

Okay, first question here ... above you say 'arbitrary number of UVAs',
but here you state 1000 ... just a number picked out of the air, or is
this some fixed limit?

 Certain aspects of KVA are non-swappable.  Some parts of UVA are
 swappable in theory, but never swapped in practice (the page
 tables and descriptors for each user process).

 The closer to 4G you have, the more physical RAM you have to spend
 on managing the physical RAM.

 The total amount of physical RAM you have to spend on managing
 memory is based on the total physical RAM plus the total swap.

Okay, this makes sense ... I notice in 4.5-STABLE that if maxusers is set
to 0, then the system will auto-tune based on memory ... is this something
that also is auto-tuned?

 As soon as that number exceeds ~2.5G, you can't do it on a 32
 bit processor any more, unless you hack FreeBSD to swap the
 VM housekeeping data it uses for swapping UVA contents.

Okay, now here you lost me ... which number exceeds ~2.5G?  The total
amount of physical RAM you have to spend?

 Think of physical RAM as a resource.  It's seperate from the
 KVA and UVA, but the KVA has to have physical references to
 do paged memory management.  You are limited by how many of
 these you can have in physical RAM, total.

Okay ... a lot of lights came on simultaneously with this email ... some of
those lights, mind you, might be false, but it's a start ...

If I'm understanding this at all ... the KVA (among other things) is a
pre-allocated/reserved section of RAM for managing the UVAs ...
simplistically, it maintains the process list and all resources
associated with it ... I realize it does do a lot of other things, but this
is what I'm more focused on right now ...

Now, what exactly is a UVA?  You state above '1000 1G UVAs' ... is one
process == 1 UVA?  Or is one user (with all its associated processes) == 1
UVA?

Next, again, if I'm reading this right ... if I set my KVA to 3G, when the
system boots, it will reserve 3G of *physical* RAM for the kernel itself,
correct?  So on a 4G machine, 1G of *physical* RAM will be available for
UVAs ... so, if I run 1G worth of processes, that is where swapping to
disk comes in, right?  Other than the massive performance hit, and the
limit you mention about some parts of UVA not being swappable, I could
theoretically have 4G of swap to page out to?

Is there a reason why this stuff isn't auto-scaled based on RAM as it is?






Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Vallo Kallaste

On Tue, Apr 23, 2002 at 09:44:50AM -0300, Marc G. Fournier
[EMAIL PROTECTED] wrote:

 Next, again, if I'm reading this right ... if I set my KVA to 3G,
 when the system boots, it will reserve 3G of *physical* RAM for
 the kernel itself, correct?  So on a 4G machine, 1G of *physical*
 RAM will be available for UVAs ... so, if I run 1G worth of
 processes, that is where swapping to disk comes in, right?  Other
 then the massive performance hit, and the limit you mention about
 some parts of UVA not being swappable, I could theoretically have
 4G of swap to page out to?

You can have up to ~12GB of usable swap space, as I've heard. Don't
remember why such an arbitrary limit, unfortunately. Information about
such topics is spread over several list archives, and the subjects
are usually strange, too... so it's hard to find out. As I understand it
you are on the track, having 3GB allocated to KVA means 1GB for UVA,
whatever it exactly means. Userspace processes will allocate memory
from UVA space and can grow over 1GB of size if needed by swapping.
You can certainly have more than one over-1GB process going on at
the same time, but swapping will constrain your performance.
I'm sure Terry or some other knowledgeable person will correct me if
it doesn't make sense.

 Is there a reason why this stuff isn't auto-scaled based on RAM as
 it is?

Probably lack of manpower; to code it up you'd have to understand
every bit of it, but as we currently see, we don't understand it, and
probably many others don't as well :-)
-- 

Vallo Kallaste
[EMAIL PROTECTED]




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Vallo Kallaste

On Tue, Apr 23, 2002 at 12:25:31PM -0700, Terry Lambert
[EMAIL PROTECTED] wrote:

 Vallo Kallaste wrote:
  You can have up to ~12GB of usable swap space, as I've heard. Don't
  remember why such arbitrary limit, unfortunately. Information about
  such topics is spread over several list archives, and the subjects
  are usually strange, too... so it's hard to find out. As I understand it
  you are on the track, having 3GB allocated to KVA means 1GB for UVA,
  whatever it exactly means. Userspace processes will allocate memory
  from UVA space and can grow over 1GB of size if needed by swapping.
  You can certainly have more than one over-1GB process going on at
  the same time, but swapping will constrain your performance.
  I'm sure Terry or some other knowledgeable person will correct me if
  it doesn't make sense.
 
 Actually, you have a total concurrent virtual address space of 4G.
 
 If you assign 3G of that to KVA, then you can never exceed 1G of
 space for a user process, under any circumstances.
 
 This is because a given user process and kernel must be able
 to exist simultaneously in order to do things like copyin/copyout.

Hmm, ok, but can we have more than one 1G user process at one time?
Four 500MB ones, and so on?
Somehow I've drawn that conclusion based on the previous information.
It should be so, otherwise I don't understand how swapping fits
into the overall picture.
-- 

Vallo Kallaste
[EMAIL PROTECTED]




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Dave Hayes

Terry Lambert (who fits my arbitrary definition of a good cynic)
writes:
 It's a hazard of Open Source projects, in general, that there are
 so many people hacking on whatever they think is cool that nothing
 ever really gets built to a long term design plan that's stable
 enough that a book stands a chance of having a 1 year lifetime.

I could not help but notice your multiple attempts at expressing this
particular concept, that is... an implied necessity of a book
that explains what's going on under the kernel hood. I agree that such
a book would rapidly be out of date, but I also see the necessity
thereof.

So, it's time to question the assumption that the information you want
available should be in a book.

Many websites have annotation as a form of ad-hoc documentation
(e.g. php.net). Why not have someone take a crack at documenting the
FreeBSD kernel, and perhaps use some annotation feature to create a
living document which (hopefully) comes close to describing the
kernel architecture?

If you want to track a moving target, perhaps you need to use a moving
track? 
--
Dave Hayes - Consultant - Altadena CA, USA - [EMAIL PROTECTED] 
 The opinions expressed above are entirely my own 

What's so special about the Net? People -still- don't
listen...
  -The Unknown Drummer










Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Chris BeHanna

On Tue, 23 Apr 2002, Dave Hayes wrote:

 Terry Lambert (who fits my arbitrary definition of a good cynic)
 writes:
  It's a hazard of Open Source projects, in general, that there are
  so many people hacking on whatever they think is cool that nothing
  ever really gets built to a long term design plan that's stable
  enough that a book stands a chance of having a 1 year lifetime.

 I could not help but notice your multiple attempts at expressing this
 particular concept often, that is...an implied necessity of a book
 that explains what's going on under the kernel hood. I agree that such
 a book would rapidly be out of date, but I also see the necessity
 thereof.

 So, it's time to question the assumption that the information you want
 available should be in a book.

 Many websites have annotation as a form of ad-hoc documentation
 (e.g. php.net). Why not have someone take a crack at documenting the
 FreeBSD kernel, and perhaps use some annotation feature to create a
 living document which (hopefully) comes close to describing the
 kernel architecture?

 If you want to track a moving target, perhaps you need to use a moving
 track?

doxygen is *wonderful* for this for large C++ projects:  it's able
to draw you inheritance graphs and collaboration diagrams, as well as
generate pretty, nicely formatted HTML containing API descriptions
generated from javadoc-like comments in header files.

I've never tried it on straight C.  I suppose it is possible, but
given the lack of inheritance, collaboration diagrams are going to
be very messy.

Still and all, it might be a very useful thing.  If not doxygen,
then perhaps some way to run the headers through jade/sgmlformat, with
docbook-style SGML embedded in comments in header files describing
kernel API calls and their parameters, with all typedef'd datatypes
appropriately cross-linked.  As a hack, one could even (gulp) embed
POD within comments and run perldoc on everything.  This could be done
nightly or twice daily, with updates appearing live at freebsd.org.
HTML versions of man pages with crosslinks go part of the way; what
I'm thinking about (if any of you have used doxygen you'll know where
I'm going) is a bit more comprehensive, with links to the actual
header file from which the documentation was generated, so that the
reader can see the declaration in its native context (with the doxygen
or docbook comments stripped out for clarity).

This still wouldn't address the need for some kind of overall
architectural document, as well as the difficulty of keeping it
up-to-date, but it would be of tremendous help to everyone working on
the project.  *If* developers can get used to updating the in-comment
documentation whenever they make changes, then this reference would
automatically be kept up-to-date.

-- 
Chris BeHanna
Software Engineer   (Remove bogus before responding.)
[EMAIL PROTECTED]
I was raised by a pack of wild corn dogs.





Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread Terry Lambert

Dave Hayes wrote:
 So, it's time to question the assumption that the information you want
 available should be in a book.
 
 Many websites have annotation as a form of ad-hoc documentation
 (e.g. php.net). Why not have someone take a crack at documenting the
 FreeBSD kernel, and perhaps use some annotation feature to create a
 living document which (hopefully) comes close to describing the
 kernel architechture?
 
 If you want to track a moving target, perhaps you need to use a moving
 track?

How does the person or persons involved in documenting the
internals in sufficient detail to be useful to third parties
get paid for the effort?

We are talking the work equivalent of a full time job.

If they aren't paid, what's the incentive to create documentation
above and beyond the status quo?

If that incentive exists, what's the URL for the documentation
that was created as a result?


I think I can count on my fingers the number of people who know
the various aspects of the boot process well enough to document
it for people who want to hack on it to, for example, declaratively
allocate physical memory as part of the boot process.

A lot of the information in this thread was never collected
centrally anywhere before (e.g. the missing piece about the
files to modify and the calculation of the NKPDE value that
was left out of David Greenman's posting of a year ago).  Most
of this information will be quickly out of date, since as soon
as you document something, people understand it enough to realize
the shortcomings, and so nearly the first thing that happens is
the shortcomings are corrected, and voila, your documentation is
now out of date.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread David Schultz

Thus spake Vallo Kallaste [EMAIL PROTECTED]:
 Userspace processes will allocate memory
 from UVA space and can grow over 1GB of size if needed by swapping.
 You can certainly have more than one over-1GB process going on at
 the same time, but swapping will constrain your performance.

It isn't a performance constraint.  32-bit architectures have 32-bit
pointers, so in the absence of segmentation tricks, a virtual address
space can only contain 2^32 = 4G locations.  If the kernel gets 3 GB
of that, the maximum amount of memory that any individual user process
can use is 1 GB.  If you had, say, 4 GB of physical memory, a single
user process could not use it all.  Swap increases the total amount of
memory that *all* processes can allocate by pushing some of the pages
out of RAM and onto the disk, but it doesn't increase the total amount
of memory that a single process can address.




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-23 Thread David Schultz

Thus spake Terry Lambert [EMAIL PROTECTED]:
 Writing a useful (non-fluff) technical book, optimistically,
 takes 2080 hours ... or 40 hours per week for 52 weeks... a man
 year.
 
 By the time you are done, the book is a year out of date, and
 even if you worked really hard and kept it up to date (e.g. you
 had 4 authors and spent only 6 months of wall time on the book),
 the shelf life on the book is still pretty short.

Although it would be unreasonable to comprehensively document the
kernel internals and expect the details to remain valid for a year,
there is a great deal of lasting information that could be conveyed.
For example, Kirk's 4.[34]BSD books cover obsolete systems, and yet
much of what they say applies equally well to recent versions of
FreeBSD.

It's true that the specific question ``How do I change my KVA size?''
might have different answers at different times, but I doubt that the
ideas behind an answer have all been invented in the last few months.
Even things like PAE, used by the Linux 2.4 kernel, remind me of how
DOS dealt with the 1 MB memory limit.




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Jeroen Ruigrok/asmodai

Marc,

-On [20020421 00:30], Marc G. Fournier ([EMAIL PROTECTED]) wrote:

Over the past week, I've been trying to get information on how to fix a
server that panics with:

| panic: vm_map_entry_create: kernel resources exhausted
| mp_lock = 0101; cpuid = 1; lapic.id = 0100
| boot() called on cpu#1

Take a look at this:

http://www.freebsd.org/cgi/getmsg.cgi?fetch=245329+248644+/usr/local/www/db/text/2001/freebsd-hackers/20010624.freebsd-hackers

Hope this helps,

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
How many cares one loses when one decides not to be something but to be
someone.




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

Marc G. Fournier wrote:
  No, there's no stats collected on this stuff, because it's a
  pretty obvious and straight-forward thing: you have to have a
  KVA space large enough that, once you subtract out 4K for each
  4M of physical memory and swap (max 4G total for both), you
  end up with memory left over for the kernel to use, and your
  limits are such that you don't run out of PTEs before you
  run out of mbufs (or whatever you plan on allocating).
 
 ... and translated to English, this means? :)
 
 Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for
 a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for
 ... what?
 
 So, I've theoretically got 8184M of VM available for the kernel to use
 right now?  what are PTEs and how do I know how many I have right now?  as
 for mbufs, I've currently got:

No.

Each 4M of physical memory takes 4K of statically allocated KVA.
Each 4M of backing store takes 4K of statically allocated KVA.

The definition of backing store includes:

o   All dirty data pages in swap
o   All dirty code pages in swap
o   All clean data pages in files mapped into process or kernel
address space
o   All clean code pages for executables mapped into process or
kernel address space
o   Reserved mappings for copy-on-write pages that haven't yet
been written

A PTE is a page table entry.  It's the 32 bit value in the page
table for each address space (one for the kernel, one per process).
See the books I posted the titles of for more details, or read the
Intel processor PDF's from their developer web site.


 jupiter netstat -m
 173/1664/61440 mbufs in use (current/peak/max):
 77 mbufs allocated to data
 96 mbufs allocated to packet headers
 71/932/15360 mbuf clusters in use (current/peak/max)
 2280 Kbytes allocated to network (4% of mb_map in use)
 0 requests for memory denied
 0 requests for memory delayed
 0 calls to protocol drain routines
 
 So how do I find out where my PTEs are sitting at?

The mbufs are only important because most people allocate a
large number of mbufs up front for networking applications, or
for large numbers of users with network applications that will
need resources in order to be able to actually run.  There are
also protocol control blocks and other allocations that occur
up front, based on the maximum number of system open files
and sockets you intend to permit.

The user space stuff is generally a lot easier to calculate:
do a ps -gaxl, round each entry in the VSZ column up to
4M, divide by 4K, and that tells you how many 4K units you
have allocated for user space.  For kernel space, the answer
is that there are some allocated at boot time, (120M worth),
and then the kernel map is grown, as necessary, until it hits
the KVA space limit.  If you plan on using up every byte, then
divide your total KVA space by 4K to get the number of 4K pages
allocated there.

For the kernel stuff... you basically need to know where the
kernel puts how much memory, based on the tuning parameters
you use on it.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

Marc G. Fournier wrote:
 On Sun, 21 Apr 2002, Terry Lambert wrote:
  No, there's no stats collected on this stuff, because it's a pretty
  obvious and straight-forward thing: you have to have a KVA space large
  enough that, once you subtract out 4K for each 4M of physical memory and
  swap (max 4G total for both), you end up with memory left over for the
  kernel to use, and your limits are such that you don't run out of
  PTEs before you run out of mbufs (or whatever you plan on allocating).
 
 God, I'm glad it's straightforward :)
 
 Okay, first off, you say (max 4G total for both) ... do you max *total*
 between the two, or phy can be 4g *plus* swap can be 4g for a total of 8g?

You aren't going to be able to exceed 4G, no matter what you do,
because that's the limit of your address space.

If you want more, then you need to use a 64 bit processor (or use a
processor that supports bank selection, and hack up FreeBSD to do
bank swapping on 2G at a time, just like Linux has been hacked up,
and expect that it won't be very useful).

If you are swapping, you are demand paging.

The way demand paging works is that you reference a page that has
been swapped out, or for which physical memory backing store has
not been assigned.

When you make this reference, you get a page not present fault (a
trap 12).  The trap handler puts the faulting process to sleep,
and then starts the process of pulling the page in from backing
store (if it's not a create-on-reference), which, among other
things, locates a physical page to contain the copy of the data
pulled in from the backing store (or zeroed out of physical memory,
if it's an unbacked page, e.g. non-swappable, or swappable, but for
which swap has not yet been allocated, because it's the first use).

Only certain types of kernel memory are swappable -- mostly kernel
memory that's allocated on a per process basis.  Kernel swapping
really does you no good, if you have a fully populated physical
memory in the virtual address space, since there's only one kernel
virtual address space (SMP reserves a little bit of per processor
memory, but the amount is tiny: one page descriptor's worth: 4M);
after a certain point, your KVA is committed, and it's a mistake to
have it compete in the same LRU domain as processes.  You can't
really avoid that, for the most part, since there's a shared TLB
cache that you really don't have opportunity to manage, other than
by separating 4M vs. 4K pages (and 2M, etc., for the Pentium Pro,
though variable page granularity is not supported in FreeBSD, since
it's not common to most hardware people actually have).


 For instance, right now, I have 3Gig of physical and ~3gig of swap
 allocated ...

Each process maintains its own virtual address space.  Almost all
of a process's virtual address space is swappable.  So if you are
swapping, it's going to be process address space: UVA, not KVA.

If you increase the KVA, then you will decrease the UVA available to
user processes.  The total of the two can not exceed 4G.
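In other words (a trivial sketch; the function name is made up, and the 4G is just 2^32):

```c
#include <stdint.h>

#define GB (1ULL << 30)

/* On a 32-bit i386, kernel VA plus user VA split one 4G space, so
 * growing the KVA shrinks the largest possible per-process UVA. */
static uint64_t max_uva(uint64_t kva_bytes)
{
    return 4 * GB - kva_bytes;
}
```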


With 4G of physical memory, then 3G of KVA is practically a
requirement, particularly if you intend to use the additional memory
for kernel data (you will have to, for PDE's: you have no choice).
For 3G of physical memory, ~2.5G of KVA is the practical minimum.
put it at 3G, and live with it, so you can throw in RAM to your limit
later, when you decide you need to throw RAM at some problem or other.
If you can't afford for the UVA to be as small as 1G, then you are
going to have to make some hard decisions on the amount of physical
RAM you put in the machine.

It's not really that bad: for 3G of KVA, you need 3M for PDE's.  The
problem comes when they are exhausted because of the number of PDE's
you have lying around to describe UVA pages that are swapped out for
various processes, and for kernel memory requirements that go way up
when you crank up the kernel's ability to handle load (e.g. for network
equipment, I generally take half of physical memory for mbufs, mostly
because that's around the limit of what I can take, and have anything
left over).
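The "3M for PDE's" figure falls out of the 4K-per-4M ratio; as a throwaway check (hypothetical function name):

```c
#include <stdint.h>

#define KB (1ULL << 10)
#define MB (1ULL << 20)
#define GB (1ULL << 30)

/* Each 4M of address space needs one 4K page table to map it with
 * 4K pages, so the bookkeeping cost is 4K per 4M mapped. */
static uint64_t pagetable_bytes(uint64_t mapped_bytes)
{
    return (mapped_bytes / (4 * MB)) * 4 * KB;
}
```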

That you are using System V shared memory segments is *REALLY* going to
hurt you; each of these shared memory segments comes out of the KVA, so
using shared memory segments with the shm*() calls, rather than using
mmap()'ed files as backing store, can eat huge chunks of KVA, as well
as fragmenting the KVA, particularly over time.

For more details on paged memory management on x86, see:

Protected Mode Software Architecture

and:

The Indispensable PC Hardware Book

You might also want to find a book on bootstrapping protected mode
operating systems (actually, I have yet to find a very good one,
so post about it, if you find one).

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread David Schultz

Thus spake Terry Lambert [EMAIL PROTECTED]:
 If you want more, then you need to use a 64 bit processor (or use a
 processor that supports bank selection, and hack up FreeBSD to do
 bank swapping on 2G at a time, just like Linux has been hacked up,
 and expect that it won't be very useful).

I'm guessing that this just means looking at more than 4 GB of memory
by working with 2 GB frames at a time.  As I recall, David Greenman
said that this hack would essentially require a rewrite of the VM
system.  Does this just boil down to using 36 bit physical addresses?
Are there plans for FreeBSD to support it, or is everyone just waiting
until 64 bit processors become more common?

 You can't
 really avoid that, for the most part, since there's a shared TLB
 cache that you really don't have opportunity to manage, other than
 by seperating 4M vs. 4K pages (and 2M, etc., for the Pentium Pro,
 though variable page granularity is not supported in FreeBSD, since
 it's not common to most hardware people actually have).

Does FreeBSD use 4M pages exclusively for kernel memory, as in
Solaris, or is there a more complicated scheme?

 If you increase the KVA, then you will decrease the UVA available to
 user processes.  The total of the two can not exceed 4G.

In Linux, all of physical memory is mapped into the kernel's virtual
address space, and hence, until recently Linux was limited to ~3 GB of
physical memory.  FreeBSD, as I understand, doesn't do that.  So is
the cause of this limitation that the top half of the kernel has to
share a virtual address space with user processes?

I'll have to read those books one of these days when I have time(6).
Thanks for the info.




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Vizion Communication




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Marc G. Fournier

On Mon, 22 Apr 2002, Terry Lambert wrote:

 Marc G. Fournier wrote:
  On Sun, 21 Apr 2002, Terry Lambert wrote:
   No, there's no stats collected on this stuff, because it's a pretty
   obvious and straight-forward thing: you have to have a KVA space large
   enough that, once you subtract out 4K for each 4M of physical memory and
   swap (max 4G total for both), you end up with memory left over for the
   kernel to use, and your limits are such that you don't run out of
   PTEs before you run out of mbufs (or whatever you plan on allocating).
 
  God, I'm glad it's straightforward :)
 
  Okay, first off, you say (max 4G total for both) ... do you max *total*
  between the two, or phy can be 4g *plus* swap can be 4g for a total of 8g?

 You aren't going to be able to exceed 4G, no matter what you do,
 because that's the limit of your address space.

 If you want more, then you need to use a 64 bit processor (or use a
 processor that supports bank selection, and hack up FreeBSD to do
 bank swapping on 2G at a time, just like Linux has been hacked up,
 and expect that it won't be very useful).

Now I'm confused ... from what I've read so far, going out and buying an
IBM eSeries 350 with 16Gig of RAM with Dual-PIII processors and hoping to
run FreeBSD on it is not possible?  Or, rather, hoping to use more than
4 out of 16Gig of RAM is?





Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

Marc G. Fournier wrote:
  You aren't going to be able to exceed 4G, no matter what you do,
  because that's the limit of your address space.
 
  If you want more, then you need to use a 64 bit processor (or use a
  processor that supports bank selection, and hack up FreeBSD to do
  bank swapping on 2G at a time, just like Linux has been hacked up,
  and expect that it won't be very useful).
 
 Now I'm confused ... from what I've read so far, going out and buying an
 IBM eSeries 350 with 16Gig of RAM with Dual-PIII processors and hoping to
 run FreeBSD on it is not possible?  Or, rather, hoping to use more than
 4 out of 16Gig of RAM is?

FreeBSD doesn't currently support bank selection.  Peter was
working on it, last time I heard.  Linux supports it, at an
incredible performance penalty.

But yes, it means only 4G of the RAM will be usable by you.

Bank selection works by leaving the address space at 4G, and
switching between banks, 2G at a time out of the 16G.

Basically, your kernel code lives in the first 2G, and then
you get to pick which 2G out of the 16G is the last 2G.

As I said, I expect that doing this won't be very useful;
since Itaniums are available, and FreeBSD runs native in
multiuser mode on IA64 now, there's really no reason to
do the 16G, 2G at a time bank selection trick.

The main reason I don't think it'll be useful is DMA: for
the DMA to occur, it will have to occur into the first 2G,
so that it's never selected out.  This is because, no matter
what you do, your address space is limited to 4G total:
adding banks just controls what physical memory is placed
in that 4G window at any given time.
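A toy model of that windowing, under the assumptions in this thread (a 2G kernel half that is never switched out, a 2G selectable window, and 16G of banked physical memory; all names invented):

```c
#include <stdint.h>

#define GB (1ULL << 30)

/* Bank selection: the virtual space stays 4G; the low 2G always shows
 * bank 0 (kernel + DMA), and the high 2G shows whichever 2G bank of
 * the 16G physical memory is currently selected. */
static uint64_t virt_to_phys(uint64_t vaddr, unsigned bank)
{
    if (vaddr < 2 * GB)
        return vaddr;                        /* fixed kernel window */
    return (uint64_t)bank * 2 * GB + (vaddr - 2 * GB);
}
```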

Since the most useful thing you could do with more memory is
buffers for networking and disk I/O for things like web and
file servers... not very useful.

Consider two processes, with the memory divided into 8 2G
banks.  The 0th bank has the kernel in it, and can never be
selected out, if you expect the kernel to run or DMA to be
possible.  The 1st bank contains the memory for one process,
running on CPU 0.  The 4th bank contains the memory for the
other process, running on CPU 1.  You can not run these
processes simultaneously, because they have conflicting bank
selects.

You could jam everything into all the code -- you'd have to
hack the paged memory management, the VM, the scheduler, etc.,
to get it to work -- but, even so, after all that work, what
you have effectively bought yourself is an L3 cache that's
in RAM, rather than in a swap partition.

You are better off just making it usable as swap, semi-directly,
and then making all the paging structures not used for the
kernel itself, swappable.

Even so, your KVA is restricted by whatever your bank size is,
and you can't use it directly (e.g. KVA + UVA + bank_region = 4G).

You really, really ought to look at the books I recommended,
if you are confused about why you can only use 4G with a 32
bit processor and FreeBSD, without additional heroic work.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Marc G. Fournier


First, a lot of this stuff is slowly sinking in ... after repeatedly
reading it and waiting for the headache to dissipate :)

But, one thing that I'm still not clear on ...

If I have 4Gig of RAM in a server, does it make any sense to have swap
space on that server also?  Again, from what I'm reading, I have a total
of 4Gig *aggregate* to work with, between RAM and swap, but its right here
that I'm confused right now ... basically, the closer to 4Gig of RAM you
get, the closer to 0 of swap you can have?

On Mon, 22 Apr 2002, Terry Lambert wrote:

 Marc G. Fournier wrote:
   No, there's no stats collected on this stuff, because it's a
   pretty obvious and straight-forward thing: you have to have a
   KVA space large enough that, once you subtract out 4K for each
   4M of physical memory and swap (max 4G total for both), you
   end up with memory left over for the kernel to use, and your
   limits are such that you don't run out of PTEs before you
   run out of mbufs (or whatever you plan on allocating).
 
  ... and translated to English, this means? :)
 
  Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for
  a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for
  ... what?
 
  So, I've theoretically got 8184M of VM available for the kernel to use
  right now?  what are PTEs and how do I know how many I have right now?  as
  for mbufs, I've currently got:

 No.

 Each 4M of physical memory takes 4K of statically allocated KVA.
 Each 4M of backing store takes 4K of statically allocated KVA.

 The definition of backing store includes:

 o All dirty data pages in swap
 o All dirty code pages in swap
 o All clean data pages in files mapped into process or kernel
   address space
 o All clean code pages for executables mapped into process or
   kernel address space
 o Reserved mappings for copy-on-write pages that haven't yet
   been written

 A PTE is a page table entry.  It's the 32 bit value in the page
 table for each address space (one for the kernel, one per process).
 See the books I posted the titles of for more details, or read the
 Intel processor PDF's from their developer web site.


  jupiter netstat -m
  173/1664/61440 mbufs in use (current/peak/max):
  77 mbufs allocated to data
  96 mbufs allocated to packet headers
  71/932/15360 mbuf clusters in use (current/peak/max)
  2280 Kbytes allocated to network (4% of mb_map in use)
  0 requests for memory denied
  0 requests for memory delayed
  0 calls to protocol drain routines
 
  So how do I find out where my PTEs are sitting at?

 The mbufs are only important because most people allocate a
 large number of mbufs up front for networking applications, or
 for large numbers of users with network applications that will
 need resources in order to be able to actually run.  There's
 also protocol control blocks and other allocation that occur
 up front, based on the maximum number of system open files
 and sockets you intend to permit.

 The user space stuff is generally a lot easier to calculate:
 do a ps -gaxl, round each entry in the VSZ column up to
 4M, divide by 4K, and that tells you how many 4K units you
 have allocated for user space.  For kernel space, the answer
 is that there are some allocated at boot time, (120M worth),
 and then the kernel map is grown, as necessary, until it hits
 the KVA space limit.  If you plan on using up every byte, then
 divide your total KVA space by 4K to get the number of 4K pages
 allocated there.
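That user-space calculation can be sketched as (hypothetical helper; assumes ps(1) reports VSZ in kilobytes):

```c
#include <stdint.h>

/* Round a process's VSZ (in KB) up to a 4M boundary, then count the
 * 4K units, per the recipe above. */
static uint64_t user_4k_units(uint64_t vsz_kb)
{
    uint64_t rounded_kb = (vsz_kb + 4095) / 4096 * 4096; /* up to 4M */
    return rounded_kb / 4;                               /* 4K units */
}
```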

 For the kernel stuff... you basically need to know where the
 kernel puts how much memory, based on the tuning parameters
 you use on it.

 -- Terry






Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

Jeroen Ruigrok/asmodai wrote:
 Take a look at this:
 
http://www.freebsd.org/cgi/getmsg.cgi?fetch=245329+248644+/usr/local/www/db/text/2001/freebsd-hackers/20010624.freebsd-hackers

This is actually no longer valid, since there have been changes
to both the PDE calculations and the kernel base definition, to
try to make changing the KVA space size more automatic.

At the time of the referenced posting, the modifications necessary
were to /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h.

David also neglected to document how he calculated the 511
(it's 511 for a UP system, 510 for an SMP system): divide the
kernbase by 0x0040, after subtracting 0x0010, and then subtract
the recursive entry out of the total.  You also have to subtract
out the private entries (if any)
for SMP, etc..  Basically, you have to calculate the number of
descriptor entries required to map the entire KVA space as 4K
pages from 1K of 4K page tables (1K worth of entries in a 4K page
descriptor table for the address space).
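The hex constants above appear truncated in the archive, so the exact recipe is unclear; assuming the intent is simply "one descriptor per 4M of KVA, minus the recursive entry, minus SMP-private entries" (an assumption on my part), the 511/510 figures work out as:

```c
#include <stdint.h>

/* Assumed reconstruction of the 511/510 arithmetic: a 2G KVA is
 * mapped by 2G / 4M = 512 page directory entries; one is the
 * recursive mapping, and SMP loses one more to private entries. */
static int kva_pdes(uint64_t kva_bytes, int smp)
{
    int total = (int)(kva_bytes / (4ULL << 20));  /* 4M per PDE */
    total -= 1;                                   /* recursive entry */
    if (smp)
        total -= 1;                               /* SMP private entries */
    return total;
}
```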

Of course, now everyone is going to say how do I... how do I...,
wanting one of the six ways you have to do it, based on the FreeBSD
version and/or intermediate release (-release?  -stable?  -security?
-some-date-here?), rather than figuring out the answer based on a
single known release.

The other issue here is that the number 1 reason for wanting to
dick around with this is to be able to add more physical memory,
and to do that successfully, you have to know a hell of a lot more
about tuning FreeBSD than reading the happy-fun tuning manual
page can ever teach you, without you understanding how the OS
actually does its thing at a low level.  I personally consider the
tuning man page as just a knee-jerk reaction to bad publicity
resulting from naive benchmarking.

IMO, it's much better to just give elliptical clues, and then
leave the job to the people who can follow the clues and learn
enough that they not only get the right answer, but then end up
knowing enough about *why* it's the right answer to be able to
do the other required tuning.

If FreeBSD would ever sit still long enough for someone to get
a book out, there'd probably be a book on the subject (Kirk has
been working on one for a year now, according to several people,
called The Design and Implementation of the FreeBSD Operating
System; no, I don't know what version it's supposed to apply to);
IMO, an architect should set some things in stone, and leave them
there long enough that documentation doesn't immediately go out
of date.

It's a hazard of Open Source projects, in general, that there are
so many people hacking on whatever they think is cool that nothing
ever really gets built to a long term design plan that's stable
enough that a book stands a chance of having a 1 year lifetime.

Basically, it'll boil down to paying someone who knows where the
bodies are buried to do the work for you, if you want to get more
than just a hack job.  8-(.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

David Schultz wrote:
 Thus spake Terry Lambert [EMAIL PROTECTED]:
  If you want more, then you need to use a 64 bit processor (or use a
  processor that supports bank selection, and hack up FreeBSD to do
  bank swapping on 2G at a time, just like Linux has been hacked up,
  and expect that it won't be very useful).
 
 I'm guessing that this just means looking at more than 4 GB of memory
 by working with 2 GB frames at a time.  As I recall, David Greenman
 said that this hack would essentially require a rewrite of the VM
 system.  Does this just boil down to using 36 bit physical addresses?
 Are there plans for FreeBSD to support it, or is everyone just waiting
 until 64 bit processors become more common?

David Greenman is right.  Nevertheless, Peter was planning on
doing the hack, according to his postings to -current.  Please
check the list archives for these things.


 Does FreeBSD use 4M pages exclusively for kernel memory, as in
 Solaris, or is there a more complicated scheme?

FreeBSD starts out using 4K pages for the premapped memory, and
switches over to a 4M page scheme for the initially loaded kernel,
for at least the first 4M.  The PTEs that were for the 4K pages
that are replaced with the 4M mappings are simply lost in the
reload of CR3, and never recovered for the system to use (the
pages containing the PTEs there are leaked, but it's usually
one page, so 4K is not that bad a leak).

For much of the FreeBSD kernel, 4K pages are used.

I'm pretty sure Solaris also uses 4K pages for swappable memory
in the kernel, as well: 4M pages don't make much sense, since
you could, for example, exhaust KVA space with 250 kernel modules
(250 X (1 data + 1 code) * 4M = 2G).
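Checking that parenthetical (250 x 2 x 4M is 2000M, i.e. just under the 2G quoted):

```c
#include <stdint.h>

#define MB (1ULL << 20)

/* Worst case from the text: each module burns one 4M data page plus
 * one 4M code page of KVA. */
static uint64_t module_kva(unsigned nmodules)
{
    return (uint64_t)nmodules * 2 * 4 * MB;
}
```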



  If you increase the KVA, then you will decrease the UVA available to
  user processes.  The total of the two can not exceed 4G.
 
 In Linux, all of physical memory is mapped into the kernel's virtual
 address space, and hence, until recently Linux was limited to ~3 GB of
 physical memory.  FreeBSD, as I understand, doesn't do that.  So is
 the cause of this limitation that the top half of the kernel has to
 share a virtual address space with user processes?

No.  You need to look at the copyin implementation in both OSs to
find the answer.  The way it works is by mapping the address space
of the process in question and the kernel at the same time, and
copying bytes between them.

These are really basic questions about memory layout, which you
should already know the answer to, if you are mucking about in
the KVA size or other parts of the kernel.

I don't know where the Linux limitation comes from; it's really
hard for me to believe ~3G, since it's not an even power of 2,
so I don't really credit this limitation.


 I'll have to read those books one of these days when I have time(6).
 Thanks for the info.

No problem; I think you will have to, if you are planning on
mucking about with more than 4G of physical memory.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-22 Thread Terry Lambert

Marc G. Fournier wrote:
 First, a lot of this stuff is slowly sinking in ... after repeatedly
 reading it and waiting for the headache to dissipate :)
 
 But, one thing that I'm still not clear on ...
 
 If I have 4Gig of RAM in a server, does it make any sense to have swap
 space on that server also?

Yes.  But it (mostly) does not apply to KVA, only to UVA data,
and there are much larger KVA requirements, so the more RAM you
have, the bigger the bottleneck to user space for anything you
swap.


 Again, from what I'm reading, I have a total of 4Gig *aggregate* to
 work with, between RAM and swap, but its right here that I'm confused
 right now ... basically, the closer to 4Gig of RAM you get, the closer
 to 0 of swap you can have?

No.

I think you are getting confused on cardinality.  You get one KVA,
but you get an arbitrary number of UVA's, until you run out of
physical RAM to make new ones.

You have 4G aggregate KVA + UVA.

So if your KVA is 3G, and your UVA is 1G, then you can have 1 3G
KVA, and 1000 1G UVA's.


Certain aspects of KVA are non-swappable.  Some parts of UVA are
swappable in theory, but never swapped in practice (the page
tables and descriptors for each user process).

The closer to 4G you have, the more physical RAM you have to spend
on managing the physical RAM.

The total amount of physical RAM you have to spend on managing
memory is based on the total physical RAM plus the total swap.

As soon as that number exceeds ~2.5G, you can't do it on a 32
bit processor any more, unless you hack FreeBSD to swap the
VM housekeeping data it uses for swapping UVA contents.

Think of physical RAM as a resource.  It's separate from the
KVA and UVA, but the KVA has to have physical references to
do paged memory management.  You are limited by how many of
these you can have in physical RAM, total.


-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-21 Thread Terry Lambert

Marc G. Fournier wrote:
  You have more memory than you can allocate kernel memory to
  provide page table entries for.
 
  The only solution is to increase your kernel virtual address
  space size to accommodate the page mappings.
 
  How to do this varies widely by the version of FreeBSD you are
  using, and, unless you read NOTES and are running a recent
  -current, is not incredibly well documented, and requires an
  understanding of how the virtual address space is laid out and
  managed (which is also not well documented anywhere).
 
 Ya, this is the roadblock I'm hitting :(  I'm running 4.5-STABLE here, as
 of this afternoon ... thoughts/suggestions based on that?

Read the handbook as it existed for 4.5-STABLE, and read NOTES.
It (sorta) tells you how to increase your KVA size.


 Also, is there something that I can run to monitor this, similar to
 running netstat -m to watch nmbclusters?

DDB?  8-) 8-).

No, there's no stats collected on this stuff, because it's a
pretty obvious and straight-forward thing: you have to have a
KVA space large enough that, once you subtract out 4K for each
4M of physical memory and swap (max 4G total for both), you
end up with memory left over for the kernel to use, and your
limits are such that you don't run out of PTEs before you
run out of mbufs (or whatever you plan on allocating).
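Stated as arithmetic (invented names; this is just the rule in the paragraph above, with the 4G cap on physical-plus-swap applied):

```c
#include <stdint.h>

#define KB (1ULL << 10)
#define MB (1ULL << 20)
#define GB (1ULL << 30)

/* KVA left for the kernel after the static 4K-per-4M charge for
 * physical memory plus swap (capped at 4G total for both). */
static uint64_t kva_left(uint64_t kva, uint64_t phys, uint64_t swap)
{
    uint64_t backed = phys + swap;
    if (backed > 4 * GB)
        backed = 4 * GB;
    return kva - (backed / (4 * MB)) * 4 * KB;
}
```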

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-21 Thread Marc G. Fournier

On Sun, 21 Apr 2002, Terry Lambert wrote:

 No, there's no stats collected on this stuff, because it's a pretty
 obvious and straight-forward thing: you have to have a KVA space large
 enough that, once you subtract out 4K for each 4M of physical memory and
 swap (max 4G total for both), you end up with memory left over for the
 kernel to use, and your limits are such that you don't run out of
 PTEs before you run out of mbufs (or whatever you plan on allocating).

God, I'm glad it's straightforward :)

Okay, first off, you say (max 4G total for both) ... do you max *total*
between the two, or phy can be 4g *plus* swap can be 4g for a total of 8g?

For instance, right now, I have 3Gig of physical and ~3gig of swap
allocated ...





Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-21 Thread Marc G. Fournier

On Sun, 21 Apr 2002, Terry Lambert wrote:

 Marc G. Fournier wrote:
   You have more memory than you can allocate kernel memory to
   provide page table entries for.
  
   The only solution is to increase your kernel virtual address
   space size to accommodate the page mappings.
  
   How to do this varies widely by the version of FreeBSD you are
   using, and, unless you read NOTES and are running a recent
   -current, is not incredibly well documented, and requires an
   understanding of how the virtual address space is laid out and
   managed (which is also not well documented anywhere).
 
  Ya, this is the roadblock I'm hitting :(  I'm running 4.5-STABLE here, as
  of this afternoon ... thoughts/suggestions based on that?

 Read the handbook as it existed for 4.5-STABLE, and read NOTES.
 It (sorta) tells you how to increase your KVA size.


  Also, is there something that I can run to monitor this, similar to
  running netstat -m to watch nmbclusters?

 DDB?  8-) 8-).

 No, there's no stats collected on this stuff, because it's a
 pretty obvious and straight-forward thing: you have to have a
 KVA space large enough that, once you subtract out 4K for each
 4M of physical memory and swap (max 4G total for both), you
 end up with memory left over for the kernel to use, and your
  limits are such that you don't run out of PTEs before you
 run out of mbufs (or whatever you plan on allocating).

... and translated to English, this means? :)

Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for
a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for
... what?

So, I've theoretically got 8184M of VM available for the kernel to use
right now?  what are PTEs and how do I know how many I have right now?  as
for mbufs, I've currently got:

jupiter netstat -m
173/1664/61440 mbufs in use (current/peak/max):
77 mbufs allocated to data
96 mbufs allocated to packet headers
71/932/15360 mbuf clusters in use (current/peak/max)
2280 Kbytes allocated to network (4% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
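As a cross-check of that output: assuming the 4.x defaults of 256 bytes per mbuf (MSIZE) and 2048 bytes per cluster (MCLBYTES), the 2280K line is just the peak counts times those sizes:

```c
/* Kbytes allocated to network = peak mbufs * 256 + peak clusters * 2048,
 * expressed in KB.  MSIZE=256 and MCLBYTES=2048 are assumed defaults. */
static unsigned long net_kbytes(unsigned long mbuf_peak, unsigned long clust_peak)
{
    return (mbuf_peak * 256UL + clust_peak * 2048UL) / 1024UL;
}
```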

So how do I find out where my PTEs are sitting at?







Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Marc G. Fournier



As a quick follow-up to this, doing more searching on the web, I came
across a few suggested 'sysctl' settings, which I've added to what I had
before, for a total of:

kern.maxfiles=65534
jail.sysvipc_allowed=1
vm.swap_idle_enabled=1
vfs.vmiodirenable=1
kern.ipc.somaxconn=4096

I've also just reduced my maxusers to 256 from 1024, since 1024 was
crashing worse than 512, and I ran across the 'tuning' man page that
stated that you shouldn't go above 256 :(

Just a bit more detail on the setup ...

On Sat, 20 Apr 2002, Marc G. Fournier wrote:


 Over the past week, I've been trying to get information on how to fix a
 server that panics with:

 | panic: vm_map_entry_create: kernel resources exhausted
 | mp_lock = 0101; cpuid = 1; lapic.id = 0100
 | boot() called on cpu#1

 Great ... but, how do I determine what 'resources' I need to increase to
 avoid that crash?  I've tried increasing maxusers from 512-1024, but *if*
 that works, I imagine I'm raising a bunch of limits (and using memory)
 that I don't have to ...

 The server is a Dual-CPU PIII-1Ghz with 3Gig of RAM and ~3Gig of swap
 space right now ... the data drive is 5x18gig drives in a RAID5
 configuration (hardware RAID, not vinum) ...

 I ran top in an xterm so that I could see what was up just before the
 crash, and the results were:

 last pid: 84988;  load averages: 19.82, 57.35, 44.426   up 0+23:33:12 02:05:00
 5021 processes:16 running, 5005 sleeping
 CPU states:  8.7% user,  0.0% nice, 24.3% system,  2.2% interrupt, 64.7% idle
 Mem: 2320M Active, 211M Inact, 390M Wired, 92M Cache, 199M Buf, 4348K Free
 Swap: 3072M Total, 1048M Used, 2024M Free, 34% Inuse, 448K Out

   So, I have plenty of swapspace left, lots of idle CPU and a whole
 whack of processes ...

   Now, looking at the LINT file, there appears to be *a lot* of
 things I *could* change ... for instance, NSFBUFS, KVA_FILES, etc ... but
 I don't imagine that changing these blindly is particularly wise ... so,
 how do you determine what to change?  For instance, at a maxusers of 512,
 NSFBUFS should be ~8704, and if I've only got 5000 processes running,
 chances are I'm still safe at that value, no?  But sysctl doesn't show any
 'sf_buf' value, so how do I figure out what I'm using?

   Basically, are there any commands similar to netstat -m for
 nmbclusters that I can run to 'monitor' and isolate where I'm exhausting
 these resources?

   Is there a doc on this sort of stuff that I should be reading for
 this?  Something that talks about kernel tuning for high-load/processes
 servers?

   Thanks for any help in advance ..

 ---
 machine         i386
 cpu             I686_CPU
 ident           kernel
 maxusers        1024

 options         NMBCLUSTERS=15360

 options         INET                    #InterNETworking
 options         INET6                   #IPv6 communications protocols
 options         FFS                     #Berkeley Fast Filesystem
 options         FFS_ROOT                #FFS usable as root device [keep this!]
 options         SOFTUPDATES             #Enable FFS soft updates support
 options         PROCFS                  #Process filesystem
 options         COMPAT_43               #Compatible with BSD 4.3 [KEEP THIS!]
 options         SCSI_DELAY=15000        #Delay (in ms) before probing SCSI
 options         KTRACE                  #ktrace(1) support

 options         SYSVSHM
 options         SHMMAXPGS=98304
 options         SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)

 options         SYSVSEM
 options         SEMMNI=2048
 options         SEMMNS=4096

 options         SYSVMSG                 #SYSV-style message queues

 options         P1003_1B                #Posix P1003_1B real-time extensions
 options         _KPOSIX_PRIORITY_SCHEDULING
 options         ICMP_BANDLIM            #Rate limit bad replies

 options         SMP                     # Symmetric MultiProcessor Kernel
 options         APIC_IO                 # Symmetric (APIC) I/O

 device          isa
 device          pci

 device          scbus                   # SCSI bus (required)
 device          da                      # Direct Access (disks)
 device          sa                      # Sequential Access (tape etc)
 device          cd                      # CD
 device          pass                    # Passthrough device (direct SCSI access)

 device          amr                     # AMI MegaRAID
 device          sym

 device          atkbdc0 at isa? port IO_KBD
 device          atkbd0  at atkbdc? irq 1 flags 0x1
 device          psm0    at atkbdc? irq 12

 device          vga0    at isa?

 pseudo-device   splash

 device          sc0     at isa? flags 0x100

 device          npx0    at nexus? port IO_NPX irq 13

 device          sio0    at isa? port IO_COM1 flags 0x10 irq 4
 device          sio1    at isa? port IO_COM2 irq 3

 device          miibus                  # MII bus support
 device          fxp                     # Intel 
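(Aside on the SYSVSHM sizing in the config above: SHMMAX is derived from SHMMAXPGS, so with 4 KB i386 pages the config grants roughly 384 MB of SysV shared memory. A quick sanity check, assuming PAGE_SIZE=4096:)

```python
# Sanity-check the SysV shared memory sizing from the config above.
# SHMMAX = SHMMAXPGS * PAGE_SIZE + 1, mirroring the config expression.
PAGE_SIZE = 4096          # i386 page size (assumption)
SHMMAXPGS = 98304
SHMMAX = SHMMAXPGS * PAGE_SIZE + 1
print(SHMMAX)                     # bytes
print(SHMMAX // (1024 * 1024))    # ~384 MB
```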

Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Alfred Perlstein

* The Hermit Hacker [EMAIL PROTECTED] [020420 16:01] wrote:
 
 
 As a quick follow-up to this, doing more searching on the web, I came
 across a few suggested 'sysctl' settings, which I've added to what I had
 before, for a total of:
 
 kern.maxfiles=65534
 jail.sysvipc_allowed=1
 vm.swap_idle_enabled=1
 vfs.vmiodirenable=1
 kern.ipc.somaxconn=4096
 
 I've also just reduced my maxusers to 256 from 1024, since 1024 was
 crashing worse than 512, and I ran across the 'tuning' man page that
 stated that you shouldn't go above 256 :(
 
 Just a bit more detail on the setup ...

You said you're running 5000 processes.  5000 processes of what?

Are they using SYSVSHM?  If so, this sysctl might help:

kern.ipc.shm_use_phys=1

It'll only work if you set it before your processes start up.

Some more information about what these 5000 processes are doing
would help.
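For the archives: my understanding of the 4.x implementation is that kern.ipc.shm_use_phys=1 backs SysV shared memory with wired, unmanaged physical pages, which avoids per-process pv_entry overhead for large segments shared by many processes (see tuning(7)). Since it only affects segments created after it is set, a persistent home for it would be:

```
# /etc/sysctl.conf -- applied at boot, before postgresql et al. start
kern.ipc.shm_use_phys=1
```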

-Alfred




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Marc G. Fournier

On Sat, 20 Apr 2002, Alfred Perlstein wrote:

 * The Hermit Hacker [EMAIL PROTECTED] [020420 16:01] wrote:
 
 
  As a quick follow-up to this, doing more searching on the web, I came
  across a few suggested 'sysctl' settings, which I've added to what I had
  before, for a total of:
 
  kern.maxfiles=65534
  jail.sysvipc_allowed=1
  vm.swap_idle_enabled=1
  vfs.vmiodirenable=1
  kern.ipc.somaxconn=4096
 
  I've also just reduced my maxusers to 256 from 1024, since 1024 was
  crashing worse than 512, and I ran across the 'tuning' man page that
  stated that you shouldn't go above 256 :(
 
  Just a bit more detail on the setup ...

 You said you're running 5000 processes.  5000 processes of what?

 Are they using SYSVSHM?  If so, this sysctl might help:

 kern.ipc.shm_use_phys=1

Okay, never knew of that one before ... have it set for the next reboot,
as I do have a few postgresql servers going on the 'root (non-jail)'
server ...

 It'll only work if you set it before your processes start up.

 Some more information about what these 5000 processes are doing
 would help.

Sorry ... the server is running ~210 jails ... so the '5k processes' would
be when they all start up their periodic scripts ... normally, it hovers
around 2700 processes ...





Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Terry Lambert

Marc G. Fournier wrote:
 Over the past week, I've been trying to get information on how to fix a
 server that panics with:
 
 | panic: vm_map_entry_create: kernel resources exhausted
 | mp_lock = 0101; cpuid = 1; lapic.id = 0100
 | boot() called on cpu#1
 
 Great ... but, how do I determine what 'resources' I need to increase to
 avoid that crash?  I've tried increasing maxusers from 512-1024, but *if*
 that works, I imagine I'm raising a bunch of limits (and using memory)
 that I don't have to ...
 
 The server is a Dual-CPU PIII-1Ghz with 3Gig of RAM and ~3Gig of swap
 space right now ... the data drive is 5x18gig drives in a RAID5
 configuration (hardware RAID, not vinum) ...

You have more memory than you can allocate kernel memory to
provide page table entries for.

The only solution is to increase your kernel virtual address
space size to accommodate the page mappings.

How to do this varies widely by the version of FreeBSD you are
using, and, unless you read NOTES and are running a recent
-current, is not incredibly well documented, and requires an
understanding of how the virtual address space is laid out and
managed (which is also not well documented anywhere).
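On 4.x/i386 the knob in question is, as far as I can tell, KVA_PAGES from LINT; a sketch, with the caveat that the option name, default, and units should be checked against your exact source tree:

```
# Kernel config fragment (FreeBSD 4.x, i386) -- assumed LINT spelling.
# Default is 256 units of 4 MB = 1 GB of kernel virtual address space;
# 512 doubles that to 2 GB, leaving 2 GB of the 4 GB 32-bit address
# space for each user process.  Requires a kernel rebuild.
options         KVA_PAGES=512
```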

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Terry Lambert

Marc G. Fournier wrote:
  It'll only work if you set it before your processes start up.
 
  Some more information about what these 5000 processes are doing
  would help.
 
 Sorry ... the server is running ~210 jails ... so the '5k processes' would
 be when they all start up their periodic scripts ... normally, it hovers
 around 2700 processes ...

Sounds like my laptop.

-- Terry




Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?

2002-04-20 Thread Marc G. Fournier

On Sat, 20 Apr 2002, Terry Lambert wrote:

 Marc G. Fournier wrote:
  Over the past week, I've been trying to get information on how to fix a
  server that panics with:
 
  | panic: vm_map_entry_create: kernel resources exhausted
  | mp_lock = 0101; cpuid = 1; lapic.id = 0100
  | boot() called on cpu#1
 
  Great ... but, how do I determine what 'resources' I need to increase to
  avoid that crash?  I've tried increasing maxusers from 512-1024, but *if*
  that works, I imagine I'm raising a bunch of limits (and using memory)
  that I don't have to ...
 
  The server is a Dual-CPU PIII-1Ghz with 3Gig of RAM and ~3Gig of swap
  space right now ... the data drive is 5x18gig drives in a RAID5
  configuration (hardware RAID, not vinum) ...

 You have more memory than you can allocate kernel memory to
 provide page table entries for.

 The only solution is to increase your kernel virtual address
 space size to accommodate the page mappings.

 How to do this varies widely by the version of FreeBSD you are
 using, and, unless you read NOTES and are running a recent
 -current, is not incredibly well documented, and requires an
 understanding of how the virtual address space is laid out and
 managed (which is also not well documented anywhere).

Ya, this is the roadblock I'm hitting :(  I'm running 4.5-STABLE here, as
of this afternoon ... thoughts/suggestions based on that?

Also, is there something that I can run to monitor this, similar to
running netstat -m to watch nmbclusters?
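Not a single counter that I know of, but the 4.x zone allocator does expose the map-entry zone, so something along these lines should show how close you are getting (zone names are from memory and may differ on 4.5):

```
# vmstat -z | grep -i map      # watch the MAP ENTRY zone's used/total
# vmstat -m                    # kernel malloc usage by type
# sysctl vm.zone               # full zone listing on 4.x
```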

