Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Tue, Apr 23, 2002 at 09:40:11PM -0700, David Schultz [EMAIL PROTECTED] wrote:

> > Userspace processes will allocate memory from UVA space and can grow over 1GB of size if needed by swapping. You can certainly have more than one over-1GB process going on at the same time, but swapping will constrain your performance.
>
> It isn't a performance constraint. 32-bit architectures have 32-bit pointers, so in the absence of segmentation tricks, a virtual address space can only contain 2^32 = 4G locations. If the kernel gets 3 GB of that, the maximum amount of memory that any individual user process can use is 1 GB. If you had, say, 4 GB of physical memory, a single user process could not use it all. Swap increases the total amount of memory that *all* processes can allocate by pushing some of the pages out of RAM and onto the disk, but it doesn't increase the total amount of memory that a single process can address.

Thank you, Terry and David, now I grasp how it should work (I hope). I really miss some education, but that's life.

-- 
Vallo Kallaste [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
David Schultz wrote:

> Thus spake Terry Lambert [EMAIL PROTECTED]:
> > Writing a useful (non-fluff) technical book, optimistically, takes 2080 hours ... or 40 hours per week for 52 weeks... a man year. By the time you are done, the book is a year out of date, and even if you worked really hard and kept it up to date (e.g. you had 4 authors and spent only 6 months of wall time on the book), the shelf life on the book is still pretty short.
>
> Although it would be unreasonable to comprehensively document the kernel internals and expect the details to remain valid for a year, there is a great deal of lasting information that could be conveyed. For example, Kirk's 4.[34]BSD books cover obsolete systems, and yet much of what they say applies equally well to recent versions of FreeBSD.

These are general OS architecture books by a noted authority on OS architecture. That's a barrier to entry for other authors, as the intrinsic value in the information is not constrained to the direct subject of the work. 8-).

Kirk is supposedly working on a similar book for FreeBSD, release date indeterminate.

In any case, this doesn't resolve the issue of "Where do I go to do XXX to version YYY, without having to learn everything there is to know about YYY?".

> It's true that the specific question ``How do I change my KVA size?'' might have different answers at different times, but I doubt that the ideas behind an answer have all been invented in the last few months. Even things like PAE, used by the Linux 2.4 kernel, remind me of how DOS dealt with the 1 MB memory limit.

The PAE is the thing that Peter was reportedly working on in order to break the 4G barrier on machines capable of accessing up to 16G of RAM using bank selection. I didn't mention it by name, since the general principle is also applicable to the Alpha, which has a current limit of 2G because of DMA barrier and other constraints.

While it's true that the ideas behind the answer remain the same... the ideas behind the answer are already published in the books I've already referenced in the context of this thread.

If people were content to discover implementation details based on a working knowledge of general principles, then this thread would never have occurred in the first place.

It's my opinion that people are wanting to do more in depth things to the operating system, and that there is a latency barrier in the way of them doing this. My participation in this discussion, and in particular, with regard to the publication of thorough and useful documentation, has really been around this point.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Mon, 22 Apr 2002 06:04:34 -0700 Terry Lambert [EMAIL PROTECTED] wrote:

TL FreeBSD doesn't currently support bank selection. Peter was
TL working on it, last time I heard. Linux supports it, at an
TL incredible performance penalty.

This inspired an off the wall thought that may be insane. Would it be possible (on a 4Gb system) to address 4Gb of RAM and write a driver to make the rest appear as a device, which could then be used for a preferred or (even neater) first level swap.

-- 
C:WIN                        | Directable Mirrors
The computer obeys and wins. | A Better Way To Focus The Sun
You lose and Bill collects.  | licenses available - see:
                             | http://www.sohara.org/
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Steve O'Hara-Smith wrote:

> On Mon, 22 Apr 2002 06:04:34 -0700 Terry Lambert [EMAIL PROTECTED] wrote:
>
> TL FreeBSD doesn't currently support bank selection. Peter was
> TL working on it, last time I heard. Linux supports it, at an
> TL incredible performance penalty.
>
> This inspired an off the wall thought that may be insane. Would it be possible (on a 4Gb system) to address 4Gb of RAM and write a driver to make the rest appear as a device, which could then be used for a preferred or (even neater) first level swap.

Only if you reserved a window for it. Say 1G of KVA, though last I checked the bank selection granularity wasn't fine enough for that. Memory in the window can *never* be a target for DMA, and should *probably* never be used for kernel structures.

If you ever programmed graphics on a TI 99/4A, which has a 4k visible window onto screen memory, or programmed code on the Commodore C64 to use the 32K of RAM-under-ROM, or programmed in DOS or CP/M using Overlays, then you'll be familiar with the problem.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Wed, 24 Apr 2002 16:38:08 -0700 Terry Lambert [EMAIL PROTECTED] wrote:

TL Only if you reserved a window for it. Say 1G of KVA, though last

I was thinking more like 1M, or even a few K, it sounds like that's not possible.

TL I checked the bank selection granularity wasn't fine enough for

That would be a problem.

TL If you ever programmed graphics on a TI 99/4A, which has a 4k
TL visible window onto screen memory, or programmed code on the

I've done a number of similar things (Newbrain, Lynx, bank switched CP/M ...).

-- 
C:WIN                        | Directable Mirrors
The computer obeys and wins. | A Better Way To Focus The Sun
You lose and Bill collects.  | licenses available - see:
                             | http://www.sohara.org/
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Thus spake Terry Lambert [EMAIL PROTECTED]:

> I'm pretty sure Solaris also used 4K pages for swappable memory in the kernel, as well: 4M pages don't make much sense, since you could, for example, exhaust KVA space with 250 kernel modules (250 * (1 data + 1 code) * 4M = 2G).

It doesn't use 4M pages for all kernel memory---just the first 4M of code and the first 4M of data. Supposedly it also allows applications to take advantage of 4M pages, though I'm not sure how that works. At the very least I'd suppose that those pages are locked into memory.

> I don't know where the Linux limitation comes from; it's really hard for me to believe ~3G, since it's not an even power of 2, so I don't really credit this limitation.

I don't really understand it either. I could try to find the link to where I found this information if you're interested, but I wouldn't be surprised if it is simply wrong. The 2.4 kernel docs seem to imply that 2.4 can use 4 GB of RAM without PAE.

> No problem; I think you will have to, if you are planning on mucking about with more than 4G of physical memory.

I have no such plans in the immediate future; at this point the discussion is a curiosity. But with any luck I will already know what's going on by the time I need to worry about tweaking the kernel in bizarre ways.
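[Editor's sketch] The module arithmetic quoted above (250 modules, one code page and one data page each) can be checked with a few lines of Python. The function name and the choice of two pages per module are illustrative assumptions taken from the quoted example, not anything in Solaris or FreeBSD.

```python
# Rough check of the "250 modules * (1 data + 1 code) * 4M" figure.
# kva_used_by_modules is a hypothetical helper, not a kernel interface.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def kva_used_by_modules(n_modules, page_size, pages_per_module=2):
    """KVA consumed if every module takes one code page and one data page."""
    return n_modules * pages_per_module * page_size


with_4m_pages = kva_used_by_modules(250, 4 * MB)
with_4k_pages = kva_used_by_modules(250, 4 * KB)

print(with_4m_pages / GB)  # 1.953125, i.e. roughly the 2G in the message
print(with_4k_pages / MB)  # 1.953125, about 2M for the same modules
```

With 4K pages the same 250 modules cost about a thousandth of the address space, which seems to be the point of the quoted objection to 4M pages for module memory.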
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Mon, 22 Apr 2002, Terry Lambert wrote:

> Marc G. Fournier wrote:
> > First, a lot of this stuff is slowly sinking in ... after repeatedly reading it and waiting for the headache to dissipate :)
> >
> > But, one thing that I'm still not clear on ... If I have 4Gig of RAM in a server, does it make any sense to have swap space on that server also?
>
> Yes. But it (mostly) does not apply to KVA, only to UVA data, and there are much larger KVA requirements, so the more RAM you have, the bigger the bottleneck to user space for anything you swap.

Okay ... to me, a bottleneck generally means slowdown ... so the more RAM I have, the slower the system will perform?

> > Again, from what I'm reading, I have a total of 4Gig *aggregate* to work with, between RAM and swap, but it's right here that I'm confused right now ... basically, the closer to 4Gig of RAM you get, the closer to 0 of swap you can have?
>
> No. I think you are getting confused on cardinality. You get one KVA, but you get an arbitrary number of UVA's, until you run out of physical RAM to make new ones.
>
> You have 4G aggregate KVA + UVA.
>
> So if your KVA is 3G, and your UVA is 1G, then you can have 1 3G KVA, and 1000 1G UVA's.

Okay, first question here ... above you say 'arbitrary number of UVAs', but here you state 1000 ... just a number picked out of the air, or is this some fixed limit?

> Certain aspects of KVA are non-swappable. Some parts of UVA are swappable in theory, but never swapped in practice (the page tables and descriptors for each user process).
>
> The closer to 4G you have, the more physical RAM you have to spend on managing the physical RAM.
>
> The total amount of physical RAM you have to spend on managing memory is based on the total physical RAM plus the total swap.

Okay, this makes sense ... I notice in 4.5-STABLE that if maxusers is set to 0, then the system will auto-tune based on memory ... is this something that also is auto-tuned?

> As soon as that number exceeds ~2.5G, you can't do it on a 32 bit processor any more, unless you hack FreeBSD to swap the VM housekeeping data it uses for swapping UVA contents.

Okay, now here you lost me ... which number exceeds ~2.5G? The total amount of physical RAM you have to spend?

> Think of physical RAM as a resource. It's separate from the KVA and UVA, but the KVA has to have physical references to do paged memory management. You are limited by how many of these you can have in physical RAM, total.

Okay ... a lot of lights came on simultaneously with this email ... some of those lights, mind you, might be false, but it's a start ...

If I'm understanding this at all ... the KVA (among other things) is a pre-allocated/reserved section of RAM for managing the UVAs ... simplistically, it maintains the process list and all resources associated with it ... I realize it does do a lot of other things, but this is what I'm more focused on right now ...

Now, what exactly is a UVA? You state above '1000 1G UVAs' ... is one process == 1 UVA? Or is one user (with all its associated processes) == 1 UVA?

Next, again, if I'm reading this right ... if I set my KVA to 3G, when the system boots, it will reserve 3G of *physical* RAM for the kernel itself, correct? So on a 4G machine, 1G of *physical* RAM will be available for UVAs ... so, if I run 1G worth of processes, that is where swapping to disk comes in, right? Other than the massive performance hit, and the limit you mention about some parts of UVA not being swappable, I could theoretically have 4G of swap to page out to?

Is there a reason why this stuff isn't auto-scaled based on RAM as it is?
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Tue, Apr 23, 2002 at 09:44:50AM -0300, Marc G. Fournier [EMAIL PROTECTED] wrote:

> Next, again, if I'm reading this right ... if I set my KVA to 3G, when the system boots, it will reserve 3G of *physical* RAM for the kernel itself, correct? So on a 4G machine, 1G of *physical* RAM will be available for UVAs ... so, if I run 1G worth of processes, that is where swapping to disk comes in, right? Other than the massive performance hit, and the limit you mention about some parts of UVA not being swappable, I could theoretically have 4G of swap to page out to?

You can have up to ~12GB of usable swap space, as I've heard. I don't remember why there is such an arbitrary limit, unfortunately. Information about such topics is spread over several list archives, and the subjects are usually strange, too, so it is hard to find. As I understand it you are on the right track: having 3GB allocated to KVA means 1GB for UVA, whatever it exactly means. Userspace processes will allocate memory from UVA space and can grow over 1GB of size if needed by swapping. You can certainly have more than one over-1GB process going on at the same time, but swapping will constrain your performance. I'm sure Terry or some other knowledgeable person will correct me if it doesn't make sense.

> Is there a reason why this stuff isn't auto-scaled based on RAM as it is?

Probably lack of manpower: to code it up you'll have to understand every bit of it, but as we currently see, we don't understand it, and probably many others don't either :-)

-- 
Vallo Kallaste [EMAIL PROTECTED]
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Tue, Apr 23, 2002 at 12:25:31PM -0700, Terry Lambert [EMAIL PROTECTED] wrote:

> Vallo Kallaste wrote:
> > You can have up to ~12GB of usable swap space, as I've heard. I don't remember why there is such an arbitrary limit, unfortunately. Information about such topics is spread over several list archives, and the subjects are usually strange, too, so it is hard to find. As I understand it you are on the right track: having 3GB allocated to KVA means 1GB for UVA, whatever it exactly means. Userspace processes will allocate memory from UVA space and can grow over 1GB of size if needed by swapping. You can certainly have more than one over-1GB process going on at the same time, but swapping will constrain your performance. I'm sure Terry or some other knowledgeable person will correct me if it doesn't make sense.
>
> Actually, you have a total concurrent virtual address space of 4G. If you assign 3G of that to KVA, then you can never exceed 1G of space for a user process, under any circumstances. This is because a given user process and kernel must be able to exist simultaneously in order to do things like copyin/copyout.

Hmm, ok, but can we have more than one 1G user process at one time? Four 500MB ones and so on? That is the conclusion I had drawn from the previous information. It should be so, otherwise I don't understand how swapping fits into the overall picture.

-- 
Vallo Kallaste [EMAIL PROTECTED]
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Terry Lambert (who fits my arbitrary definition of a good cynic) writes:

> It's a hazard of Open Source projects, in general, that there are so many people hacking on whatever they think is cool that nothing ever really gets built to a long term design plan that's stable enough that a book stands a chance of having a 1 year lifetime.

I could not help but notice your multiple attempts at expressing this particular concept often, that is...an implied necessity of a book that explains what's going on under the kernel hood. I agree that such a book would rapidly be out of date, but I also see the necessity thereof.

So, it's time to question the assumption that the information you want available should be in a book.

Many websites have annotation as a form of ad-hoc documentation (e.g. php.net). Why not have someone take a crack at documenting the FreeBSD kernel, and perhaps use some annotation feature to create a living document which (hopefully) comes close to describing the kernel architecture?

If you want to track a moving target, perhaps you need to use a moving track?

-- 
Dave Hayes - Consultant - Altadena CA, USA - [EMAIL PROTECTED]
The opinions expressed above are entirely my own

What's so special about the Net? People -still- don't listen...
-The Unknown Drummer
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Tue, 23 Apr 2002, Dave Hayes wrote:

> Terry Lambert (who fits my arbitrary definition of a good cynic) writes:
> > It's a hazard of Open Source projects, in general, that there are so many people hacking on whatever they think is cool that nothing ever really gets built to a long term design plan that's stable enough that a book stands a chance of having a 1 year lifetime.
>
> I could not help but notice your multiple attempts at expressing this particular concept often, that is...an implied necessity of a book that explains what's going on under the kernel hood. I agree that such a book would rapidly be out of date, but I also see the necessity thereof.
>
> So, it's time to question the assumption that the information you want available should be in a book.
>
> Many websites have annotation as a form of ad-hoc documentation (e.g. php.net). Why not have someone take a crack at documenting the FreeBSD kernel, and perhaps use some annotation feature to create a living document which (hopefully) comes close to describing the kernel architecture?
>
> If you want to track a moving target, perhaps you need to use a moving track?

doxygen is *wonderful* for this for large C++ projects: it's able to draw you inheritance graphs and collaboration diagrams, as well as generate pretty, nicely formatted HTML containing API descriptions generated from javadoc-like comments in header files.

I've never tried it on straight C. I suppose it is possible, but given the lack of inheritance, collaboration diagrams are going to be very messy. Still and all, it might be a very useful thing.

If not doxygen, then perhaps some way to run the headers through jade/sgmlformat, with docbook-style SGML embedded in comments in header files describing kernel API calls and their parameters, with all typedef'd datatypes appropriately cross-linked. As a hack, one could even (gulp) embed POD within comments and run perldoc on everything.

This could be done nightly or twice daily, with updates appearing live at freebsd.org.

HTML versions of man pages with crosslinks go part of the way; what I'm thinking about (if any of you have used doxygen you'll know where I'm going) is a bit more comprehensive, with links to the actual header file from which the documentation was generated, so that the reader can see the declaration in its native context (with the doxygen or docbook comments stripped out for clarity).

This still wouldn't address the need for some kind of overall architectural document, as well as the difficulty of keeping it up-to-date, but it would be of tremendous help to everyone working on the project. *If* developers can get used to updating the in-comment documentation whenever they make changes, then this reference would automatically be kept up-to-date.

-- 
Chris BeHanna
Software Engineer (Remove bogus before responding.)
[EMAIL PROTECTED]
I was raised by a pack of wild corn dogs.
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Dave Hayes wrote:

> So, it's time to question the assumption that the information you want available should be in a book.
>
> Many websites have annotation as a form of ad-hoc documentation (e.g. php.net). Why not have someone take a crack at documenting the FreeBSD kernel, and perhaps use some annotation feature to create a living document which (hopefully) comes close to describing the kernel architecture?
>
> If you want to track a moving target, perhaps you need to use a moving track?

How does the person or persons involved in documenting the internals to sufficient detail to be useful to third parties get paid for the effort?

We are talking the work equivalent of a full time job.

If they aren't paid, what's the incentive to create documentation above and beyond the status quo?

If that incentive exists, what's the URL for the documentation that was created as a result?

I think I can count on my fingers the number of people who know the various aspects of the boot process well enough to document it for people who want to hack on it to, for example, declaratively allocate physical memory as part of the boot process.

A lot of the information in this thread was never collected centrally anywhere before (e.g. the missing piece about the files to modify and the calculation of the NKPDE value that was left out of David Greenman's posting of a year ago).

Most of this information will be quickly out of date, since as soon as you document something, people understand it enough to realize the shortcomings, and so nearly the first thing that happens is the shortcomings are corrected, and voila, your documentation is now out of date.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Thus spake Vallo Kallaste [EMAIL PROTECTED]:

> Userspace processes will allocate memory from UVA space and can grow over 1GB of size if needed by swapping. You can certainly have more than one over-1GB process going on at the same time, but swapping will constrain your performance.

It isn't a performance constraint. 32-bit architectures have 32-bit pointers, so in the absence of segmentation tricks, a virtual address space can only contain 2^32 = 4G locations. If the kernel gets 3 GB of that, the maximum amount of memory that any individual user process can use is 1 GB. If you had, say, 4 GB of physical memory, a single user process could not use it all. Swap increases the total amount of memory that *all* processes can allocate by pushing some of the pages out of RAM and onto the disk, but it doesn't increase the total amount of memory that a single process can address.
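[Editor's sketch] The distinction drawn above, between what one process can address and what all processes together can allocate, can be written down as two separate checks. This is only a model under simplifying assumptions: the kernel's own memory use, shared pages and the non-swappable structures mentioned elsewhere in the thread are ignored, and user_va_limit and check_workload are made-up names.

```python
# Two independent limits on a 32-bit machine with a KVA/UVA split:
#   1. per process: size <= 4G - KVA      (address-space limit)
#   2. in aggregate: sum <= RAM + swap    (backing-store limit, simplified)
MB = 1 << 20
GB = 1 << 30
VA_BITS = 32


def user_va_limit(kva_bytes, va_bits=VA_BITS):
    """Largest address space a single user process can have."""
    total = 1 << va_bits
    if not 0 <= kva_bytes <= total:
        raise ValueError("KVA must fit inside the virtual address space")
    return total - kva_bytes


def check_workload(process_sizes, kva_bytes, ram_bytes, swap_bytes):
    """Return (addressable, backed) for a list of process sizes in bytes."""
    limit = user_va_limit(kva_bytes)
    addressable = all(size <= limit for size in process_sizes)
    backed = sum(process_sizes) <= ram_bytes + swap_bytes
    return addressable, backed


# Four 500MB processes fit; one 2G process does not, however much swap exists.
print(check_workload([500 * MB] * 4, 3 * GB, 4 * GB, 4 * GB))  # (True, True)
print(check_workload([2 * GB], 3 * GB, 4 * GB, 4 * GB))        # (False, True)
```

Adding swap can only change the second value in the tuple; the first depends on the KVA size alone.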
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Thus spake Terry Lambert [EMAIL PROTECTED]:

> Writing a useful (non-fluff) technical book, optimistically, takes 2080 hours ... or 40 hours per week for 52 weeks... a man year. By the time you are done, the book is a year out of date, and even if you worked really hard and kept it up to date (e.g. you had 4 authors and spent only 6 months of wall time on the book), the shelf life on the book is still pretty short.

Although it would be unreasonable to comprehensively document the kernel internals and expect the details to remain valid for a year, there is a great deal of lasting information that could be conveyed. For example, Kirk's 4.[34]BSD books cover obsolete systems, and yet much of what they say applies equally well to recent versions of FreeBSD.

It's true that the specific question ``How do I change my KVA size?'' might have different answers at different times, but I doubt that the ideas behind an answer have all been invented in the last few months. Even things like PAE, used by the Linux 2.4 kernel, remind me of how DOS dealt with the 1 MB memory limit.
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc,

-On [20020421 00:30], Marc G. Fournier ([EMAIL PROTECTED]) wrote:

> Over the past week, I've been trying to get information on how to fix a server that panics with:
>
> | panic: vm_map_entry_create: kernel resources exhausted
> | mp_lock = 0101; cpuid = 1; lapic.id = 0100
> | boot() called on cpu#1

Take a look at this:

http://www.freebsd.org/cgi/getmsg.cgi?fetch=245329+248644+/usr/local/www/db/text/2001/freebsd-hackers/20010624.freebsd-hackers

Hope this helps,

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
How many cares one loses when one decides not to be something but to be someone.
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> > No, there's no stats collected on this stuff, because it's a pretty obvious and straight-forward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).
>
> ... and translated to English, this means? :)
>
> Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for ... what?
>
> So, I've theoretically got 8184M of VM available for the kernel to use right now? what are PTEs and how do I know how many I have right now? as for mbufs, I've currently got:

No.

Each 4M of physical memory takes 4K of statically allocated KVA. Each 4M of backing store takes 4K of statically allocated KVA.

The definition of backing store includes:

o All dirty data pages in swap
o All dirty code pages in swap
o All clean data pages in files mapped into process or kernel address space
o All clean code pages for executables mapped into process or kernel address space
o Reserved mappings for copy-on-write pages that haven't yet been written

A PTE is a page table entry. It's the 32 bit value in the page table for each address space (one for the kernel, one per process). See the books I posted the titles of for more details, or read the Intel processor PDF's from their developer web site.

> jupiter netstat -m
> 173/1664/61440 mbufs in use (current/peak/max):
>         77 mbufs allocated to data
>         96 mbufs allocated to packet headers
> 71/932/15360 mbuf clusters in use (current/peak/max)
> 2280 Kbytes allocated to network (4% of mb_map in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
>
> So how do I find out where my PTEs are sitting at?

The mbufs are only important because most people allocate a large number of mbufs up front for networking applications, or for large numbers of users with network applications that will need resources in order to be able to actually run. There are also protocol control blocks and other allocations that occur up front, based on the maximum number of system open files and sockets you intend to permit.

The user space stuff is generally a lot easier to calculate: do a ps -gaxl, round each entry in the VSZ column up to 4M, divide by 4K, and that tells you how many 4K units you have allocated for user space.

For kernel space, the answer is that there are some allocated at boot time (120M worth), and then the kernel map is grown, as necessary, until it hits the KVA space limit. If you plan on using up every byte, then divide your total KVA space by 4K to get the number of 4K pages allocated there.

For the kernel stuff... you basically need to know where the kernel puts how much memory, based on the tuning parameters you use on it.

-- Terry
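[Editor's sketch] The two rules of thumb in the message above can be tried out numerically: 4K of KVA for every 4M of physical memory or backing store, and the ps recipe of rounding each VSZ up to 4M and dividing by 4K. The helper names are invented for illustration, the VSZ values are assumed to be in kilobytes, and the ps recipe is implemented exactly as worded, which may not be the only way to read it.

```python
# Back-of-envelope page-table accounting, following the message literally.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
PAGE = 4 * KB    # one page, also the size of one page table
CHUNK = 4 * MB   # address range covered by one page table


def round_up(value, multiple):
    """Round value up to the next multiple (integer arithmetic only)."""
    return -(-value // multiple) * multiple


def page_table_kva(ram_bytes, backing_bytes):
    """KVA statically consumed: 4K per 4M of RAM plus 4K per 4M of backing store."""
    chunks = round_up(ram_bytes, CHUNK) // CHUNK
    chunks += round_up(backing_bytes, CHUNK) // CHUNK
    return chunks * PAGE


def user_4k_units(vsz_kb_values):
    """The ps recipe: round each VSZ (assumed KB) up to 4M, divide by 4K, sum."""
    return sum(round_up(vsz * KB, CHUNK) // PAGE for vsz in vsz_kb_values)


print(page_table_kva(2 * GB, 2 * GB) // MB)  # 4 (megabytes of KVA)
print(user_4k_units([5 * 1024, 100]))        # 3072
```

For the 4Gig RAM plus 4Gig swap assumption quoted in the message, page_table_kva returns 8M, which is the "8M for ... what?" figure: it is KVA spent on page tables, not memory left over for the kernel.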
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> On Sun, 21 Apr 2002, Terry Lambert wrote:
> > No, there's no stats collected on this stuff, because it's a pretty obvious and straight-forward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).
>
> God, I'm glad it's straightforward :)
>
> Okay, first off, you say (max 4G total for both) ... do you mean max *total* between the two, or phy can be 4g *plus* swap can be 4g for a total of 8g?

You aren't going to be able to exceed 4G, no matter what you do, because that's the limit of your address space.

If you want more, then you need to use a 64 bit processor (or use a processor that supports bank selection, and hack up FreeBSD to do bank swapping on 2G at a time, just like Linux has been hacked up, and expect that it won't be very useful).

If you are swapping, you are demand paging.

The way demand paging works is that you reference a page that has been swapped out, or for which physical memory backing store has not been assigned.

When you make this reference, you get a page not present fault (a trap 12). The trap handler puts the faulting process to sleep, and then starts the process of pulling the page in from backing store (if it's not a create-on-reference), which, among other things, locates a physical page to contain the copy of the data pulled in from the backing store (or zero'ed out of physical memory, if it's an unbacked page, e.g. non-swappable, or swappable, but for which swap has not yet been allocated, because it's the first use).

Only certain types of kernel memory are swappable -- mostly kernel memory that's allocated on a per process basis.

Kernel swapping really does you no good, if you have a fully populated physical memory in the virtual address space, since there's only one kernel virtual address space (SMP reserves a little bit of per processor memory, but the amount is tiny: one page descriptor's worth: 4M); after a certain point, your KVA is committed, and it's a mistake to have it compete in the same LRU domain as processes.

You can't really avoid that, for the most part, since there's a shared TLB cache that you really don't have opportunity to manage, other than by separating 4M vs. 4K pages (and 2M, etc., for the Pentium Pro, though variable page granularity is not supported in FreeBSD, since it's not common to most hardware people actually have).

> For instance, right now, I have 3Gig of physical and ~3gig of swap allocated ...

Each process maintains its own virtual address space. Almost all of a process's virtual address space is swappable. So if you are swapping, it's going to be process address space: UVA, not KVA.

If you increase the KVA, then you will decrease the UVA available to user processes. The total of the two can not exceed 4G.

With 4G of physical memory, then 3G of KVA is practically a requirement, particularly if you intend to use the additional memory for kernel data (you will have to, for PDE's: you have no choice).

For 3G, it's ~2.5G KVA minimally required.

Personally, I'd just put it at 3G, and live with it, so you can throw in RAM to your limit later, when you decide you need to throw RAM at some problem or other.

If you can't afford for the UVA to be as small as 1G, then you are going to have to make some hard decisions on the amount of physical RAM you put in the machine.

It's not really that bad: for 3G of KVA, you need 3M for PDE's.

The problem comes when they are exhausted because of the amount of PDE's you have lying around to describe UVA pages that are swapped out for various processes, and for kernel memory requirements that go way up when you crank up the kernel's ability to handle load (e.g. for network equipment, I generally take half of physical memory for mbufs, mostly because that's around the limit of what I can take, and have anything left over).

That you are using System V shared memory segments is *REALLY* going to hurt you; each of these shared memory segments comes out of the KVA, so using shared memory segments with the shm*() calls, rather than using mmap()'ed files as backing store, can eat huge chunks of KVA, as well as fragmenting the KVA, particularly over time.

For more details on paged memory management on x86, see:

Protected Mode Software Architecture

and:

The Indispensable PC Hardware Book

You might also want to find a book on bootstrapping protected mode operating systems (actually, I have yet to find a very good one, so post about it, if you find one).

-- Terry
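[Editor's sketch] One way to arrive at the "3M for 3G of KVA" figure above is to count the 4K page tables needed to map the region with ordinary (non-PAE) two-level 32-bit x86 paging, where each page table covers 4M. That reading is an assumption, and split_va and page_table_bytes are illustrative helpers rather than kernel functions.

```python
# Two-level 32-bit x86 paging with 4K pages: 10 bits of page-directory
# index, 10 bits of page-table index, 12 bits of offset within the page.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def split_va(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    if not 0 <= va < (1 << 32):
        raise ValueError("not a 32-bit address")
    return va >> 22, (va >> 12) & 0x3FF, va & 0xFFF


def page_table_bytes(region_bytes):
    """Memory taken by the 4K page tables that map a region, one table per 4M."""
    tables = -(-region_bytes // (4 * MB))  # ceiling division
    return tables * 4 * KB


# With a 3G KVA the kernel would start at the 1G mark, directory slot 256,
# leaving 1024 - 256 = 768 directory slots for the kernel.
print(split_va(0x40000000))              # (256, 0, 0)
print(split_va(0x40001234))              # (256, 1, 564)
print(page_table_bytes(3 * GB) // MB)    # 3
```

The 768 kernel directory slots times 4K per page table give the 3M quoted in the message; the per-process page tables for swapped-out UVA pages come on top of that, which is where the message says the pressure appears.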
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Thus spake Terry Lambert [EMAIL PROTECTED]:

> If you want more, then you need to use a 64 bit processor (or use a processor that supports bank selection, and hack up FreeBSD to do bank swapping on 2G at a time, just like Linux has been hacked up, and expect that it won't be very useful).

I'm guessing that this just means looking at more than 4 GB of memory by working with 2 GB frames at a time. As I recall, David Greenman said that this hack would essentially require a rewrite of the VM system. Does this just boil down to using 36 bit physical addresses? Are there plans for FreeBSD to support it, or is everyone just waiting until 64 bit processors become more common?

> You can't really avoid that, for the most part, since there's a shared TLB cache that you really don't have opportunity to manage, other than by separating 4M vs. 4K pages (and 2M, etc., for the Pentium Pro, though variable page granularity is not supported in FreeBSD, since it's not common to most hardware people actually have).

Does FreeBSD use 4M pages exclusively for kernel memory, as in Solaris, or is there a more complicated scheme?

> If you increase the KVA, then you will decrease the UVA available to user processes. The total of the two can not exceed 4G.

In Linux, all of physical memory is mapped into the kernel's virtual address space, and hence, until recently, Linux was limited to ~3 GB of physical memory. FreeBSD, as I understand, doesn't do that. So is the cause of this limitation that the top half of the kernel has to share a virtual address space with user processes?

I'll have to read those books one of these days when I have time(6). Thanks for the info.
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Mon, 22 Apr 2002, Terry Lambert wrote:

> Marc G. Fournier wrote:
> > On Sun, 21 Apr 2002, Terry Lambert wrote:
> > > No, there's no stats collected on this stuff, because it's a pretty obvious and straightforward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).
> >
> > God, I'm glad it's straightforward :) Okay, first off, you say (max 4G total for both) ... do you mean max *total* between the two, or physical can be 4g *plus* swap can be 4g for a total of 8g?
>
> You aren't going to be able to exceed 4G, no matter what you do, because that's the limit of your address space.
>
> If you want more, then you need to use a 64 bit processor (or use a processor that supports bank selection, and hack up FreeBSD to do bank swapping on 2G at a time, just like Linux has been hacked up, and expect that it won't be very useful).

Now I'm confused ... from what I've read so far, going out and buying an IBM eSeries 350 with 16Gig of RAM and dual PIII processors and hoping to run FreeBSD on it is not possible? Or, rather, hoping to use more than 4 out of the 16Gig of RAM is?
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> > You aren't going to be able to exceed 4G, no matter what you do, because that's the limit of your address space.
> >
> > If you want more, then you need to use a 64 bit processor (or use a processor that supports bank selection, and hack up FreeBSD to do bank swapping on 2G at a time, just like Linux has been hacked up, and expect that it won't be very useful).
>
> Now I'm confused ... from what I've read so far, going out and buying an IBM eSeries 350 with 16Gig of RAM and dual PIII processors and hoping to run FreeBSD on it is not possible? Or, rather, hoping to use more than 4 out of the 16Gig of RAM is?

FreeBSD doesn't currently support bank selection. Peter was working on it, last time I heard. Linux supports it, at an incredible performance penalty.

But yes, it means only 4G of the RAM will be usable by you.

Bank selection works by leaving the address space at 4G, and switching between banks, 2G at a time out of the 16G. Basically, your kernel code lives in the first 2G, and then you get to pick which 2G out of the 16G is the last 2G.

As I said, I expect that doing this won't be very useful; since Itaniums are available, and FreeBSD runs native in multiuser mode on IA64 now, there's really no reason to do the 16G, 2G at a time bank selection trick.

The main reason I don't think it'll be useful is DMA: for the DMA to occur, it will have to occur into the first 2G, so that it's never selected out. This is because, no matter what you do, your address space is limited to 4G total: adding banks just controls what physical memory is placed in that 4G window at any given time. Since the most useful thing you could do with more memory is buffers for networking and disk I/O for things like web and file servers... not very useful.

Consider what happens if I have two processes and divide the memory into eight 2G banks. The 0th bank has the kernel in it, and can never be selected out, if you expect the kernel to run or DMA to be possible. The 1st bank contains the memory for one process, running on CPU 0. The 4th bank contains the memory for one process, running on CPU 1.

Basically, now, you cannot run these processes simultaneously, because they have conflicting bank selects.

You could jam everything into all the code -- you'd have to hack the paged memory management, the VM, the scheduler, etc., to get it to work -- but, even so, after all that work, what you have effectively bought yourself is an L3 cache that's in RAM, rather than in a swap partition. You are better off just making it usable as swap, semi-directly, and then making all the paging structures not used for the kernel itself swappable. Even so, your KVA is restricted by whatever your bank size is, and you can't use it directly (e.g. KVA + UVA + bank_region = 4G).

You really, really ought to look at the books I recommended, if you are confused about why you can only use 4G with a 32 bit processor and FreeBSD, without additional heroic work.

-- Terry
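The bank-select conflict described above can be written down as a toy model. This is a sketch of the scheme exactly as the message describes it (one pinned 2G bank plus one selectable 2G window, with the select treated as a single machine-wide setting); it is not a description of how any real processor or kernel implements extended physical addressing, and every name in it is hypothetical.

```python
# Toy model of the 2G bank-selection scheme as described in the message.
# Assumption: a single machine-wide bank select; names are illustrative only.
GB = 1024 ** 3
ADDRESS_SPACE = 4 * GB   # the 32-bit virtual address space never grows
PINNED_BANK = 0          # kernel and DMA buffers live here; never selected out


def bank_count(physical_bytes, bank_bytes):
    """Number of banks the physical memory is carved into."""
    return physical_bytes // bank_bytes


def layout_fits(kva, uva, bank_region):
    """The message's constraint: KVA + UVA + bank_region = 4G."""
    return kva + uva + bank_region == ADDRESS_SPACE


def can_run_together(bank_a, bank_b):
    """Two processes can be mapped at once only if they need at most one
    selectable bank between them (the pinned bank is always present)."""
    selectable = {bank_a, bank_b} - {PINNED_BANK}
    return len(selectable) <= 1


if __name__ == "__main__":
    print("banks:", bank_count(16 * GB, 2 * GB))
    print("bank 1 with bank 4:", can_run_together(1, 4))
    print("bank 0 with bank 4:", can_run_together(0, 4))
```

In this model the two processes from the example (banks 1 and 4) cannot be mapped at the same time, which is the conflict the message points out.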
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
First, a lot of this stuff is slowly sinking in ... after repeatedly reading it and waiting for the headache to dissipate :)

But, one thing that I'm still not clear on ...

If I have 4Gig of RAM in a server, does it make any sense to have swap space on that server also? Again, from what I'm reading, I have a total of 4Gig *aggregate* to work with, between RAM and swap, but it's right here that I'm confused right now ... basically, the closer to 4Gig of RAM you get, the closer to 0 of swap you can have?

On Mon, 22 Apr 2002, Terry Lambert wrote:

> Marc G. Fournier wrote:
> > > No, there's no stats collected on this stuff, because it's a pretty obvious and straightforward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).
> >
> > ... and translated to english, this means? :)
> >
> > Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for ... what?
> >
> > So, I've theoretically got 8184M of VM available for the kernel to use right now? what are PTEs and how do I know how many I have right now? as for mbufs, I've currently got:
>
> No.
>
> Each 4M of physical memory takes 4K of statically allocated KVA. Each 4M of backing store takes 4K of statically allocated KVA. The definition of backing store includes:
>
> o All dirty data pages in swap
> o All dirty code pages in swap
> o All clean data pages in files mapped into process or kernel address space
> o All clean code pages for executables mapped into process or kernel address space
> o Reserved mappings for copy-on-write pages that haven't yet been written
>
> A PTE is a page table entry. It's the 32 bit value in the page table for each address space (one for the kernel, one per process). See the books I posted the titles of for more details, or read the Intel processor PDFs from their developer web site.
>
> > jupiter netstat -m
> > 173/1664/61440 mbufs in use (current/peak/max):
> >         77 mbufs allocated to data
> >         96 mbufs allocated to packet headers
> > 71/932/15360 mbuf clusters in use (current/peak/max)
> > 2280 Kbytes allocated to network (4% of mb_map in use)
> > 0 requests for memory denied
> > 0 requests for memory delayed
> > 0 calls to protocol drain routines
> >
> > So how do I find out where my PTEs are sitting at?
>
> The mbufs are only important because most people allocate a large number of mbufs up front for networking applications, or for large numbers of users with network applications that will need resources in order to be able to actually run. There are also protocol control blocks and other allocations that occur up front, based on the maximum number of system open files and sockets you intend to permit.
>
> The user space stuff is generally a lot easier to calculate: do a ps -gaxl, round each entry in the VSZ column up to 4M, divide by 4K, and that tells you how many 4K units you have allocated for user space.
>
> For kernel space, the answer is that there are some allocated at boot time (120M worth), and then the kernel map is grown, as necessary, until it hits the KVA space limit. If you plan on using up every byte, then divide your total KVA space by 4K to get the number of 4K pages allocated there.
>
> For the kernel stuff... you basically need to know where the kernel puts how much memory, based on the tuning parameters you use on it.
>
> -- Terry
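The user-space estimate described above (take the VSZ column from ps, round each entry up to 4M, divide by 4K) is easy to sketch in Python. The sample VSZ values below are invented for illustration, and the code assumes that ps reports VSZ in kilobytes; it takes the numbers as a list rather than parsing ps output, since the exact column layout varies.

```python
# Sketch of the "round VSZ up to 4M, divide by 4K" estimate from the message.
# Assumption: VSZ values are in kilobytes; the sample list is made up.
KB = 1024
MB = 1024 * KB
PAGE = 4 * KB   # one 4K unit
SPAN = 4 * MB   # rounding granularity: the span covered by one page table


def round_up(value, multiple):
    """Round `value` up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def user_4k_units(vsz_kbytes):
    """Total number of 4K units after rounding each process's VSZ up to 4M."""
    total_bytes = sum(round_up(vsz * KB, SPAN) for vsz in vsz_kbytes)
    return total_bytes // PAGE


if __name__ == "__main__":
    sample_vsz = [1200, 5000, 300]   # hypothetical VSZ column, in kilobytes
    print(user_4k_units(sample_vsz), "4K units")
```

The three sample processes round up to 4M, 8M and 4M respectively, so the sketch counts 16M worth of 4K units.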
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Jeroen Ruigrok/asmodai wrote:

> Take a look at this:
>
> http://www.freebsd.org/cgi/getmsg.cgi?fetch=245329+248644+/usr/local/www/db/text/2001/freebsd-hackers/20010624.freebsd-hackers

This is actually no longer valid, since there have been changes to both the PDE calculations and the kernel base definition to try to make it more automatic to change the KVA space size.

At the time of the referenced posting, the modifications necessary were to /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h.

David also neglected to document how he calculated the 511, which is actually 511 for a UP system and 510 for an SMP system: you divide the kernbase by 0x0040, after subtracting 0x0010, and then subtract the recursive entry out of the total. You also have to subtract out the private entries (if any) for SMP, etc.

Basically, you have to calculate the number of descriptor entries required to map the entire KVA space as 4K pages from 1K of 4K page tables (1K worth of entries in a 4K page descriptor table for the address space).

Of course, now everyone is going to say "how do I... how do I...", wanting one of the six ways you have to do it, based on the FreeBSD version and/or intermediate release (-release? -stable? -security? -some-date-here?), rather than figuring out the answer based on a single known release.

The other issue here is that the number 1 reason for wanting to dick around with this is to be able to add more physical memory, and to do that successfully, you have to know a hell of a lot more about tuning FreeBSD than reading the happy-fun tuning manual page can ever teach you, without you understanding how the OS actually does its thing at a low level. I personally consider the tuning man page as just a knee-jerk reaction to bad publicity resulting from naive benchmarking.

IMO, it's much better to just give elliptical clues, and then leave the job to the people who can follow the clues and learn enough that they not only get the right answer, but end up knowing enough about *why* it's the right answer to be able to do the other required tuning.

If FreeBSD would ever sit still long enough for someone to get a book out, there'd probably be a book on the subject (Kirk has been working on one for a year now, according to several people, called The Design and Implementation of the FreeBSD Operating System; no, I don't know what version it's supposed to apply to). IMO, an architect should set some things in stone, and leave them there long enough that documentation doesn't immediately go out of date.

It's a hazard of Open Source projects, in general, that there are so many people hacking on whatever they think is cool that nothing ever really gets built to a long term design plan that's stable enough that a book stands a chance of having a 1 year lifetime.

Basically, it'll boil down to paying someone who knows where the bodies are buried to do the work for you, if you want to get more than just a hack job. 8-(.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
David Schultz wrote:

> Thus spake Terry Lambert [EMAIL PROTECTED]:
> > If you want more, then you need to use a 64 bit processor (or use a processor that supports bank selection, and hack up FreeBSD to do bank swapping on 2G at a time, just like Linux has been hacked up, and expect that it won't be very useful).
>
> I'm guessing that this just means looking at more than 4 GB of memory by working with 2 GB frames at a time. As I recall, David Greenman said that this hack would essentially require a rewrite of the VM system. Does this just boil down to using 36 bit physical addresses? Are there plans for FreeBSD to support it, or is everyone just waiting until 64 bit processors become more common?

David Greenman is right. Nevertheless, Peter was planning on doing the hack, according to his postings to -current. Please check the list archives for these things.

> Does FreeBSD use 4M pages exclusively for kernel memory, as in Solaris, or is there a more complicated scheme?

FreeBSD starts out using 4K pages for the premapped memory, and switches over to a 4M page scheme for the initially loaded kernel, for at least the first 4M. The PTEs that were for the 4K pages that are replaced with the 4M mappings are simply lost in the reload of CR3, and never recovered for the system to use (the pages containing the PTEs there are leaked, but it's usually one page, so 4K is not that bad a leak).

For much of the FreeBSD kernel, 4K pages are used. I'm pretty sure Solaris also used 4K pages for swappable memory in the kernel: 4M pages don't make much sense, since you could, for example, exhaust KVA space with 250 kernel modules (250 * (1 data + 1 code) * 4M = 2G).

> > If you increase the KVA, then you will decrease the UVA available to user processes. The total of the two can not exceed 4G.
>
> In Linux, all of physical memory is mapped into the kernel's virtual address space, and hence, until recently Linux was limited to ~3 GB of physical memory. FreeBSD, as I understand, doesn't do that. So is the cause of this limitation that the top half of the kernel has to share a virtual address space with user processes?

No. You need to look at the copyin implementation in both OSs to find the answer. The way it works is by mapping the address space of the process in question and the kernel at the same time, and copying bytes between them.

These are really basic questions about memory layout, which you should already know the answer to, if you are mucking about in the KVA size or other parts of the kernel.

I don't know where the Linux limitation comes from; it's really hard for me to believe ~3G, since it's not an even power of 2, so I don't really credit this limitation.

> I'll have to read those books one of these days when I have time(6). Thanks for the info.

No problem; I think you will have to, if you are planning on mucking about with more than 4G of physical memory.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> First, a lot of this stuff is slowly sinking in ... after repeatedly reading it and waiting for the headache to dissipate :)
>
> But, one thing that I'm still not clear on ...
>
> If I have 4Gig of RAM in a server, does it make any sense to have swap space on that server also?

Yes. But it (mostly) does not apply to KVA, only to UVA data, and there are much larger KVA requirements, so the more RAM you have, the bigger the bottleneck to user space for anything you swap.

> Again, from what I'm reading, I have a total of 4Gig *aggregate* to work with, between RAM and swap, but it's right here that I'm confused right now ... basically, the closer to 4Gig of RAM you get, the closer to 0 of swap you can have?

No. I think you are getting confused on cardinality. You get one KVA, but you get an arbitrary number of UVAs, until you run out of physical RAM to make new ones.

You have 4G aggregate KVA + UVA. So if your KVA is 3G, and your UVA is 1G, then you can have one 3G KVA, and 1000 1G UVAs.

Certain aspects of KVA are non-swappable. Some parts of UVA are swappable in theory, but never swapped in practice (the page tables and descriptors for each user process).

The closer to 4G you have, the more physical RAM you have to spend on managing the physical RAM. The total amount of physical RAM you have to spend on managing memory is based on the total physical RAM plus the total swap. As soon as that number exceeds ~2.5G, you can't do it on a 32 bit processor any more, unless you hack FreeBSD to swap the VM housekeeping data it uses for swapping UVA contents.

Think of physical RAM as a resource. It's separate from the KVA and UVA, but the KVA has to have physical references to do paged memory management. You are limited by how many of these you can have in physical RAM, total.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> > You have more memory than you can allocate kernel memory to provide page table entries for.
> >
> > The only solution is to increase your kernel virtual address space size to accommodate the page mappings.
> >
> > How to do this varies widely by the version of FreeBSD you are using, and, unless you read NOTES and are running a recent -current, is not incredibly well documented, and requires an understanding of how the virtual address space is laid out and managed (which is also not well documented anywhere).
>
> Ya, this is the roadblock I'm hitting :( I'm running 4.5-STABLE here, as of this afternoon ... thoughts/suggestions based on that?

Read the handbook as it existed for 4.5-STABLE, and read NOTES. It (sorta) tells you how to increase your KVA size.

> Also, is there something that I can run to monitor this, similar to running netstat -m to watch nmbclusters?

DDB? 8-) 8-).

No, there's no stats collected on this stuff, because it's a pretty obvious and straightforward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).

-- Terry
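The rule of thumb in the reply above (subtract 4K for each 4M of physical memory and swap, then see what is left for the kernel) can be sketched as a small budget check. The KVA size and the `reserved` figure in the example are placeholders chosen for illustration, not measured values from the server under discussion, and the function names are hypothetical.

```python
# Sketch of the KVA budget check described in the message.
# Assumption: every 4M of RAM or swap costs 4K of statically allocated KVA.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def page_table_overhead(ram_bytes, swap_bytes):
    """KVA consumed by page tables: 4K for each 4M of RAM plus swap."""
    tables = -(-(ram_bytes + swap_bytes) // (4 * MB))  # ceiling division
    return tables * 4 * KB


def kva_headroom(kva_bytes, ram_bytes, swap_bytes, reserved_bytes=0):
    """KVA left after page tables and other up-front reservations
    (mbufs, protocol control blocks, and so on). Negative means the
    configuration does not fit."""
    overhead = page_table_overhead(ram_bytes, swap_bytes)
    return kva_bytes - overhead - reserved_bytes


if __name__ == "__main__":
    left = kva_headroom(1 * GB, 3 * GB, 3 * GB, reserved_bytes=256 * MB)
    print(f"headroom: {left // MB}M")
```

The check only covers the static per-4M cost named in the message; anything the kernel allocates dynamically would have to be folded into `reserved_bytes` by hand.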
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Sun, 21 Apr 2002, Terry Lambert wrote:

> No, there's no stats collected on this stuff, because it's a pretty obvious and straightforward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).

God, I'm glad it's straightforward :)

Okay, first off, you say (max 4G total for both) ... do you mean max *total* between the two, or physical can be 4g *plus* swap can be 4g for a total of 8g?

For instance, right now, I have 3Gig of physical and ~3gig of swap allocated ...
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Sun, 21 Apr 2002, Terry Lambert wrote:

> Marc G. Fournier wrote:
> > > You have more memory than you can allocate kernel memory to provide page table entries for.
> > >
> > > The only solution is to increase your kernel virtual address space size to accommodate the page mappings.
> > >
> > > How to do this varies widely by the version of FreeBSD you are using, and, unless you read NOTES and are running a recent -current, is not incredibly well documented, and requires an understanding of how the virtual address space is laid out and managed (which is also not well documented anywhere).
> >
> > Ya, this is the roadblock I'm hitting :( I'm running 4.5-STABLE here, as of this afternoon ... thoughts/suggestions based on that?
>
> Read the handbook as it existed for 4.5-STABLE, and read NOTES. It (sorta) tells you how to increase your KVA size.
>
> > Also, is there something that I can run to monitor this, similar to running netstat -m to watch nmbclusters?
>
> DDB? 8-) 8-).
>
> No, there's no stats collected on this stuff, because it's a pretty obvious and straightforward thing: you have to have a KVA space large enough that, once you subtract out 4K for each 4M of physical memory and swap (max 4G total for both), you end up with memory left over for the kernel to use, and your limits are such that you don't run out of PTEs before you run out of mbufs (or whatever you plan on allocating).

... and translated to english, this means? :)

Okay, I'm going to assume that I'm allowed 4Gig of RAM + 4Gig of Swap, for a total of 8Gig ... so, if I subtract out 4K for each 4M, that is 8M for ... what?

So, I've theoretically got 8184M of VM available for the kernel to use right now? what are PTEs and how do I know how many I have right now? as for mbufs, I've currently got:

jupiter netstat -m
173/1664/61440 mbufs in use (current/peak/max):
        77 mbufs allocated to data
        96 mbufs allocated to packet headers
71/932/15360 mbuf clusters in use (current/peak/max)
2280 Kbytes allocated to network (4% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

So how do I find out where my PTEs are sitting at?
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
As a quick follow-up to this, doing more searching on the web, I came across a few suggested 'sysctl' settings, which I've added to what I had before, for a total of:

kern.maxfiles=65534
jail.sysvipc_allowed=1
vm.swap_idle_enabled=1
vfs.vmiodirenable=1
kern.ipc.somaxconn=4096

I've also just reduced my maxusers to 256 from 1024, since 1024 was crashing worse than 512, and I ran across the 'tuning' man page that stated that you shouldn't go above 256 :(

Just a bit more detail on the setup ...

On Sat, 20 Apr 2002, Marc G. Fournier wrote:

> Over the past week, I've been trying to get information on how to fix a server that panics with:
>
> | panic: vm_map_entry_create: kernel resources exhausted
> | mp_lock = 0101; cpuid = 1; lapic.id = 0100
> | boot() called on cpu#1
>
> Great ... but, how do I determine what 'resources' I need to increase to avoid that crash? I've tried increasing maxusers from 512 to 1024, but *if* that works, I imagine I'm raising a bunch of limits (and using memory) that I don't have to ...
>
> The server is a Dual-CPU PIII-1Ghz with 3Gig of RAM and ~3Gig of swap space right now ... the data drive is 5x18gig drives in a RAID5 configuration (hardware RAID, not vinum) ...
>
> I ran top in an xterm so that I could see what was up just before the crash, and the results were:
>
> last pid: 84988;  load averages: 19.82, 57.35, 44.426  up 0+23:33:12  02:05:00
> 5021 processes: 16 running, 5005 sleeping
> CPU states: 8.7% user, 0.0% nice, 24.3% system, 2.2% interrupt, 64.7% idle
> Mem: 2320M Active, 211M Inact, 390M Wired, 92M Cache, 199M Buf, 4348K Free
> Swap: 3072M Total, 1048M Used, 2024M Free, 34% Inuse, 448K Out
>
> So, I have plenty of swap space left, lots of idle CPU and a whole whack of processes ...
>
> Now, looking at the LINT file, there appear to be *a lot* of things I *could* change ... for instance, NSFBUFS, KVA_FILES, etc ... but I don't imagine that changing these blindly is particularly wise ... so, how do you determine what to change? For instance, at a maxusers of 512, NSFBUFS should be ~8704, and if I've only got 5000 processes running, chances are I'm still safe at that value, no? But sysctl doesn't show any 'sf_buf' value, so how do I figure out what I'm using?
>
> Basically, are there any commands similar to netstat -m for nmbclusters that I can run to 'monitor' and isolate where I'm exhausting these resources?
>
> Is there a doc on this sort of stuff that I should be reading for this? Something that talks about kernel tuning for high-load/processes servers?
>
> Thanks for any help in advance ..
>
> ---
> machine i386
> cpu I686_CPU
> ident kernel
> maxusers 1024
>
> options NMBCLUSTERS=15360
>
> options INET        #InterNETworking
> options INET6       #IPv6 communications protocols
> options FFS         #Berkeley Fast Filesystem
> options FFS_ROOT    #FFS usable as root device [keep this!]
> options SOFTUPDATES #Enable FFS soft updates support
> options PROCFS      #Process filesystem
> options COMPAT_43   #Compatible with BSD 4.3 [KEEP THIS!]
> options SCSI_DELAY=15000 #Delay (in ms) before probing SCSI
> options KTRACE      #ktrace(1) support
> options SYSVSHM
> options SHMMAXPGS=98304
> options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
> options SYSVSEM
> options SEMMNI=2048
> options SEMMNS=4096
> options SYSVMSG     #SYSV-style message queues
> options P1003_1B    #Posix P1003_1B real-time extensions
> options _KPOSIX_PRIORITY_SCHEDULING
> options ICMP_BANDLIM #Rate limit bad replies
> options SMP         # Symmetric MultiProcessor Kernel
> options APIC_IO     # Symmetric (APIC) I/O
>
> device isa
> device pci
> device scbus        # SCSI bus (required)
> device da           # Direct Access (disks)
> device sa           # Sequential Access (tape etc)
> device cd           # CD
> device pass         # Passthrough device (direct SCSI access)
> device amr          # AMI MegaRAID
> device sym
> device atkbdc0 at isa? port IO_KBD
> device atkbd0 at atkbdc? irq 1 flags 0x1
> device psm0 at atkbdc? irq 12
> device vga0 at isa?
> pseudo-device splash
> device sc0 at isa? flags 0x100
> device npx0 at nexus? port IO_NPX irq 13
> device sio0 at isa? port IO_COM1 flags 0x10 irq 4
> device sio1 at isa? port IO_COM2 irq 3
> device miibus       # MII bus support
> device fxp          # Intel
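Two of the numbers in the message above can be reproduced with simple arithmetic. The sketch below evaluates the SHMMAX expression from the kernel config, assuming a 4K PAGE_SIZE on i386, and shows one formula that yields the ~8704 NSFBUFS figure quoted for maxusers 512. The NSFBUFS formula (512 + maxusers * 16) is an assumption inferred from that quoted figure, not something stated in the thread, so check it against the LINT file for the release in use.

```python
# Worked arithmetic for two tunables mentioned in the message.
# Assumptions: PAGE_SIZE is 4096 on i386; the NSFBUFS default formula is
# inferred from the "~8704 at maxusers 512" figure and may not match LINT.
PAGE_SIZE = 4096
MB = 1024 * 1024


def shmmax_bytes(shmmaxpgs, page_size=PAGE_SIZE):
    """Evaluate SHMMAX=(SHMMAXPGS*PAGE_SIZE+1) from the kernel config."""
    return shmmaxpgs * page_size + 1


def nsfbufs_default(maxusers):
    """Assumed default: 512 + maxusers * 16."""
    return 512 + maxusers * 16


if __name__ == "__main__":
    shmmax = shmmax_bytes(98304)
    print(f"SHMMAX = {shmmax} bytes (about {shmmax // MB}M)")
    for maxusers in (256, 512, 1024):
        print(f"maxusers {maxusers} -> NSFBUFS {nsfbufs_default(maxusers)}")
```

With SHMMAXPGS=98304 the largest single System V shared memory segment works out to about 384M.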
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
* The Hermit Hacker [EMAIL PROTECTED] [020420 16:01] wrote:

> As a quick follow-up to this, doing more searching on the web, I came across a few suggested 'sysctl' settings, which I've added to what I had before, for a total of:
>
> kern.maxfiles=65534
> jail.sysvipc_allowed=1
> vm.swap_idle_enabled=1
> vfs.vmiodirenable=1
> kern.ipc.somaxconn=4096
>
> I've also just reduced my maxusers to 256 from 1024, since 1024 was crashing worse than 512, and I ran across the 'tuning' man page that stated that you shouldn't go above 256 :(
>
> Just a bit more detail on the setup ...

You said you're running 5000 processes. 5000 processes of what?

Are they using SYSVSHM? If so, this sysctl might help:

kern.ipc.shm_use_phys=1

It'll only work if you set it before your processes set up.

Some more information about what these 5000 processes are doing would help.

-Alfred
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Sat, 20 Apr 2002, Alfred Perlstein wrote:

> * The Hermit Hacker [EMAIL PROTECTED] [020420 16:01] wrote:
> > As a quick follow-up to this, doing more searching on the web, I came across a few suggested 'sysctl' settings, which I've added to what I had before, for a total of:
> >
> > kern.maxfiles=65534
> > jail.sysvipc_allowed=1
> > vm.swap_idle_enabled=1
> > vfs.vmiodirenable=1
> > kern.ipc.somaxconn=4096
> >
> > I've also just reduced my maxusers to 256 from 1024, since 1024 was crashing worse than 512, and I ran across the 'tuning' man page that stated that you shouldn't go above 256 :(
> >
> > Just a bit more detail on the setup ...
>
> You said you're running 5000 processes. 5000 processes of what?
>
> Are they using SYSVSHM? If so, this sysctl might help:
>
> kern.ipc.shm_use_phys=1

Okay, never knew of that one before ... have it set for the next reboot, as I do have a few postgresql servers going on the 'root (non-jail)' server ...

> It'll only work if you set it before your processes set up.
>
> Some more information about what these 5000 processes are doing would help.

Sorry ... the server is running ~210 jails ... so the '5k processes' would be when they all start up their periodic scripts ... normally, it hovers around 2700 processes ...
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> Over the past week, I've been trying to get information on how to fix a server that panics with:
>
> | panic: vm_map_entry_create: kernel resources exhausted
> | mp_lock = 0101; cpuid = 1; lapic.id = 0100
> | boot() called on cpu#1
>
> Great ... but, how do I determine what 'resources' I need to increase to avoid that crash? I've tried increasing maxusers from 512 to 1024, but *if* that works, I imagine I'm raising a bunch of limits (and using memory) that I don't have to ...
>
> The server is a Dual-CPU PIII-1Ghz with 3Gig of RAM and ~3Gig of swap space right now ... the data drive is 5x18gig drives in a RAID5 configuration (hardware RAID, not vinum) ...

You have more memory than you can allocate kernel memory to provide page table entries for.

The only solution is to increase your kernel virtual address space size to accommodate the page mappings.

How to do this varies widely by the version of FreeBSD you are using, and, unless you read NOTES and are running a recent -current, is not incredibly well documented, and requires an understanding of how the virtual address space is laid out and managed (which is also not well documented anywhere).

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
Marc G. Fournier wrote:

> > It'll only work if you set it before your processes set up. Some more information about what these 5000 processes are doing would help.
>
> Sorry ... the server is running ~210 jails ... so the '5k processes' would be when they all start up their periodic scripts ... normally, it hovers around 2700 processes ...

Sounds like my laptop.

-- Terry
Re: FreeBSD 4.5-STABLE not easily scalable to large servers ... ?
On Sat, 20 Apr 2002, Terry Lambert wrote:

> Marc G. Fournier wrote:
> > Over the past week, I've been trying to get information on how to fix a server that panics with:
> >
> > | panic: vm_map_entry_create: kernel resources exhausted
> > | mp_lock = 0101; cpuid = 1; lapic.id = 0100
> > | boot() called on cpu#1
> >
> > Great ... but, how do I determine what 'resources' I need to increase to avoid that crash? I've tried increasing maxusers from 512 to 1024, but *if* that works, I imagine I'm raising a bunch of limits (and using memory) that I don't have to ...
> >
> > The server is a Dual-CPU PIII-1GHz with 3Gig of RAM and ~3Gig of swap space right now ... the data drive is 5x18gig drives in a RAID5 configuration (hardware RAID, not vinum) ...
>
> You have more memory than you can allocate kernel memory to provide page table entries for. The only solution is to increase your kernel virtual address space size to accommodate the page mappings.
>
> How to do this varies widely by the version of FreeBSD you are using, and, unless you read NOTES and are running a recent -current, is not incredibly well documented, and requires an understanding of how the virtual address space is laid out and managed (which is also not well documented anywhere).

Ya, this is the roadblock I'm hitting :( I'm running 4.5-STABLE here, as of this afternoon ... thoughts/suggestions based on that?

Also, is there something that I can run to monitor this, similar to running netstat -m to watch nmbclusters?