Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
ow...@netptc.net put forth on 10/22/2010 8:15 PM:

> Actually Amdahl's Law IS a law of diminishing returns but is intended
> to be applied to hardware, not software. The usual application is to
> compute the degree to which adding another processor increases the
> processing power of the system
> Larry

You are absolutely incorrect. Amdahl's law is specific to algorithm scalability. It has little to do specifically with classic multiprocessing.

Case in point: suppose one has a fairly heavy floating point application that requires a specific scalar operation in the loop along with every FP op, say an increment of an integer counter register or similar. One could take this application from his or her 2 GHz single core x86 processor platform and run it on one processor of an NEC SX8 vector supercomputer system, which has a wide 8 pipe vector unit: 16 Gflop/s peak vs 4 Gflop/s peak for the x86 chip. Zero scalability would be achieved, even though the floating point hardware is over 4 times more powerful. Note that no additional processors were added; we simply moved the algorithm to a machine with a massively parallel vector FP unit.

In this case it's even more interesting, because the scalar unit in the SX8 runs at 1 GHz, even though the 8 pipe vector unit runs at 2 GHz. So this floating point algorithm would actually run _slower_ on the SX8, the 1 GHz scalar unit limiting execution time. (This is typical of vector supercomputer processors. Cray did the same thing for years, running the vector units faster than the scalar units, because the vast bulk of the code run on these systems was truly, massively floating point specific, with little scalar code.)

This is the type of thing Gene Amdahl had in mind when postulating his theory: not necessarily multiprocessing specifically, but all forms of processing in which a portion of the algorithm can be broken up to run in parallel, regardless of what the parallel hardware might be.
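Stan's SX8 scenario can be sketched numerically. The workload split and operation counts below are illustrative assumptions, not measurements; only the peak rates come from his example:

```python
# Hypothetical model of the SX8 example: total time = FP part + scalar part.
# One counter increment per FP op, as in Stan's scenario; op counts are made up.

def runtime(fp_ops, scalar_ops, fp_rate, scalar_rate):
    """Seconds to execute fp_ops at fp_rate (ops/s) plus scalar_ops at scalar_rate."""
    return fp_ops / fp_rate + scalar_ops / scalar_rate

FP_OPS = 1e9       # floating point operations in the loop (assumed)
SCALAR_OPS = 1e9   # one integer increment per FP op (assumed)

x86 = runtime(FP_OPS, SCALAR_OPS, fp_rate=4e9, scalar_rate=2e9)    # 4 Gflop/s FP, 2 GHz scalar
sx8 = runtime(FP_OPS, SCALAR_OPS, fp_rate=16e9, scalar_rate=1e9)   # 16 Gflop/s FP, 1 GHz scalar

print(f"x86: {x86:.4f} s, SX8: {sx8:.4f} s")
# The SX8 loses despite 4x the FP peak: the slower scalar unit dominates.
```

The point of the sketch is that the serial (scalar) term sets a floor on runtime no matter how fast the parallel (vector) term gets.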
One of the few applications that can truly be nearly infinitely parallelized is graphics rendering. Note I said rendering, not geometry. When attempting to parallelize the geometry calculations in the 3D pipeline, we run squarely into Amdahl's brick wall. This is why nVidia/AMD have severe problems getting multi GPU (SLI/Xfire) performance to scale anywhere close to linearly. It's impossible to split the geometry calculations of the 3D scene evenly between GPUs, because vertices overlap across the portions of the frame buffer for which each GPU is responsible. Every overlapping vertex must be sent to both GPUs adjacent to the boundary.

For this reason, adding multiple GPUs to a system yields a vastly diminishing return on investment. Each additional GPU creates one more frame buffer boundary. When you go from two screen regions to three, you double the amount of geometry processing the middle GPU has to perform, because it now has two neighbor GPUs. The only scenario where 3 or 4 GPUs makes any kind of sense for ROI is with multiple monitors, at insanely high screen resolutions and color depths, with maximum AA/AF and multisampling. These operations are almost entirely raster ops, and as mentioned before, raster pixel operations can be nearly linearly scaled on parallel hardware.

Again, Amdahl's law applies to algorithm scalability, not classic CPU multiprocessing.

-- Stan

-- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4cc317a2.8010...@hardwarefreak.com
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Ron Johnson put forth on 10/22/2010 8:48 PM:

> Bah, humbug. Instead of a quad-core at lower GHz, I just got my wife a
> dual-core at higher speed.

Not to mention the fact that for desktop use, 2 higher clocked cores will yield faster application performance (think of the single threaded Flash hog and Slashdot jscript) than 4 lower freq cores. They also draw _far_ less power than a quad core (45-65 W vs 95-115 W avg AMD), and cost significantly less.

Fewer cores equals _more_ performance for less money (purchase price and electrical $$)? What? Yep. :)

I built a new desktop for the folks last year based on a 2.8 GHz Athlon II X2 (Regor). The CPU was something like $80 from Newegg. At the time the least expensive AMD quad core was between $150-200 IIRC and ran significantly hotter, 115 W vs. 65 W. It's running WinXP, FF, TB, etc, and they love it. So quiet you can't hear the fans, period, but it has great front/back airflow, none of that front, side, top, back fan idiocy. I used an Apevia gamer case: http://www.newegg.com/Product/Product.aspx?Item=N82E16811144140

As with likely many folks here, for most servers I prefer a higher count of slower cores. My servers are all about multi user throughput; few single processes ever come close to eating up all of a core. If one process does decide to hog a core, other users don't suffer as they might on a dual core server, as there are 3 or 7 more cores available for the scheduler to make use of.

-- Stan
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Original Message
From: s...@hardwarefreak.com
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Sat, 23 Oct 2010 12:13:06 -0500

> You are absolutely incorrect. Amdahl's law is specific to algorithm
> scalability. It has little to do specifically with classic
> multiprocessing.
> [Stan's reply, quoted in full earlier in this thread, trimmed here]

Someone once said a text taken out of context is pretext. The original thread concentrated on the potential advantages of adding CPUs to improve performance and the apparent law of diminishing returns. I was merely supporting that with the classic law, which most certainly may be applied to coupled multiprocessing. Disagreements should be addressed to John Hennessy, author of Computer Architecture: A Quantitative Approach (out of which I teach), in care of the office of the President, Stanford University.

Larry
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On 10/22/2010 12:53 AM, Arthur Machlas wrote:

> On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net wrote:
>> But I'm curious if anyone on the list knows the rationale for
>> distributing kernels with this set to 32. Is that just a reasonable
>> number that's never been updated? Or is there some complication that
>> arises after 32 cores, and should I be more careful about tuning
>> other parameters?
>
> I've always set the number of cores to exactly how many I have x2 when
> I roll my own, which on my puny systems is either 4 or 8. I seem to
> recall reading that there is a slight performance hit for every core
> you support.

Correct. The amount of effort needed for cross-CPU communication, cache coherency and OS process coordination increases much more than linearly as you add CPUs.

Crossbar communication (introduced first, I think, by DEC/Compaq in 2001) eliminated a lot of the latency in multi-CPU communications which plagues bus-based systems. AMD used a similar mesh in its dual-core CPUs (not surprising, since many DEC engineers went to AMD). Harder to design, but much faster. Intel's first (and 2nd?) gen multi-core machines were bus-based; easier to design, quicker to get to market, but a lot slower.

(OP's machine is certainly NUMA, where communication between cores on a chip is much faster than communication with cores on a different chip.)

> Or was it memory hit? Or was that a bong hit I'm thinking of?

-- Seek truth from facts.
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On 2010-10-22 03:15 +0200, Andrew Reid wrote:

> I recently deployed some new many-core servers at work, with 48 cores
> each (4x 12 core AMD 6174s), and ran into an issue where the stock
> Debian kernel is compiled with CONFIG_NR_CPUS=32, meaning it will only
> use the first 32 cores that it sees.

For the record, CONFIG_NR_CPUS has been increased to 512 (the maximum supported upstream) in Squeeze.

> For old Debian hands like me, this is an easy fix, I just built a new
> kernel configured for more cores, and it works just fine. But I'm
> curious if anyone on the list knows the rationale for distributing
> kernels with this set to 32. Is that just a reasonable number that's
> never been updated? Or is there some complication that arises after 32
> cores, and should I be more careful about tuning other parameters?

Basically, 32 is chosen a bit arbitrarily. But there are some problems with high values of CONFIG_NR_CPUS:

- each supported CPU adds about eight kilobytes to the kernel image, wasting memory on most machines.
- on Linux 2.6.28 (maybe all kernels prior to 2.6.29), module size blows up: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516709.

Sven
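Sven's eight-kilobytes-per-CPU figure makes the trade-off easy to quantify. A back-of-the-envelope sketch (the ~8 KB figure is his; everything else is simple arithmetic):

```python
# Rough cost of oversizing CONFIG_NR_CPUS, using Sven's ~8 KB-per-CPU figure.
PER_CPU_KB = 8  # approximate per-supported-CPU overhead in the kernel image

def wasted_kb(config_nr_cpus, actual_cores):
    """Kilobytes spent on CPU slots this machine can never populate."""
    return PER_CPU_KB * max(config_nr_cpus - actual_cores, 0)

# A 48-core server running a kernel built for 512 CPUs:
print(wasted_kb(512, 48), "KB")   # under 4 MB -- trivial on a modern server
# A single-core machine running that same stock kernel:
print(wasted_kb(512, 1), "KB")    # about 4 MB, which mattered more on small boxes
```

This is why the thread's conclusion is that overshooting CONFIG_NR_CPUS is cheap on servers, while small machines are the reason the default stayed low.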
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Ron Johnson put forth on 10/22/2010 2:00 AM:

> Correct. The amount of effort needed for cross-CPU communication,
> cache coherency and OS process coordination increases much more than
> linearly as you add CPUs.

All of these things but the scheduler, what you call process coordination, are invisible to the kernel for the most part and are irrelevant to the discussion of CONFIG_NR_CPUS.

> Crossbar communication (introduced first, I think, by DEC/Compaq in
> 2001) eliminated a lot of the latency in multi-CPU communications
> which plagues bus-based systems.

Crossbar bus controllers have been around for over 30 years, first implemented by IBM in its mainframes in the late 70s IIRC. Many RISC/UNIX systems in the 90s implemented crossbar controllers, including Data General, HP, SGI, Sun, Unisys, etc. You refer to the Alpha 21364 processor introduced in the ES47/GS80/GS1280, which did not implement a crossbar for inter-socket communication. The 21364 implemented a NUMA interconnect based on a proprietary directory protocol for multiprocessor cache coherence. These circuits in NUMA machines are typically called routers, and, functionally, they replace the crossbar of yore.

> AMD used a similar mesh in its dual-core CPUs (not surprising, since
> many DEC engineers went to AMD). Harder to design, but much faster.

You make it sound as if AMD _chose_ this design _over_ a shared bus. There never was such a choice to be made. Once you implement multiple cores on a single die, you no longer have the option of using a shared bus such as GTL, as the drive voltage is 3.3 V, over double the voltages used within the die. By definition, buses are _external_ to ICs and connect ICs to one another. Buses aren't used within a die; discrete data paths are.

> Intel's first (and 2nd?) gen multi-core machines were bus-based;
> easier to design, quicker to get to market, but a lot slower.

This is because they weren't multi-core chips, but Multi Chip Modules, or MCMs: http://en.wikipedia.org/wiki/Multi-Chip_Module Communication between ICs within an MCM is external communication, thus a bus can be used, as can NUMA, which IBM uses in its pSeries (Power5/6/7) MCMs and Cray used on the X1 and X1E.

> (OP's machine is certainly NUMA, where communication between cores on
> a chip is much faster than communication with cores on a different
> chip.)

At least you got this part correct, Ron. ;)

Back to the question of the thread: the answer, as someone else already stated, is that the only downside to setting CONFIG_NR_CPUS= to a value way above the number of physical cores in the machine is kernel footprint, and that's not very large given the memories of today's machines. Adding netfilter support will bloat the kernel footprint far more than setting CONFIG_NR_CPUS=256 when you only have 48 cores in the box.

-- Stan
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Original Message
From: ron.l.john...@cox.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 02:00:45 -0500

> Correct. The amount of effort needed for cross-CPU communication,
> cache coherency and OS process coordination increases much more than
> linearly as you add CPUs.

In fact, IIRC, the additional overhead follows the square of the number of CPUs. I seem to recall this was called Amdahl's Law, after Gene Amdahl of IBM (and later of his own company).

Larry

> [remainder of Ron's message, quoted in full earlier in this thread,
> trimmed here]
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On 10/22/2010 10:34 AM, ow...@netptc.net wrote:

> In fact IIRC the additional overhead follows the square of the number
> of CPUs.

Maybe in brute-force implementations, but otherwise the machine would bog down after just a few CPUs. Note that h/w engineers and OS designers/writers have put a lot of work into minimizing the overhead and maximizing the parallelism of extra CPUs.

-- Seek truth from facts.
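Ron's "brute-force" caveat can be made concrete. If every CPU had to maintain a direct communication path to every other CPU, the path count would grow quadratically, which is exactly why real interconnects use snoop buses, crossbars, or directory protocols instead. A small sketch:

```python
# Direct links needed if every CPU must talk pairwise to every other CPU:
# n choose 2, which grows as the square of n.
def pairwise_links(n):
    return n * (n - 1) // 2

for n in (2, 4, 8, 32, 48):
    print(f"{n:>3} CPUs -> {pairwise_links(n):>5} direct paths")
# 48 CPUs already need 1128 paths; doubling the CPUs roughly
# quadruples the count, so nobody builds coherence this way.
```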
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Original Message
From: ron.l.john...@cox.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 12:44:39 -0500

> In fact IIRC the additional overhead follows the square of the number
> of CPUs.

Ron et al, see the following: http://en.wikipedia.org/wiki/Amdahl's_law

Larry

> Maybe in brute-force implementations, but otherwise the machine would
> bog down after just a few CPUs. Note that h/w engineers and OS
> designers/writers have put a lot of work into minimizing the overhead
> and maximizing the parallelism of extra CPUs.
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On Friday 22 October 2010 11:34:19 ow...@netptc.net wrote:

> In fact IIRC the additional overhead follows the square of the number
> of CPUs. I seem to recall this was called Amdahl's Law after Gene
> Amdahl of IBM (and later his own company)

Either that's not it, or there's more than one Amdahl's law -- the one I know is about diminishing returns from increasing effort to parallelize code. I don't know it in its pithy form, but the gist of it is that you can only parallelize *some* of your code, because all algorithms have a certain amount of set-up and tear-down overhead that's typically serial.

Even if you perfectly parallelize the parallelizable part of the code, so it runs N times faster, your application as a whole will run something less than N times faster, and as N gets large, this serial offset contribution will come to dominate the execution time, at which point additional investments in parallelization are probably wasted.

-- A.
-- Andrew Reid / rei...@bellatlantic.net
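Andrew's description is the standard statement of Amdahl's law: with serial fraction s, the best possible whole-application speedup on N processors is 1 / (s + (1 - s) / N), which can never exceed 1/s. A quick sketch (the 10% serial fraction is an illustrative assumption):

```python
def amdahl_speedup(serial_fraction, n):
    """Best-case whole-application speedup on n processors, per Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

s = 0.10  # assume 10% of the runtime is inherently serial
for n in (1, 2, 4, 8, 32, 48, 1_000_000):
    print(f"{n:>8} cores -> {amdahl_speedup(s, n):5.2f}x")
# Even a million cores cannot push the speedup past 1/s = 10x:
# the serial part comes to dominate, exactly as Andrew describes.
```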
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
ow...@netptc.net put forth on 10/22/2010 5:18 PM:

> Ron et al, see the following:
> http://en.wikipedia.org/wiki/Amdahl's_law
> Larry

Amdahl's law doesn't apply to capacity systems, only capability systems. Capacity systems are limited almost exclusively by memory, IPC/coherence, and I/O bandwidth, most often the last of the three. I think many folks forget about this third rail of system performance when they go shopping for the latest/greatest high frequency multi-core processor.

Even mainstream desktops all ship today with at minimum a dual core CPU and a single 7.2K RPM disk. Those two cores combined have I/O bandwidth a few thousand times greater than the disk. Until the average system has I/O bandwidth that's much closer to parity with the CPU bandwidth, we don't really need to worry about Amdahl's law, or any other scalability laws.

-- Stan
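Stan's "few thousand times" claim is easy to sanity-check with rough 2010-era numbers. Both figures below are ballpark assumptions, not benchmarks:

```python
# Ballpark comparison of CPU memory bandwidth vs one 7200 RPM disk, ca. 2010.
cpu_mem_bw_mb_s = 2 * 10_000    # two cores sharing ~10 GB/s memory bandwidth each (assumed)
disk_seq_mb_s = 100             # sustained sequential throughput of the disk (assumed)
disk_rand_mb_s = 1              # small-block random I/O throughput (assumed)

print(f"sequential: ~{cpu_mem_bw_mb_s / disk_seq_mb_s:.0f}:1")
print(f"random:     ~{cpu_mem_bw_mb_s / disk_rand_mb_s:.0f}:1")
# Sequential I/O is "only" a couple hundred times slower than the CPUs;
# random I/O on a spinning disk opens the thousands-to-one gap Stan describes.
```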
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On Friday 22 October 2010 03:22:02 Sven Joachim wrote:

> For the record, CONFIG_NR_CPUS has been increased to 512 (the maximum
> supported upstream) in Squeeze.

This is good to know. When I built my own 2.6.26, I also noticed that the maximum value offered by the configurator was 256 -- we'll likely be seeing systems with that many cores within a few years, if current trends continue.

> Basically, 32 is chosen a bit arbitrarily. But there are some problems
> with high values of CONFIG_NR_CPUS:
> - each supported CPU adds about eight kilobytes to the kernel image,
>   wasting memory on most machines.
> - on Linux 2.6.28 (maybe all kernels prior to 2.6.29), module size
>   blows up: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516709.

-- A.
-- Andrew Reid / rei...@bellatlantic.net
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Original Message
From: rei...@bellatlantic.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 20:05:49 -0400

> Either that's not it, or there's more than one Amdahl's law -- the one
> I know is about diminishing returns from increasing effort to
> parallelize code.
> [Andrew's full explanation, quoted earlier in this thread, trimmed
> here]

Actually Amdahl's Law IS a law of diminishing returns, but it is intended to be applied to hardware, not software. The usual application is to compute the degree to which adding another processor increases the processing power of the system.

Larry
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On 10/22/2010 07:08 PM, Andrew Reid wrote:

> This is good to know. When I built my own 2.6.26, I also noticed that
> the maximum value offered by the configurator was 256 -- we'll likely
> be seeing systems with that many cores within a few years, if current
> trends continue.

Bah, humbug. Instead of a quad-core at lower GHz, I just got my wife a dual-core at higher speed.

-- Seek truth from facts.
Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net wrote:

> But I'm curious if anyone on the list knows the rationale for
> distributing kernels with this set to 32. Is that just a reasonable
> number that's never been updated? Or is there some complication that
> arises after 32 cores, and should I be more careful about tuning other
> parameters?

I've always set the number of cores to exactly how many I have x2 when I roll my own, which on my puny systems is either 4 or 8. I seem to recall reading that there is a slight performance hit for every core you support. Or was it a memory hit? Or was that a bong hit I'm thinking of? I really can't remember, but I think it was in Greg's Linux Kernel in a Nutshell book, from O'Reilly, though free to download.

This exchange seems appropriate now...

Peter Griffin: It's true. I read it in a book somewhere.
Brian Griffin: Are you sure it was a book? Are you sure it wasn't... nothing?