Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-23 Thread Stan Hoeppner
ow...@netptc.net put forth on 10/22/2010 8:15 PM:

 Actually Amdahl's Law IS a law of diminishing returns but is intended
 to be applied to hardware, not software.  The usual application is to
 compute the degree to which adding another processor increases the
 processing power of the system
 Larry

You are absolutely incorrect.  Amdahl's law is specific to algorithm
scalability.  It has little to do specifically with classic
multiprocessing.  Case in point:

If one has a fairly heavy floating point application that requires a
specific scalar operation in the loop along with every FP op, say
incrementing an integer counter register or similar, one could take
this application from a 2 GHz single core x86 processor platform and
run it on one processor of an NEC SX8 vector supercomputer system,
which has a wide 8 pipe vector unit--16 Gflop/s peak vs 4 Gflop/s peak
for the x86 chip.

Zero scalability would be achieved, even though the floating point
hardware is over 4 times more powerful.  Note no additional processors
were added.  We simply moved the algorithm to a machine with a massively
parallel vector FP unit.  In this case it's even more interesting
because the scalar unit in the SX8 runs at 1 GHz, even though the 8 pipe
vector unit runs at 2 GHz.

So, this floating point algorithm would actually run _slower_ on the
SX8, because the scalar component of the app is bound by the 1 GHz
scalar unit.  (This is typical of vector supercomputer
processors--Cray did the same thing for years, running the vector units
faster than the scalar units, because the vast bulk of the code run on
these systems was truly, massively, floating point specific, with little
scalar code.)
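
To put rough numbers on the example (mine, purely illustrative, not
benchmarks): model total runtime as scalar time plus FP time.  A
minimal sketch in Python:

def runtime(scalar_work, fp_work, scalar_speed, fp_speed):
    # Time = scalar work / scalar speed + FP work / FP speed.
    return scalar_work / scalar_speed + fp_work / fp_speed

# Assume (my numbers) the loop is half scalar ops, half FP ops.
x86 = runtime(1.0, 1.0, scalar_speed=2.0, fp_speed=4.0)   # 2 GHz, 4 Gflop/s
sx8 = runtime(1.0, 1.0, scalar_speed=1.0, fp_speed=16.0)  # 1 GHz, 16 Gflop/s
print(x86, sx8)  # 0.75 vs 1.0625 -- the SX8 loses despite 4x the FP peak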

This is the type of thing Gene Amdahl had in mind when postulating his
theory: not necessarily multiprocessing specifically, but all forms of
processing in which a portion of the algorithm can be broken up to run
in parallel, regardless of what the parallel hardware might be.  One of
the few applications that can truly be nearly infinitely parallelized is
graphics rendering.  Note I said rendering, not geometry.

When attempting to parallelize the geometry calculations in the 3D
pipeline we run squarely into Amdahl's brick wall.  This is why
nVidia/AMD have severe problems getting multi GPU (SLI/Xfire)
performance to scale anywhere close to linearly.  It's impossible to
take the 3D scene and split the geometry calculations evenly between
GPUs, because vertices overlap across the portions of the frame buffer
GPUs, because vertices overlap across the portions of the frame buffer
for which each GPU is responsible.  Thus, every vertex that overlaps a
boundary must be sent to both GPUs adjacent to that boundary.  For this
reason, adding multiple GPUs to a system yields a vastly diminishing
return on investment.  Each additional GPU creates one more frame
buffer boundary.  When you go from two screen regions to 3, you double
the amount of geometry processing the middle GPU has to perform,
because it now has two neighbor GPUs.
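
A toy model of the boundary effect (my own simplification; b is an
assumed fraction of vertices straddling each boundary, which must be
processed by both neighboring GPUs):

def geometry_speedup(gpus, b=0.05):
    # gpus - 1 boundaries; a fraction b of the vertices is duplicated
    # at each boundary, inflating the total geometry workload.
    total_work = 1.0 + (gpus - 1) * b
    return gpus / total_work

for g in (1, 2, 3, 4):
    print(g, round(geometry_speedup(g), 2))
# 1 1.0, 2 1.9, 3 2.73, 4 3.48 -- each GPU added buys less than the last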

The only scenario where 3 or 4 GPUs makes any kind of sense for ROI is
with multiple monitors, at insanely high screen resolutions and color
depths, with maximum AA/AF and multisampling.  These operations are
almost entirely raster ops, and as mentioned before, raster pixel
operations can be nearly linearly scaled on parallel hardware.

Again, Amdahl's law applies to algorithm scalability, not classic CPU
multiprocessing.

-- 
Stan





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-23 Thread Stan Hoeppner
Ron Johnson put forth on 10/22/2010 8:48 PM:

 Bah, humbug.
 
 Instead of a quad-core at lower GHz, I just got my wife a dual-core at
 higher speed.

Not to mention the fact that for desktop use 2 higher clocked cores will
yield faster application performance (think of the single threaded Flash
hog and Slashdot jscript) than 4 lower freq cores.  They also suck _far_
less power than a quad core (45-65w vs 95-115w avg AMD), and cost
significantly less.  Fewer cores equals _more_ performance for less
money (purchase price and electrical $$)?  What?  Yep. :)

I built a new desktop for the folks last year based on a 2.8 GHz Athlon
II X2 (Regor).  The CPU was something like $80 from Newegg.  At the time
the least expensive AMD quad core was between $150-200 IIRC and ran
significantly hotter, 65w vs. 115w.  It's running WinXP, FF, TB, etc,
and they love it.  So quiet you can't hear the fans, period, but it has
great front/back airflow, none of that front, side, top, back fan
idiocy--used an Apevia gamer case:
http://www.newegg.com/Product/Product.aspx?Item=N82E16811144140

As with likely many folks here, for most servers I prefer a higher count
of slower cores.  My servers are all about multi user throughput--few
single processes ever come close to eating up all of a core.  If one
process does decide to hog a core, other users don't suffer as they
might on a dual core server, as there are 3 or 7 more available cores
for the scheduler to make use of.

-- 
Stan





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-23 Thread owens



 Original Message 
From: s...@hardwarefreak.com
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Sat, 23 Oct 2010 12:13:06 -0500

[snip]


Someone once said a text taken out of context is pretext.  The
original thread concentrated on the potential advantages of adding
CPUs to improve performance and the apparent law of diminishing
returns.  I was merely supporting that with the classic law, which most
certainly may be applied to coupled multiprocessing.  Disagreements
should be addressed to
John Hennessy, author of Computer Architecture: A Quantitative Approach
(out of which I teach), in care of the office of the President,
Stanford University.
Larry








Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Ron Johnson

On 10/22/2010 12:53 AM, Arthur Machlas wrote:

On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net wrote:

  But I'm curious if anyone on the list knows the rationale for
distributing kernels with this set to 32.  Is that just a
reasonable number that's never been updated?  Or is there some
complication that arises after 32 cores, and should I be more
careful about tuning other parameters?


I've always set the number of cores to exactly how many I have x2 when
I roll my own, which on my puny systems is either 4 or 8. I seem to
recall reading that there is a slight performance hit for every core
you support.


Correct.  The amount of effort needed for cross-CPU communication, 
cache coherency and OS process coordination increases much more than 
linearly as you add CPUs.


Crossbar communication (introduced first, I think, by DEC/Compaq in 
2001) eliminated a lot of the latency in multi-CPU communications 
which plagues bus-based systems.


AMD used a similar mesh in its dual-core CPUs (not surprising, 
since many DEC engineers went to AMD).  Harder to design, but much 
faster.


Intel's first (and 2nd?) gen multi-core machines were bus-based; 
easier to design, quicker to get to market, but a lot slower.


(OP's machine is certainly NUMA, where communication between cores 
on a chip is much faster than communication with cores on a 
different chip.)








--
Seek truth from facts.





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Sven Joachim
On 2010-10-22 03:15 +0200, Andrew Reid wrote:

   I recently deployed some new many-core servers at work, with
 48 cores each (4x 12 core AMD 6174s), and ran into an issue where
 the stock Debian kernel is compiled with CONFIG_NR_CPUS=32,
 meaning it will only use the first 32 cores that it sees.

For the record, CONFIG_NR_CPUS has been increased to 512 (the maximum
supported upstream) in Squeeze.

   For old Debian hands like me, this is an easy fix, I just built 
 a new kernel configured for more cores, and it works just fine.

   But I'm curious if anyone on the list knows the rationale for
 distributing kernels with this set to 32.  Is that just a 
 reasonable number that's never been updated?  Or is there some
 complication that arises after 32 cores, and should I be more
 careful about tuning other parameters?

Basically, 32 is chosen a bit arbitrarily.  But there are some problems
with high values of CONFIG_NR_CPUS:

- each supported CPU adds about eight kilobytes to the kernel image,
  wasting memory on most machines.

- On Linux 2.6.28 (maybe all kernels prior to 2.6.29), module size blows
  up: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516709.
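
For scale, here is a quick back-of-the-envelope on the first point,
taking the ~8 KB per supported CPU figure at face value (a sketch, not
a measurement):

PER_CPU_KB = 8  # approximate static cost per supported CPU (see above)
for nr_cpus in (32, 256, 512):
    print(f"CONFIG_NR_CPUS={nr_cpus}: ~{nr_cpus * PER_CPU_KB} KB")
# 32 -> ~256 KB, 256 -> ~2048 KB, 512 -> ~4096 KB of kernel image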

Sven





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Stan Hoeppner
Ron Johnson put forth on 10/22/2010 2:00 AM:
 On 10/22/2010 12:53 AM, Arthur Machlas wrote:
 On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net
 wrote:
   But I'm curious if anyone on the list knows the rationale for
 distributing kernels with this set to 32.  Is that just a
 reasonable number that's never been updated?  Or is there some
 complication that arises after 32 cores, and should I be more
 careful about tuning other parameters?

 I've always set the number of cores to exactly how many I have x2 when
 I roll my own, which on my puny systems is either 4 or 8. I seem to
 recall reading that there is a slight performance hit for every core
 you support.
 
 Correct.  The amount of effort needed for cross-CPU communication, cache
 coherency and OS process coordination increases much more than linearly
 as you add CPUs.

All of these things but the scheduler (what you call process
coordination) are invisible to the kernel for the most part, and are
irrelevant to the discussion of CONFIG_NR_CPUS.

 Crossbar communication (introduced first, I think, by DEC/Compaq in
 2001) eliminated a lot of the latency in multi-CPU communications which
 plagues bus-based systems.

Crossbar bus controllers have been around for over 30 years, first
implemented by IBM in its mainframes in the late 70s IIRC.  Many
RISC/UNIX systems in the 90s implemented crossbar controllers, including
Data General, HP, SGI, SUN, Unisys, etc.

You refer to the Alpha 21364 processor introduced in the
ES47/GS80/GS1280, which did not implement a crossbar for inter-socket
communication.  The 21364 implemented a NUMA interconnect based on a
proprietary directory protocol for multiprocessor cache coherence.
These circuits in NUMA machines are typically called routers, and,
functionally, replace the crossbar of yore.

 AMD used a similar mesh in it's dual-core CPUs (not surprising, since
 many DEC engineer went to AMD).  Harder to design, but much faster.

You make it sound as if AMD _chose_ this design _over_ a shared bus.
There never was such a choice to be made.  Once you implement multiple
cores on a single die you no longer have the option of using a shared
bus such as GTL as the drive voltage is 3.3v, over double the voltages
used within the die.  By definition buses are _external_ to ICs, and
connect ICs to one another.  Buses aren't used within a die.  Discrete
data paths are.

 Intel's first (and 2nd?) gen multi-core machines were bus-based; easier
 to design, quicker to get to market, but a lot slower.

This is because they weren't multi-core chips, but Multi Chip Modules,
or MCMs:  http://en.wikipedia.org/wiki/Multi-Chip_Module  Communication
between ICs within an MCM is external communication, thus a bus can be
used, as well as NUMA which IBM uses in its pSeries (Power5/6/7) MCMs
and Cray used on the X1 and X1E.

 (OP's machine is certainly NUMA, where communication between cores on a
 chip is much faster than communication with cores on a different chip.)

At least you got this part correct Ron. ;)

Back to the question of the thread, the answer, as someone else already
stated, is that the only downside to setting CONFIG_NR_CPUS= to a value
way above the number of physical cores in the machine is kernel
footprint, but it's not very large given the memories of today's
machines.  Adding netfilter support will bloat the kernel footprint far
more than setting CONFIG_NR_CPUS=256 when you only have 48 cores in the box.

-- 
Stan





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread owens



 Original Message 
From: ron.l.john...@cox.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 02:00:45 -0500

On 10/22/2010 12:53 AM, Arthur Machlas wrote:
 On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net wrote:
   But I'm curious if anyone on the list knows the rationale for
 distributing kernels with this set to 32.  Is that just a
 reasonable number that's never been updated?  Or is there some
 complication that arises after 32 cores, and should I be more
 careful about tuning other parameters?

 I've always set the number of cores to exactly how many I have x2 when
 I roll my own, which on my puny systems is either 4 or 8. I seem to
 recall reading that there is a slight performance hit for every core
 you support.

Correct.  The amount of effort needed for cross-CPU communication,
cache coherency and OS process coordination increases much more than
linearly as you add CPUs.

In fact IIRC the additional overhead follows the square of the number
of CPUs.  I seem to recall this was called Amdahl's Law, after Gene
Amdahl of IBM (and later of his own company).
Larry


[snip]








Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Ron Johnson

On 10/22/2010 10:34 AM, ow...@netptc.net wrote:


 Original Message 
From: ron.l.john...@cox.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 02:00:45 -0500



Correct.  The amount of effort needed for cross-CPU communication,
cache coherency and OS process coordination increases much more than
linearly as you add CPUs.

In fact IIRC the additional overhead follows the square of the number
of CPUs.


Maybe in brute-force implementations, but otherwise the machine
would bog down after just a few CPUs.

Note that h/w engineers and OS designers/writers have put a lot of
work into minimizing the overhead and maximizing the parallelism of
extra CPUs.


--
Seek truth from facts.





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread owens



 Original Message 
From: ron.l.john...@cox.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 12:44:39 -0500

On 10/22/2010 10:34 AM, ow...@netptc.net wrote:

  Original Message 
 From: ron.l.john...@cox.net
 To: debian-user@lists.debian.org
 Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
 Date: Fri, 22 Oct 2010 02:00:45 -0500


 Correct.  The amount of effort needed for cross-CPU communication,
 cache coherency and OS process coordination increases much more than
 linearly as you add CPUs.

 In fact IIRC the additional overhead follows the square of the number
 of CPUs.

Ron et al
See the following:
http://en.wikipedia.org/wiki/Amdahl's_law
Larry
Maybe in brute-force implementations, but otherwise the machine 
would bog down after just a few CPUs.

Note that h/w engineers and OS designers/writers have put a lot of 
work into minimizing the overhead and maximize the parallelism of 
extra CPUs.

-- 
Seek truth from facts.










Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Andrew Reid
On Friday 22 October 2010 11:34:19 ow...@netptc.net wrote:

 In fact IIRC the additional overhead follows the square of the number
 of CPUs.  I seem to recall this was called Amdahl's Law after Gene
 Amdahl of IBM (and later his own company)

  Either that's not it, or there's more than one Amdahl's law --
the one I know is about diminishing returns from increasing effort
to parallelize code.  I don't know it in its pithy form, but
the gist of it is that you can only parallelize *some* of your
code, because all algorithms have a certain amount of set-up
and tear-down overhead that's typically serial.  Even if you
perfectly parallelize the parallelizable part of the code, 
so it runs N times faster, your application as a whole will
run something less than N times faster, and as N gets large,
this serial offset contribution will come to dominate the 
execution time, at which point additional investments in 
parallelization are probably wasted.
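
In code, for the record (a minimal sketch; p is the parallelizable
fraction, n the parallel speedup):

def amdahl_speedup(p, n):
    # Overall speedup when a fraction p of the work runs n times faster.
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 32, 1024):
    print(n, round(amdahl_speedup(0.90, n), 2))
# 1.82, 3.08, 4.71, 7.8, 9.91 -- capped at 1/(1-p) = 10x for p = 0.90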

-- A.
-- 
Andrew Reid / rei...@bellatlantic.net





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Stan Hoeppner
ow...@netptc.net put forth on 10/22/2010 5:18 PM:

 Ron et al
 See the following:
 http://en.wikipedia.org/wiki/Amdahl's_law
 Larry

Amdahl's law doesn't apply to capacity systems, only capability systems.
Capacity systems are limited almost exclusively by memory,
IPC/coherence, and I/O bandwidth, most often the last of the three.

I think many folks forget about this 3rd rail of system performance
when they go shopping for the latest/greatest high frequency multi-core
processor.  Even mainstream desktops all ship with at minimum a dual
core CPU today, and a single 7.2K RPM disk.  Those two cores combined
have I/O bandwidth a few thousand times greater than the disk.
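
Rough numbers (all assumed, for illustration only, not measurements):

cpu_mem_bw = 2 * 10_000  # two cores, ~10 GB/s each, in MB/s (assumed)
disk_seq   = 100         # 7.2K RPM disk, sequential MB/s (assumed)
disk_rand  = 5           # same disk under scattered small I/O (assumed)
print(cpu_mem_bw // disk_seq, cpu_mem_bw // disk_rand)  # ~200x, ~4000x

Sequential transfers already leave a gap of a couple hundred times;
under random I/O the gap easily reaches the thousands.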

Until the average system has I/O bandwidth that's much closer to parity
with the CPU bandwidth, we don't really need to worry about Amdahl's
law, or any other scalability laws.

-- 
Stan





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Andrew Reid
On Friday 22 October 2010 03:22:02 Sven Joachim wrote:
 On 2010-10-22 03:15 +0200, Andrew Reid wrote:
I recently deployed some new many-core servers at work, with
  48 cores each (4x 12 core AMD 6174s), and ran into an issue where
  the stock Debian kernel is compiled with CONFIG_NR_CPUS=32,
  meaning it will only use the first 32 cores that it sees.

 For the record, CONFIG_NR_CPUS has been increased to 512 (the maximum
 supported upstream) in Squeeze.

  This is good to know.  When I built my own 2.6.26, I also
noticed that the maximum value offered by the configurator
was 256 -- we'll likely be seeing systems with that many cores
within a few years, if current trends continue.

 Basically, 32 is chosen a bit arbitrarily.  But there are some problems
 with high values of CONFIG_NR_CPUS:

 - each supported CPU adds about eight kilobytes to the kernel image,
   wasting memory on most machines.

 - On Linux 2.6.28 (maybe all kernels prior to 2.6.29), module size blows
   up: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516709.

-- A.
-- 
Andrew Reid / rei...@bellatlantic.net





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread owens



 Original Message 
From: rei...@bellatlantic.net
To: debian-user@lists.debian.org
Subject: Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?
Date: Fri, 22 Oct 2010 20:05:49 -0400

On Friday 22 October 2010 11:34:19 ow...@netptc.net wrote:

 In fact IIRC the additional overhead follows the square of the number
 of CPUs.  I seem to recall this was called Amdahl's Law after Gene
 Amdahl of IBM (and later his own company)

  Either that's not it, or there's more than one Amdahl's law --
the one I know is about diminishing returns from increasing effort
to parallelize code.  I don't know it in its pithy form, but
the gist of it is that you can only parallelize *some* of your
code, because all algorithms have a certain amount of set-up
and tear-down overhead that's typically serial.  Even if you
perfectly parallelize the parallelizable part of the code, 
so it runs N times faster, your application as a whole will
run something less than N times faster, and as N gets large,
this serial offset contribution will come to dominate the 
execution time, at which point additional investments in 
parallelization are probably wasted.

Actually Amdahl's Law IS a law of diminishing returns, but it is
intended to be applied to hardware, not software.  The usual
application is to compute the degree to which adding another processor
increases the processing power of the system.
Larry
  -- A.
-- 
Andrew Reid / rei...@bellatlantic.net










Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-22 Thread Ron Johnson

On 10/22/2010 07:08 PM, Andrew Reid wrote:

On Friday 22 October 2010 03:22:02 Sven Joachim wrote:

On 2010-10-22 03:15 +0200, Andrew Reid wrote:

   I recently deployed some new many-core servers at work, with
48 cores each (4x 12 core AMD 6174s), and ran into an issue where
the stock Debian kernel is compiled with CONFIG_NR_CPUS=32,
meaning it will only use the first 32 cores that it sees.


For the record, CONFIG_NR_CPUS has been increased to 512 (the maximum
supported upstream) in Squeeze.


   This is good to know.  When I built my own 2.6.26, I also
noticed that the maximum value offered by the configurator
was 256 -- we'll likely be seeing systems with that many cores
within a few years, if current trends continue.



Bah, humbug.

Instead of a quad-core at lower GHz, I just got my wife a dual-core 
at higher speed.


--
Seek truth from facts.





Re: Debian stock kernel config -- CONFIG_NR_CPUS=32?

2010-10-21 Thread Arthur Machlas
On Thu, Oct 21, 2010 at 8:15 PM, Andrew Reid rei...@bellatlantic.net wrote:
  But I'm curious if anyone on the list knows the rationale for
 distributing kernels with this set to 32.  Is that just a
 reasonable number that's never been updated?  Or is there some
 complication that arises after 32 cores, and should I be more
 careful about tuning other parameters?

I've always set the number of cores to exactly how many I have x2 when
I roll my own, which on my puny systems is either 4 or 8. I seem to
recall reading that there is a slight performance hit for every core
you support. Or was it memory hit? Or was that a bong hit I'm thinking
of?

I really can't remember, but I think it was in Greg Kroah-Hartman's
Linux Kernel in a Nutshell book, published by O'Reilly though also free
to download.

This exchange seems appropriate now...
Peter Griffin: It's true. I read it in a book somewhere
Brian Griffin: Are you sure it was a book? Are you sure it wasn't... nothing?

