On Sun, 2005-12-25 at 09:34 +1100, O Plameras wrote:
> James Gregory wrote:
> 
> >On Wed, 2005-12-21 at 16:40 +1100, O Plameras wrote:
> >  
> >
> >>#define spin_lock_init(lock)    do { (void)(lock); } while(0)
> >>#define _raw_spin_lock(lock)    do { (void)(lock); } while(0)
> >> 
> >>I am reading this cited macros. This is very very clear.
> >>
> >>It is very very clear what it says, and, i.e., regardless of what
> >>is the value of 'lock' the kernel should do nothing.
> >>    
> >>
> >
> >That's exactly right. And if you compile without CONFIG_SMP, that's what
> >gets built into your kernel. You can get away with it because of the
> >clever way in which a CPU does one thing at a time; there is no "true"
> >parallelism.
> >
> 
> By the way, is it not true that 'pipelining', a feature of x86 CPUs
> starting with the i586, which I have pointed out in one of my
> previous posts, is (another name for) an implementation of 'parallel'
> processing?

I suppose I wasn't entirely clear here. The precise kind of parallelism
I was referring to was concurrent threads of execution: the kind that
requires locking (as you may have inferred from the context).

Just about every CPU I can think of implements some form of
'pipelining', and its good friends, superscalar and out-of-order
execution. There is an important distinction between these
optimisations and using explicit threads: the former are invisible to
the programmer.

My statement wasn't entirely true: CPUs do many things simultaneously,
and, importantly, they can perform a very limited number of
instructions in parallel. But, in addition to evaluating these
instructions concurrently, they also ensure that at the other end it
all looks like it happened serially, just very, very fast.

It's kinda interesting actually: the chip looks ahead some number of
instructions and identifies groups of instructions that don't depend
on each other's outcomes. It then tries to fit those instructions to
the various bits of silicon in the chip (for example, the integer add
unit and the floating-point divide unit will almost certainly be
different bits of silicon). Those operations are performed
concurrently, and since they are all independent (no one of the group
depends on another's outcome), it makes no difference to the result.
That's where the all-important "instruction re-ordering" comes in: the
compiler tries to order instructions so that the CPU can identify such
groups.

Note however that isomorphic re-orderings of code have no effect on
the outcome of executing it, so they are somewhat irrelevant to a
discussion of locking.

> This means that more than one instruction may be executed in one
> clock cycle. This is implemented by using a bus interface unit (BIU)
> and an execution unit. Experts on Intel Arch may confirm the
> truthfulness or falsehood of this assertion. (I'm not an expert, I
> just know by researching).

There's a heap of 'units', dude. Modern CPUs are insane.

> With pipelining, the CPU overlaps instruction fetching and decoding
> with instruction execution, i.e., while one instruction is executing,
> the BIU is fetching and decoding the next instruction. So, assuming
> you're willing to add hardware, you can execute more and more
> operations in parallel.

Yes and no. I suppose you could construct a chip whose explicit purpose
is "to pipeline", but that's not really why hardware engineers started
doing it. It's more about making the most of the silicon you've got. You
don't want bits of silicon (that cost money) to be sitting there idle.
Pipelining just lets you make sure that more of your silicon is busy
more of the time.
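To picture that: in the textbook five-stage pipeline (a generic
sketch, not any particular x86 design), each instruction occupies a
different stage each cycle, so the fetch, decode and execute hardware
are all busy at once:

```
cycle:    1    2    3    4    5    6    7
insn A:   IF   ID   EX   MEM  WB
insn B:        IF   ID   EX   MEM  WB
insn C:             IF   ID   EX   MEM  WB
```

Once the pipeline is full, one instruction completes every cycle, even
though each individual instruction still takes five cycles from fetch
to write-back.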

> So, in this way there is true parallelism in x86 arch.

I suppose it's a matter of interpretation. It was extremely intentional
that I chose to refer to the type of parallelism in UP-x86 as bereft of
"true parallelism". You can of course choose to interpret those words
however you will. Having made explicit exactly what level of
parallelism UP-x86 provides, the only really important thing to realise
is that it is not a sufficient level of parallelism to have any effect
on locking. UP kernels have no locking, yet they don't crash. They
would if they had what I have referred to as "true parallelism". I'll
happily adopt another term for it if you can suggest a better one.

Merrily,

James.


-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
