Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly

2018-05-24 Thread Segher Boessenkool
On Thu, May 24, 2018 at 10:18:44AM +, Christophe Leroy wrote:
> On 05/24/2018 06:20 AM, Christophe LEROY wrote:
> >Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> >>On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>>The generic csum_ipv6_magic() generates a pretty bad result
> >>
> >>
> >>
> >>Please try with a more recent compiler, what you used is pretty ancient.
> >>It's not like recent compilers do great on this either, but it's not
> >>*that* bad anymore ;-)
> 
> Here is what I get with GCC 8.1
> It doesn't look much better, does it ?

There are no more mfocrf, which is a big speedup.  Other than that it is
pretty lousy still, I totally agree.  This improvement happened quite a
while ago, it's fixed in GCC 6.


Segher


Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly

2018-05-24 Thread Segher Boessenkool
On Thu, May 24, 2018 at 08:20:16AM +0200, Christophe LEROY wrote:
> Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> >On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>+_GLOBAL(csum_ipv6_magic)
> >>+   lwz r8, 0(r3)
> >>+   lwz r9, 4(r3)
> >>+   lwz r10, 8(r3)
> >>+   lwz r11, 12(r3)
> >>+   addc    r0, r5, r6
> >>+   adde    r0, r0, r7
> >>+   adde    r0, r0, r8
> >>+   adde    r0, r0, r9
> >>+   adde    r0, r0, r10
> >>+   adde    r0, r0, r11
> >>+   lwz r8, 0(r4)
> >>+   lwz r9, 4(r4)
> >>+   lwz r10, 8(r4)
> >>+   lwz r11, 12(r4)
> >>+   adde    r0, r0, r8
> >>+   adde    r0, r0, r9
> >>+   adde    r0, r0, r10
> >>+   adde    r0, r0, r11
> >>+   addze   r0, r0
> >>+   rotlwi  r3, r0, 16
> >>+   add r3, r0, r3
> >>+   not r3, r3
> >>+   rlwinm  r3, r3, 16, 16, 31
> >>+   blr
> >>+EXPORT_SYMBOL(csum_ipv6_magic)
> >
> >Clustering the loads and carry insns together is pretty much the worst you
> >can do on most 32-bit CPUs.
> 
> Oh, really ? __csum_partial is written that way too.

I thought I told you about this before?  Maybe not.

> Right, now I tried interleaving the lwz and adde. I get no improvement at
> all on a 885, but I get a 15% improvement on a 8321.

It won't likely help on single-issue cores (like the one 885 has), yes.


Segher


Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly

2018-05-23 Thread Segher Boessenkool
On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> The generic csum_ipv6_magic() generates a pretty bad result



Please try with a more recent compiler, what you used is pretty ancient.
It's not like recent compilers do great on this either, but it's not
*that* bad anymore ;-)

> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -293,3 +293,36 @@ dst_error:
>   EX_TABLE(51b, dst_error);
>  
>  EXPORT_SYMBOL(csum_partial_copy_generic)
> +
> +/*
> + * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> + * const struct in6_addr *daddr,
> + * __u32 len, __u8 proto, __wsum sum)
> + */
> +
> +_GLOBAL(csum_ipv6_magic)
> + lwz r8, 0(r3)
> + lwz r9, 4(r3)
> + lwz r10, 8(r3)
> + lwz r11, 12(r3)
> + addc    r0, r5, r6
> + adde    r0, r0, r7
> + adde    r0, r0, r8
> + adde    r0, r0, r9
> + adde    r0, r0, r10
> + adde    r0, r0, r11
> + lwz r8, 0(r4)
> + lwz r9, 4(r4)
> + lwz r10, 8(r4)
> + lwz r11, 12(r4)
> + adde    r0, r0, r8
> + adde    r0, r0, r9
> + adde    r0, r0, r10
> + adde    r0, r0, r11
> + addze   r0, r0
> + rotlwi  r3, r0, 16
> + add r3, r0, r3
> + not r3, r3
> + rlwinm  r3, r3, 16, 16, 31
> + blr
> +EXPORT_SYMBOL(csum_ipv6_magic)

Clustering the loads and carry insns together is pretty much the worst you
can do on most 32-bit CPUs.


Segher


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Segher Boessenkool

And no, RMW on MMIO isn't problematic at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


Maybe for some CPUs, but not all.  ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.


Sure, your CPU doesn't have RMW instructions -- how to emulate
those if you don't have them is a totally different thing.


Let me say it more clearly: On ARM, it is impossible to perform atomic
operations on MMIO space.


It's all completely beside the point, see the other subthread, but...

Yeah, you can't do LL/SC to MMIO space; ARM isn't alone in that.
You could still implement atomic operations on MMIO space by taking
a lock elsewhere, in normal cacheable memory space.  Why you would
do this is a separate question, you probably don't want it :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Segher Boessenkool

Let me say it more clearly: On ARM, it is impossible to perform atomic
operations on MMIO space.


Actually, no one is suggesting that we try to do that at all.

The discussion about RMW ops on MMIO space started with a comment
attributed to the gcc developers that one reason why gcc on x86
doesn't use instructions that do RMW ops on volatile variables is that
volatile is used to mark MMIO addresses, and there was some
uncertainty about whether (non-atomic) RMW ops on x86 could be used on
MMIO.  This is in regard to the question about why gcc on x86 always
moves a volatile variable into a register before doing anything to it.


This question is GCC PR33102, which was incorrectly closed as a duplicate
of PR3506 -- and *that* PR was closed because its reporter seemed to
claim the GCC generated code for an increment on a volatile (namely, three
machine instructions: load, modify, store) was incorrect, and it has to
be one machine instruction.


So the whole discussion is irrelevant to ARM, PowerPC and any other
architecture except x86[-64].


And even there, it's not something the kernel can take advantage of
before GCC 4.4 is in widespread use, if then.  Let's move on.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Segher Boessenkool
At some point in the future, barrier() will be universally regarded as 
a hammer too big for most purposes.  Whether or not removing it now


You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained compiler barrier,
I presume?

constitutes premature optimization is arguable, but I think we should 
allow such optimization to happen (or not happen) in 
architecture-dependent code, and provide a consistent API that doesn't 
require the use of such things in arch-independent code where it might 
turn into a totally superfluous performance killer depending on what 
hardware it gets compiled for.


Explicit barrier()s won't be too hard to replace -- but what to do
about the implicit barrier()s in rmb() etc. etc. -- *those* will be
hard to get rid of, if only because it is hard enough to teach driver
authors about how to use those primitives *already*.  It is far from
clear what a good interface like that would look like, anyway.

Probably we should first start experimenting with a forget()-style
micro-barrier (but please, find a better name), and see if a nice
usage pattern shows up that can be turned into an API.
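
For concreteness, a minimal sketch of what such a forget()-style
micro-barrier could look like (this is just the forget() macro discussed
elsewhere in this thread, plus an invented usage example):

#define forget(a)  __asm__ __volatile__ ("" : "=m" (a) : "m" (a))

/* Illustrative use: force a fresh load of 'flag' on every iteration,
 * without clobbering all of memory the way barrier() does. */
static int flag;

void wait_for_flag(void)
{
        while (!flag)
                forget(flag);
}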


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool
Right. ROTFL... volatile actually breaks atomic_t instead of making 
it safe. x++ becomes a register load, increment and a register store. 
Without volatile we can increment the memory directly. It seems that 
volatile requires that the variable is loaded into a register first 
and then operated upon. Understandable when you think about volatile 
being used to access memory mapped I/O registers where a RMW 
operation could be problematic.


So, if we want consistent behavior, we're pretty much screwed unless 
we use inline assembler everywhere?


Nah, this whole argument is flawed -- without volatile we still
*cannot* increment the memory directly.  On x86, you need a lock
prefix; on other archs, some other mechanism to make the memory
increment an *atomic* memory increment.

And no, RMW on MMIO isn't problematic at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


The advantages of asm code for atomic_{read,set} are:
1) all the other atomic ops are implemented that way already;
2) you have full control over the asm insns selected, in particular,
   you can guarantee you *do* get an atomic op;
3) you don't need to use volatile data which generates
   not-all-that-good code on archs like x86, and we want to get rid
   of it anyway since it is problematic in many ways;
4) you don't need to use *(volatile type*)data, which a) doesn't
   exist in C; b) isn't documented or supported in GCC; c) has a recent
   history of bugginess; d) _still uses volatile objects_; e) _still_
   is problematic in almost all those same ways as in 3);
5) you can mix atomic and non-atomic accesses to the atomic_t, which
   you cannot with the other alternatives.

The only disadvantage I know of is potentially slightly worse
instruction scheduling.  This is a generic asm() problem: GCC
cannot see what actual insns are inside the asm() block.
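
Purely as an illustration (not the kernel's actual implementation), an
x86 flavour of such asm-based accessors could look like the sketch
below; the asm hides the load/store from the optimiser, which is both
the point and the source of the scheduling cost mentioned above:

typedef struct { int counter; } atomic_t;

static inline int my_atomic_read(const atomic_t *v)
{
        int ret;

        /* The "m" operand forces a real load; GCC cannot look inside
         * the asm, so it cannot merge or drop it. */
        __asm__ __volatile__("movl %1, %0" : "=r" (ret) : "m" (v->counter));
        return ret;
}

static inline void my_atomic_set(atomic_t *v, int i)
{
        __asm__ __volatile__("movl %1, %0" : "=m" (v->counter) : "ir" (i));
}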


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool
Such code generally doesn't care precisely when it gets the update, 
just that the update is atomic, and it doesn't loop forever.


Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.


Regardless, I'm convinced we just need to do it all in assembly.


So do you want volatile asm or plain asm, for atomic_read()?
The asm version has two ways to go about it too...


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool

And no, RMW on MMIO isn't problematic at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


Maybe for some CPUs, but not all.  ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.


Sure, your CPU doesn't have RMW instructions -- how to emulate
those if you don't have them is a totally different thing.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew 
where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.



So why is this a good tradeoff?

It certainly is better than disabling all compiler optimisations!


It's easy to be better than something really stupid :)


Sure, it wasn't me who made the comparison though.


So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text.


Which has to be investigated.  A few kB is a lot more than expected.


What you need to justify is why it is a good
tradeoff to make them volatile (which btw, is much harder to go
the other way after we let people make those assumptions).


My point is that people *already* made those assumptions.  There
are two ways to clean up this mess:

1) Have the volatile semantics by default, change the users
   that don't need it;
2) Have non-volatile semantics by default, change the users
   that do need it.

Option 2) randomly breaks stuff all over the place, option 1)
doesn't.  Yeah 1) could cause some extremely minor speed or
code size regression, but only temporarily until everything has
been audited.


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?

I look at it the other way: keeping the volatile semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;


Yeah, but we could add lots of things to help prevent bugs and
would never be included. I would also contend that it helps _hide_
bugs and encourages people to be lazy when thinking about these
things.


Sure.  We aren't _adding_ anything here though, not on the platforms
where it is most likely to show up, anyway.


Also, you dismiss the fact that we'd actually be *adding* volatile
semantics back to the 2 most widely tested architectures (in terms
of test time, number of testers, variety of configurations, and
coverage of driver code).


I'm not dismissing that.  x86 however is one of the few architectures
where mistakenly leaving out a volatile will not easily show up on
user testing, since the compiler will very often produce a memory
reference anyway because it has no registers to play with.


This is a very important difference from
just keeping volatile semantics because it is basically a one-way
API change.


That's a good point.  Maybe we should create _two_ new APIs, one
explicitly going each way.


certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


I don't know that most people would expect that behaviour.


I didn't conduct any formal poll either :-)


Is there any documentation anywhere that would suggest this?


Not really I think, no.  But not the other way around, either.
Most uses of it seem to expect it though.


[some confusion about barriers wrt atomics snipped]


What were you confused about?


Me?  Not much.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of "with regard to *what* does
it limit re-ordering?".

A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy compiler. So at a
minimum, it limits re-ordering wrt other volatiles (assuming sequence
points exist). It also means that the compiler cannot move it
speculatively across conditionals, but other than that it's starting to
get fuzzy.


This is actually really well-defined in C, not fuzzy at all.
Volatile accesses are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.
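
A small example of that rule, with made-up register names:

volatile int reg_ctrl;          /* say, a device register */
volatile int reg_data;

void poke(void)
{
        reg_ctrl = 1;           /* volatile access: a side effect */
        reg_data = 2;           /* may not be emitted before the store above */
}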

We certainly should avoid volatile whenever possible, but because
it's fuzzy wrt reordering is not a reason -- all alternatives have
exactly the same issues.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

(and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)


...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

In a reasonable world, gcc should just make that be (on x86)

addl $1,i(%rip)

on x86-64, which is indeed what it does without the volatile. But with the
volatile, the compiler gets really nervous, and doesn't dare do it in one
instruction, and thus generates crap like

movl    i(%rip), %eax
addl    $1, %eax
movl    %eax, i(%rip)

instead. For no good reason, except that volatile just doesn't have any
good/clear semantics for the compiler, so most compilers will just make it
be "I will not touch this access in any way, shape, or form". Including
even trivially correct instruction optimization/combination.


It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?
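
A minimal testcase for such a report could be as small as this (what a
given GCC version emits for the volatile variant of course differs):

int i;
volatile int vi;

void inc_plain(void)    { i++; }        /* can be a single add/inc on memory */
void inc_volatile(void) { vi++; }       /* typically load, add, store */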


but is
(again) something that gcc doesn't dare do, since i is volatile.


Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like people
who hit the go-slow-in-unspecified-ways button should get what
they deserve ;-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)


Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ("" : "=m" (a) : "m" (a))

[ This is exactly equivalent to using +m in the constraints, as recently
  explained on a GCC list somewhere, in response to the patch in my bitops
  series a few weeks back where I thought +m was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.
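
Spelled out, the two forms under discussion are (as a sketch; which GCC
versions accept the first one is exactly the point of contention):

#define forget_plus(a)    __asm__ __volatile__ ("" : "+m" (a))
#define forget_classic(a) __asm__ __volatile__ ("" : "=m" (a) : "m" (a))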


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Here, I should obviously admit that the semantics of *(volatile int *)
aren't any neater or well-defined in the _language standard_ at all. The
standard does say (verbatim) "precisely what constitutes an access to an
object of volatile-qualified type is implementation-defined", but GCC
does help us out here by doing the right thing.


Where do you get that idea?


Try a testcase (experimentally verify).


That doesn't prove anything.  Experiments can only disprove
things.


GCC manual, section 6.1, "When is a Volatile Object Accessed?"
doesn't say anything of the kind.


True, "implementation-defined" as per the C standard _is_ supposed to mean
"unspecified behaviour where each implementation documents how the choice
is made". So ok, probably GCC isn't documenting this
implementation-defined behaviour which it is supposed to, but can't really
fault them much for this, probably.


GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ("" : "=m" (a) : "m" (a))

[ This is exactly equivalent to using +m in the constraints, as recently
  explained on a GCC list somewhere, in response to the patch in my bitops
  series a few weeks back where I thought +m was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].


http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html


Ah yes, that old thread, thank you.


That's when _I_ came to know how GCC interprets +m, but probably
this has been explained on those lists multiple times. Who cares,
anyway?


I just couldn't find the thread you meant, I thought I might have
missed it, that's all :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

atomic_dec() writes
to memory, so it _does_ have volatile semantics, implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline asm volatile
or one that uses a forget macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing,


I don't think so.


asm volatile creates a side effect.  Side effects aren't
allowed to be reordered wrt sequence points.  This is exactly
the same reason as why volatile accesses cannot be reordered.


if that is
implemented by GCC in the obvious way.  Even a plain asm()
will do the same.


Read the relevant GCC documentation.


I did, yes.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the +m documentation, if
that is what you are talking about.  It could be extended now, to
allow +m -- but that takes more than just fixing the documentation.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
No it does not have any volatile semantics. atomic_dec() can be reordered
at will by the compiler within the current basic unit if you do not add a
barrier.


volatile has nothing to do with reordering.


If you're talking of volatile the type-qualifier keyword, then
http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
otherwise.


I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that volatile creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.


atomic_dec() writes
to memory, so it _does_ have volatile semantics, implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline asm volatile
or one that uses a forget macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing, if that is
implemented by GCC in the obvious way.  Even a plain asm()
will do the same.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing,


I don't think so.


asm volatile creates a side effect.


Yeah.


Side effects aren't
allowed to be reordered wrt sequence points.


Yeah.


This is exactly
the same reason as why volatile accesses cannot be reordered.


No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that volatile
access cast macro, and _therefore_ we were safe from re-ordering
(behaviour guaranteed by the C standard).


And exactly the same is true for the asm version.

Now, one cannot fantasize that volatile asms are also sequence 
points.


Sure you can do that.  I don't though.


In fact such an argument would be sadly mistaken, because sequence
points are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about volatile asms.


That's nonsense.  GCC can extend the C standard any way they
bloody well please -- witness the fact that they added an
extra class of side effects...


Read the relevant GCC documentation.


I did, yes.


No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the example
given there:

asm volatile ("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;

The docs explicitly say the addition can be moved before the volatile
asm. Hopefully, as you know, (x + y) is a C expression and hence
a sequence point as defined by the standard.


The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the
assignment; the C compiler is allowed to compute it _after_
the assignment even, if it could figure out how ;-)

x+y does not contain a side effect, you know.


I know there is also stuff written about side-effects there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts,


s/could/does/


but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...


There's nothing wrong there.


See, volatile C keyword, for all its ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


Semantics of volatile as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


Yes, and?  GCC promises the behaviour it has documented.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the +m documentation, if
that is what you are talking about.  It could be extended now, to
allow +m -- but that takes more than just fixing the 
documentation.


No, there was (and is) _everything_ wrong about the + documentation as
applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it possible
to use +m (like the current kernel code does). The docs suggest
otherwise, so there's obviously a clear disconnect between the docs and
actual GCC behaviour.


The documentation simply doesn't say +m is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for +m might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).

[ You seem to often take issue with _amazingly_ petty and pedantic things,
  by the way :-) ]


If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting you're not.

And I simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The documentation simply doesn't say +m is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for +m might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).


Huh?

If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong.

I wonder how could you even try to disagree with that.


Easy.

The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It corresponds
to GCC version 4.3.0.  The internals of the GNU compilers, including
how to port them to new targets and some information about how to write
front ends for new languages, are documented in a separate manual.

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use +m, and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean +m
doesn't actually do something.  It also doesn't mean it does what
you think it should do.  It might do just that of course.  But treating
writing C code as an empirical science isn't such a smart idea.


And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of "GCC docs are often wrong",


No need to guess at what you said, even if you managed to delete
your own mail already, there are plenty of free web-based archives
around.  You said:


See, volatile C keyword, for all its ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


and that to me reads as complaining that the ISO C standard isn't
very good and that the GCC documentation is 10**5482 times worse
even.  Which of course is hyperbole and cannot be true.  It also
isn't helpful in any way or form for anyone on this list.  I call
that whining.


which is true,


Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage./end-of-marketing-blurb


but probably you feel saying that is not allowed on non-gcc lists?)


You're allowed to say whatever you want.  Let's have a quote again
shall we?  I said:


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


I read that as a friendly request, not a prohibition.  Well maybe
not actually friendly, more a bit angry.  A request, either way.


As for the PR


Problem report, a bugzilla ticket.  Sorry for using terminology
unknown to you.


you're requesting me to file with GCC for this, that
gcc-patches@ thread did precisely that


Actually not -- PRs make sure issues aren't forgotten (although
they might gather dust, sure).  But yes, submitting patches is a
Great Thing(tm).


and more (submitted a patch to
said documentation -- and no, saying documentation might get added
later is totally bogus and nonsensical -- documentation exists to
document current behaviour, not past).


When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical
to expect everything that is *not* a supported feature to be mentioned
there.


I wouldn't have replied, really, if you weren't so provoking.


Hey, maybe that character trait is good for something, then.
Now to build a business plan around it...


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where 
they



By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.

Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


So why is this a good tradeoff?


It certainly is better than disabling all compiler optimisations!


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?


I look at it the other way: keeping the volatile semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


[some confusion about barriers wrt atomics snipped]


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Yep.  And the reason it is a bug is that it fails to disable
the relevant compiler optimizations.  So I suspect that we might
actually be saying the same thing here.


We're not saying the same thing, but we do agree :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

I'd go so far as to say that anywhere where you want a non-volatile
atomic_read, either your code is buggy, or else an int would work just
as well.


Even, the only way to implement a non-volatile atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the result and stuff like that, but that's not the issue
here).

So if that would be the behaviour we wanted, just get rid of that
whole atomic_read() thing, so no one can misuse it anymore.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if the compiler kept the value of that atomic object in a register.


If there is a higher-level locking scheme then there is no point to
using atomic_t variables.  Atomic_t is specifically for the situation
where multiple CPUs are updating a variable without locking.


And don't forget about the case where it is an I/O device that is
updating the memory (in buffer descriptors or similar).  The driver
needs to do a volatile atomic read to get at the most recent version
of that data, which can be important for optimising latency (or throughput
even).  There is no other way the kernel can get that info -- doing an
MMIO read is way way too expensive.
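
A hedged sketch of that situation (the names are invented, and
cpu_relax() is just the usual kernel helper): the device updates the
status word via DMA, so the driver has to re-read it every time round
the loop:

struct rx_desc {
        unsigned int status;    /* written by the device via DMA */
        unsigned int length;
};
#define DESC_DONE 0x1

static unsigned int rx_wait(struct rx_desc *desc)
{
        /* Volatile access: the compiler may not cache this in a register. */
        while (!(*(volatile unsigned int *)&desc->status & DESC_DONE))
                cpu_relax();
        return desc->length;
}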


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Note that volatile
is a type-qualifier, not a type itself, so a cast of the _object_ itself
to a qualified-type i.e. (volatile int) would not make the access itself
volatile-qualified.


There is no such thing as volatile-qualified access defined
anywhere; there only is the concept of a volatile-qualified
*object*.

To serve our purposes, it is necessary for us to take the address of this
(non-volatile) object, cast the resulting _pointer_ to the corresponding
volatile-qualified pointer-type, and then dereference it. This makes that
particular _access_ be volatile-qualified, without the object itself being
such. Also note that the (dereferenced) result is also a valid lvalue and
hence can be used in "*(volatile int *)a = b;" kind of construction
(which we use for the atomic_set case).


There is a quite convincing argument that such an access _is_ an
access to a volatile object; see GCC PR21568 comment #9.  This
probably isn't the last word on the matter though...


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Here, I should obviously admit that the semantics of *(volatile int *)
aren't any neater or well-defined in the _language standard_ at all. The
standard does say (verbatim) "precisely what constitutes an access to an
object of volatile-qualified type is implementation-defined", but GCC
does help us out here by doing the right thing.


Where do you get that idea?  GCC manual, section 6.1, "When is a
Volatile Object Accessed?" doesn't say anything of the kind.  PR33053
and some others.


Honestly, given such confusion, and the propensity of the volatile
type-qualifier keyword to be ill-defined (or at least poorly understood,
often inconsistently implemented), I'd (again) express my opinion that it
would be best to avoid its usage, given other alternatives do exist.


Yeah.  Or we can have an email thread like this every time
someone proposes a patch that uses an atomic variable ;-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Volatile behaviour itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),


It should be consistent across platforms; if not, file a bug please.


but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.


You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.
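
For example (a sketch): nothing below stops the compiler from moving the
plain store to 'data' past the volatile store to 'flag'; for that you
need one of the barriers just mentioned:

int data;                       /* ordinary object */
volatile int flag;

void publish(void)
{
        data = 42;              /* may be reordered by the compiler... */
        flag = 1;               /* ...relative to this volatile store  */
}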


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool
Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order. AFAIK
it will not control the visibility of stores coming from other CPUs (that
is up to the cache coherency).


The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.
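
The usual shape of that pairing, as a sketch with the kernel barrier
names (the reader's polling load is forced with a volatile cast here so
it is actually re-done each iteration):

int payload;
int ready;

void writer(void)
{
        payload = 42;
        wmb();                  /* data store ordered before the flag store */
        ready = 1;
}

int reader(void)
{
        while (!*(volatile int *)&ready)
                ;               /* wait for the flag */
        rmb();                  /* flag load ordered before the data load */
        return payload;
}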


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


A CPU that did not provide this property (cache coherence) would be
most emphatically reviled.


That doesn't have anything to do with coherency as far as I can see.

It's just about the order in which a CPU (speculatively) performs the loads
(which isn't necessarily the same as the order in which it executes the
corresponding instructions, even).


So we are pretty safe assuming that CPUs
will provide it.


Yeah, pretty safe.  I just don't like undocumented assumptions :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

How does the compiler know that msleep() has got barrier()s?


Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)

In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the compilation unit itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
definition for "compilation unit" (in gcc lingo, possibly?)


This is indeed my understanding -- compilation unit is whatever the
compiler looks at in one go.  I have heard the word module used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.


Yes, that's what's called compilation unit :-)

[/me double checks]

Erm, the C standard actually calls it translation unit.

To be exact, to avoid any more confusion:

5.1.1.1/1:
A C program need not all be translated at the same time. The
text of the program is kept in units called source files, (or
preprocessing files) in this International Standard. A source
file together with all the headers and source files included
via the preprocessing directive #include is known as a
preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.



Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.


As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.


Yes, accesses to volatile objects are never reordered with
respect to each other.


BTW:

#define atomic_read(a)  (*(volatile int *)(a))
#define atomic_set(a,i) (*(volatile int *)(a) = (i))

int a;

void func(void)
{
int b;

b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}

gives:

func:
pushl   %ebp
movl    a, %eax
movl    %esp, %ebp
movl    $20, a
movl    a, %eax
popl    %ebp
ret

so the first atomic_read() wasn't optimized away.


Of course.  It is executed by the abstract machine, so
it will be executed by the actual machine.  On the other
hand, try

b = 0;
if (b)
b = atomic_read(a);

or similar.
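
Written out in full, that looks something like this (a sketch; the point
is just that the abstract machine never performs the access, so the
compiler is allowed to drop it):

int a;

int func2(void)
{
        int b = 0;

        if (b)                                  /* provably never taken */
                b = *(volatile int *)&a;        /* so this read can go away */
        return b;
}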


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)


I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",


It is not; try  gcc -combine  or the upcoming link-time optimisation
stuff, for example.


and wouldn't the compilation unit itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
definition for compilation unit (in gcc lingo, possibly?)


compilation unit is a C standard term.  It typically boils down
to single .c file.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
something else, say atomic_read_nonvolatile for all I care.


Do we really need another set of APIs?


Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


But since there currently is only one such API, and there are
users expecting the stronger behaviour, the only sane thing to
do is let the API provide that behaviour.  You can always add
a new API with weaker behaviour later, and move users that are
okay with it over to that new API.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


In many cases, the compiler also has to assume that
msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can escape from that translation unit, nothing outside
the translation unit can write to the variable.

At least given that gcc doesn't know about multiple threads of 
execution!


Heh, only about the threads it creates itself (not relevant to
the kernel, for sure :-) )


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Possibly these were too trivial to expose any potential problems that you
may have been referring to, so would be helpful if you could write a more
concrete example / sample code.


The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.


You can use -ffixed-XXX to keep the testcase simple.


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Please check the definition of cache coherence.


Which of the twelve thousand such definitions?  :-)

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.


I'm still not sure if that applies to all architectures.
Doesn't matter anyway, let's kill this thread :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


Sounds exciting...  ;-)


Yeah, the breakage is *quite* spectacular :-)


In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can escape from that translation unit, nothing outside
the translation unit can write to the variable.


But there is usually at least one externally callable function in
a .c file.


Of course, but often none of those will (indirectly) write a certain
static variable.
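
A sketch of the shape of that reasoning (how far a given GCC version
actually takes it is another matter):

static int cached;      /* file-static; its address is never taken, and
                         * nothing in this file writes it, so nothing
                         * outside the file can write it either */

extern void external_fn(void);  /* opaque: defined in another file */

int read_twice(void)
{
        int x = cached;
        external_fn();          /* cannot modify 'cached' behind our back */
        int y = cached;         /* so this load may be combined with 'x' */

        return x + y;
}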


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.


The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)
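
For illustration, a minimal sketch of the distinction described above
(the "mftb" example and function names are invented, not from the mail):

static inline unsigned long timebase_maybe_dropped(void)
{
	unsigned long t;
	/* non-volatile asm with an output: GCC may delete it entirely
	 * if 't' ends up unused */
	asm("mftb %0" : "=r" (t));
	return t;
}

static inline void compiler_barrier(void)
{
	/* volatile asm: treated as a side effect, never removed; the
	 * "memory" clobber also stops the compiler caching memory
	 * values across it */
	asm volatile("" ::: "memory");
}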


Segher



Re:

2007-08-15 Thread Segher Boessenkool

compilation unit is a C standard term.  It typically boils down
to single .c file.


As you mentioned later, single .c file with all the other files (headers
or other .c files) that it pulls in via #include is actually translation
unit, both in the C standard as well as gcc docs.


Yeah.  single .c file after preprocessing.  Same thing :-)


Compilation unit
doesn't seem to be nearly as standard a term, though in most places it
is indeed meant to be same as translation unit, but with the new gcc
inter-module-analysis stuff that you referred to above, I suspect one may
reasonably want to call a compilation unit as all that the compiler sees
at a given instant.


That would be a bit confusing, would it not?  They'd better find
some better name for that if they want to name it at all (remember,
none of these optimisations should have any effect on the semantics
of the program, you just get fewer .o files etc.).


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.


Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.


An atomic_read() implemented as a normal C variable read would
allow that read to be combined with another normal read from
that variable.  This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.

With an asm() implementation, the compiler can not do this; with
a volatile implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).
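
A minimal standalone sketch of the asm-based alternative argued for here
(not the kernel's official implementation; the type name is invented):
the volatile asm forces one real load per call, so two reads can never be
collapsed into one, and no volatile-qualified object is involved.

typedef struct { int counter; } example_atomic_t;

static inline int example_atomic_read(const example_atomic_t *v)
{
	int t;
	/* 32-bit powerpc load; %U1/%X1 let GCC pick update/indexed forms */
	asm volatile("lwz%U1%X1 %0,%1" : "=r" (t) : "m" (v->counter));
	return t;
}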


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Note that last line.


Segher, how about you just accept that Linux uses gcc as per reality, and
that sometimes the reality is different from your expectations?

+m works.


It works _most of the time_.  Ask Martin.  Oh you don't even have to,
he told you two mails ago.  My last mail simply pointed out that this
isn't a GCC bug, but merely documented behaviour.  Well okay, it was
documented only after people found problems with it; it is an edge
case and pretty hard to trigger (impossible to trigger on RISC
architectures, and not very likely on x86 either, for different
reasons).

Since realistic people keep using +m anyhow, current GCC has added
a workaround that splits +m into an =m and an m constraint.  It
likely will be accepted as a permanent feature.  However, that workaround
doesn't yet exist on some of the compilers Linux still allows building
with (but the problem does).


We use it. It's better than the alternatives.


How is this better than simply writing the slightly more verbose
asm(... : "=m" (XX) : "m" (XX))?  It's a few more characters.
asm() is hard to write (correctly) anyway.  I don't see the problem.
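
A hedged example of that "slightly more verbose" spelling: the same memory
location listed once as an output and once as an input, instead of a single
+m read-write operand.  The x86 "lock; incl" is chosen purely for
illustration, it is not taken from this thread.

static inline void example_inc(int *v)
{
	asm volatile("lock; incl %0"
		     : "=m" (*v)	/* written by the instruction */
		     : "m" (*v));	/* ... and also read by it */
}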


Pointing to stale documentation doesn't change anything.


It's not stale documentation, it's freshly built from GCC mainline.
It's also not stale in the sense that it isn't updated; in fact,
the email I pointed to is from a thread where a change in this exact
paragraph in the documentation was suggested (a couple of weeks ago).


The same is true of accesses through a volatile pointer. The kernel has
done that for a *loong* time (all MMIO IO is done that way),


(All MMIO on certain architectures).  This however is a completely
different case; there _is_ no underlying C object declared anywhere,
so no way can GCC prove that that object isn't volatile, so it _has_
to assume it is.

Also, in case of e.g. readb(), if the compiler can prove the resulting
value isn't used it doesn't have to perform the access at all -- the
documentation (again, not stale documentation) points this out quite
literally.


and it's
totally pointless to say that volatile isn't guaranteed to do what it
does.


I actually asked a couple of other GCC developers last night, and
they all agreed with me.  If you look at historic GCC problem reports
(I pointed at a few earlier in these threads) you would see that the
situation is far from clear.


It works, and quite frankly, if it didn't work, it would be a gcc
BUG.


How can it be a bug, if the semantics you think are guaranteed are
guaranteed in your (and others') minds only?

Of course, GCC likes to cater to its users, and tries to make things
work the way the programmer probably intended.  However it a) cannot
*actually* read your mind, it just looks that way; b) it _also_ has
to implement the _correct_ semantics of volatile; c) since this isn't
part of any C version (no, not GNU99 or similar either), sometimes
GCC developers do not think about what some people expect the compiler
should do with such code, but just implement the requirements they
*know* exist.  And that's when BUGs happen.

If you want GCC to do what you want it to do instead of what it does,
why not file an enhancement request?  http://gcc.gnu.org/bugzilla.

To save you the trouble, I just did: gcc.gnu.org/PR33053.  Let's
see what comes from it.

And again, this is not a C standard issue. It's a sane implementation
issue.


Maybe what you consider sane isn't as clear-cut as you think?  I'm
not alone with this opinion, as the mailing lists archives readily
show.


Linux expects the compiler to be sane.


And the compiler expects the code it is asked to translate to
be reasonably sane.  Nothing new here.  The compiler is more
forgiving than the Linux code, IMHO; that's why I suggested
using the obviously correct asm implementation of the atomic
get/set, rather than using the not-so-obviously-correct and
error-prone volatile thing.  The semantics of the asm() are
exactly the semantics you expect from an atomic get/set, while
that of the volatile implementation are *not*; and on top of
that it is very well-tested well-understood technology.


If it isn't, that's not our problem.


What, if the compiler doesn't work as you expect, like a silent
miscompilation, or even only the compiler ICEs, that is not a
problem for users and/or developers of Linux?  Interesting
opinion.  I'd rather stay clear of known pitfalls.


gcc *is* sane,


Yes, it is.


and I don't see why you constantly act as if it wasn't.


I guess I didn't express myself clearly enough then.  Hopefully
some other people _did_ understand what I meant.  In very harsh
words: not all (proposed) kernel code is sane.


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Yes, though I would use =m on the output list and m on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to +m. The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits +m to an input operand
and an output operand. Now it can happen that the compiler chooses two
different registers to access the same memory location. +m requires
that the two memory references are identical which causes the ICE if
they are not.


The problem is very nicely described here, last paragraph:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html

It's not a problem anymore in (very) recent GCC, although
that of course won't help you in the kernel (yet).


So you are saying that gcc 3.x still has this problem ?


Yes.  A warning (read-write constraint does not allow a register)
was added for GCC-3.4, but the fix/workaround is more recent
(4.2 I believe, perhaps it was backported to the 4.1 branch).


I do not know if the current compilers still do this. Has
anyone else seen this happen ?


In recent GCC, it's actually documented:

 The ordinary output operands must be write-only; GCC will assume that
the values in these operands before the instruction are dead and need
not be generated.  Extended asm supports input-output or read-write
operands.  Use the constraint character `+' to indicate such an operand
and list it with the output operands.  You should only use read-write
operands when the constraints for the operand (or the operand in which
only some of the bits are to be changed) allow a register.

Note that last line.


I see, thanks for the info.


My pleasure.


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

+m works. We use it. It's better than the alternatives. Pointing to
stale documentation doesn't change anything.


Well, perhaps on i386. I've seen some older versions of the s390 gcc die
with an ICE because I have used +m in some kernel inline assembly. I'm
happy to hear that this issue is fixed in recent gcc. Now I'll have to
find out if this is already true with gcc 3.x.


It was fixed (that is, +m is translated into a separate read
and write by GCC itself) in GCC-4.0.0, I just learnt.

The duplication =m and m with the same constraint is rather annoying.


Yeah.  Compiler errors are more annoying though I dare say ;-)


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Yeah.  Compiler errors are more annoying though I dare say ;-)


Actually, compile-time errors are fine,


Yes, they don't cause data corruption or anything like that,
but I still don't think the 390 people want to ship a kernel
that doesn't build -- and it seems they still need to support
GCC versions before 4.0.  Up until the day they can finally
drop 3.x, they'll have to...


and easy to work around.


...use the simple workaround, annoying as it may be, at least
it works.  That's all I'm saying.


*Much*
more annoying is when gcc actively generates subtly bad code.


Quite so.


We've had
use-after-free issues due to incorrect gcc liveness calculations etc, and
inline asm has been one of the more common causes - exactly because the
kernel is one of the few users (along with glibc) that uses it at all.


The good news is that things are getting better since more and
more of the RTL transformations are ripped out.


Now *those* are hard to find - the code works most of the time, but the
compiler has inserted a really subtle race condition into the code
(deallocated a local stack entry before last use). We had that with our
semaphore code at some point.


Yeah, magnifying glass + lots of time kind of bug.  Lovely :-/


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-11 Thread Segher Boessenkool

You'd have to use +m.


Yes, though I would use =m on the output list and m on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to +m. The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits +m to an input operand
and an output operand. Now it can happen that the compiler chooses two
different registers to access the same memory location. +m requires
that the two memory references are identical which causes the ICE if
they are not.


The problem is very nicely described here, last paragraph:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html

It's not a problem anymore in (very) recent GCC, although
that of course won't help you in the kernel (yet).


I do not know if the current compilers still do this. Has
anyone else seen this happen ?


In recent GCC, it's actually documented:

 The ordinary output operands must be write-only; GCC will assume that
the values in these operands before the instruction are dead and need
not be generated.  Extended asm supports input-output or read-write
operands.  Use the constraint character `+' to indicate such an operand
and list it with the output operands.  You should only use read-write
operands when the constraints for the operand (or the operand in which
only some of the bits are to be changed) allow a register.

Note that last line.
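
A hedged illustration of that last line: "+" is unproblematic when the
constraint allows a register, because input and output then live in one
register.  The x86 "bswap" below is an invented example, not code from
this thread.

static inline unsigned int example_bswap(unsigned int x)
{
	asm("bswap %0" : "+r" (x));	/* x is both read and written */
	return x;
}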


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

The compiler is within its rights to read a 32-bit quantity 16 bits at
a time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.


Code all over the kernel assumes that 32-bit reads/writes
are atomic so while such a compiler might be legal it certainly
can't compile Linux.


That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Please give an example code snippet + gcc version + arch
to back this up.


unsigned char f(unsigned long *p)
{
return *p & 1;
}

with both

powerpc64-linux-gcc (GCC) 4.3.0 20070731 (experimental)

and

powerpc64-linux-gcc-4.2.0 (GCC) 4.2.0

(sorry, I don't have anything newer or older right now; if you
really care, I can test with those too)

generate (in 64-bit mode):

.L.f:
lbz 3,7(3)
rldicl 3,3,0,63
blr

and in 32-bit mode:

f:
stwu 1,-16(1)
nop
nop
lbz 3,3(3)
addi 1,1,16
rlwinm 3,3,0,31,31
blr

(the nops are because I use --with-cpu=970).


But perhaps you do not care for PowerPC, in which case:

i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

(sorry for the old version, I don't build x86 compilers
all that often; also I don't have a 64-bit version right
now):

f:
pushl   %ebp
movl%esp, %ebp
movl8(%ebp), %eax
popl%ebp
movzbl  (%eax), %eax
andl$1, %eax
ret


If you want testing with any other versions, and/or for
any other target architecture, I can do that; it takes a
few minutes to build a compiler.

It is quite hard to build a testcase that reads more than
one part of the long, since for small testcases the
compiler will almost always be smart enough to do one
bigger read instead; but it certainly isn't inconceivable,
and anyway the compiler would be fully in its right to do
reads non-atomically if not instructed otherwise.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Please give an example code snippet + gcc version + arch
to back this up.


unsigned char f(unsigned long *p)
{
return *p & 1;
}


This doesn't really matter since we only care about the LSB.


It is exactly what I claimed, and what you asked proof of.


Do you have an example where gcc reads it non-atmoically and
we care about all parts?


Like I explained in the original mail; no, I suspect such
a testcase will be really hard to construct, esp. as a small
testcase.  I have no reason to believe it is impossible to
do so though -- maybe someone else can write trickier code
than I can, in which case, please do so.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool
We can't have split stores because we don't use atomic64_t on 32-bit 
architectures.


That's not true; the compiler is free to split all stores
(and reads) from memory however it wants.  It is debatable
whether volatile would prevent this as well, certainly
it is unsafe if you want to be portable.  GCC will do its
best to not split volatile memory accesses, but bugs in
this area do happen a lot (because the compiler code for
volatile isn't as well exercised as most other compiler
code, and because it is simply a hard subject; and there
is no real formalised model for what GCC should do).
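
A hedged example of that freedom (function and constant invented for
illustration): on a 32-bit target a plain 64-bit store is normally emitted
as two 32-bit stores, i.e. it is not atomic, and nothing in C forbids
similar splitting elsewhere.

void set_stamp(unsigned long long *p)
{
	/* typically two stw/movl instructions on 32-bit powerpc/x86 */
	*p = 0x1122334455667788ULL;
}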

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


Segher



Re: [PATCH 24/24] document volatile atomic_read() behavior

2007-08-09 Thread Segher Boessenkool

Historically this has been
+accomplished by declaring the counter itself to be volatile, but the
+ambiguity of the C standard on the semantics of volatile make this practice
+vulnerable to overly creative interpretation by compilers.


It's even worse when accessing through a volatile casted pointer;
see for example the recent(*) GCC bugs in that area.

(*) Well, not _all_ that recent.  No one should be using the 3.x
series anymore, right?


Explicit
+casting in atomic_read() ensures consistent behavior across architectures
+and compilers.


Even modulo compiler bugs, what makes you believe that?


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and that 
words are aligned unless explicitly packed otherwise, is endemic in 
the kernel.  No sane compiler violates this assumption.  It's true 
that we're not portable to insane compilers after this patch, but we 
never were in the first place.


You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should just work with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.

So, why not use the well-defined alternative?


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The compiler is within its rights to read a 32-bit quantity 16 bits at
a time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.


Yes, but we don't write code for these compilers.  There are countless 
pieces of kernel code which would break in this condition, and there 
doesn't seem to be any interest in fixing this.


Other things are broken too.  Great argument :-)

Sequence points enforce read-after-write ordering, not 
write-after-write.


Sequence points order *all* side effects; sequence points exist in the
domain of the abstract sequential model of the C language only.  The
compiler translates that to machine code that is equivalent to that C
code under the as-if rule; but this is still in that abstract model,
which doesn't include things such as SMP, visibility by I/O devices,
store queues, etc. etc.

We flush writes with reads for MMIO because of this effect as well as 
the CPU/bus effects.


You cannot flush all MMIO writes with reads; this is a PCI-specific
thing.  And even then, you need more than just the read itself: you
have to make sure the read completed and returned data.


In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.


Like i386 and x86_64?  These used to have volatile in the atomic_t 
declaration.  We removed it, and the sky did not fall.


And this proves what?  Lots of stuff works by accident.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and 
that words are aligned unless explicitly packed otherwise, is 
endemic in the kernel.  No sane compiler violates this assumption.  
It's true that we're not portable to insane compilers after this 
patch, but we never were in the first place.

You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should just work with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.
So, why not use the well-defined alternative?


Because we don't need to,


You don't need to use volatile objects, or accesses through
volatile-cast pointers, either.


and it hurts performance.


Please show how it does this -- one load is one load either way,
and one store is one store.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool
If you need to guarantee that the value is written to memory at a 
particular time in your execution sequence, you either have to read it 
from memory to force the compiler to store it first


That isn't enough.  The CPU will happily read the datum back from
its own store queue before it ever hit memory or cache or any
external bus.  If you need the value to be globally visible before
another store, you use wmb(); if you need it before some read,
you use mb().

These concepts are completely separate from both atomicity and
volatility (which aren't the same themselves).
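
A minimal kernel-context sketch of that distinction, assuming wmb() as the
kernel primitive named above; the descriptor layout and OWN flag are
invented for illustration.  The barrier orders the payload store before
the ownership store as seen by other agents; volatility or atomicity alone
gives you neither.

struct ex_desc { unsigned int len; unsigned int status; };
#define EX_OWN_HW 0x80000000u

static void ex_give_to_hw(struct ex_desc *d, unsigned int len)
{
	d->len = len;		/* fill the descriptor first */
	wmb();			/* make it globally visible ... */
	d->status = EX_OWN_HW;	/* ... before handing it to the device */
}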

No, but I can reason about it and be confident that this won't break 
anything that isn't already broken.  At worst, this patch will make 
any existing subtly incorrect uses of atomic_t much more obvious and 
easier to track down.


PR22278, PR29753 -- both P2 bugs, and both about basic *(volatile *)
usage.  Yeah, it's definitely not problematic, esp. not if you want
to support older compilers than 4.2 too. </sarcasm>


Segher



Re: [PATCH 24/24] document volatile atomic_read() behavior

2007-08-09 Thread Segher Boessenkool

Explicit
+casting in atomic_read() ensures consistent behavior across architectures
+and compilers.

Even modulo compiler bugs, what makes you believe that?


When you declare a variable volatile, you don't actually tell the 
compiler where you want to override its default optimization behavior, 
giving it some freedom to guess your intentions incorrectly.  When you 
put the cast on the data access itself, there is no question about 
precisely where in the code you want to override the compiler's 
default optimization behavior.


...except for the small point that this isn't how volatile works.

Rule of thumb: even people who know the semantics of volatile
shouldn't use it.

If the compiler doesn't do what you want with a volatile declaration, 
it might have a plausible excuse in the ambiguity of the C standard.  
If the compiler doesn't do what you want in a cast specific to a 
single dereference, it's just plain broken.


The other way around.  volatile has pretty weak semantics, and
certainly not the semantics you think it has, or that you wish it
had; but *(volatile XX *) doesn't have *any* semantics.  However
much you cast that pointer it still doesn't point to a volatile
object.

GCC will generally take the path of least surprise and perform a
volatile access anyway, but this has only be decided recently (a
year ago or so), and there very likely still are some bugs in
that area.
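
A hedged illustration of the contrast (variable names invented): only in
the first function does the access go to an object that is actually
volatile-qualified; the second merely casts a pointer to a plain int, which
the C standard does not clearly define as a volatile access, even though
GCC usually treats it as one.

volatile int real_volatile_counter;	/* volatile object */
int plain_counter;			/* ordinary object */

int read_volatile_object(void)
{
	return real_volatile_counter;		/* well-defined volatile access */
}

int read_through_cast(void)
{
	return *(volatile int *)&plain_counter;	/* object itself is not volatile */
}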

We try to be compatible with plausibly correct compilers, but if 
they're completely broken, we're screwed no matter what.


If you don't know what to expect, you're screwed for sure.


Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object?  That you can access that same object in
a non-volatile way?


Segher



Re: [PATCH 24/24] document volatile atomic_read() behavior

2007-08-09 Thread Segher Boessenkool

Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object?  That you can access that same object in
a non-volatile way?


That's my understanding.  That way accesses where you don't care about 
volatility may be optimised.


But those accesses might be done non-atomically then (for example,
if the compiler knows it only needs to write one byte, it might not
bother writing the rest), so that's no good if you want to read the
thing atomically too.

For instance, in cases where there are already other things 
controlling visibility (as are needed for atomic increment, for 
example) you don't need to make the access itself volatile.


Hrm, you mean within a lock or similar?  You'll get the same semantics
as volatile anyway there.


Segher



Re: [PATCH 24/24] document volatile atomic_read() behavior

2007-08-09 Thread Segher Boessenkool

Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object?  That you can access that same object in
a non-volatile way?


You'll have to take that up with Linus and the minds behind Volatile 
Considered Harmful, but the crux of it is that volatile objects are 
prone to compiler bugs too, and if we have to track down a compiler 
bug, it's a lot easier when we know exactly where the load is supposed 
to be because we deliberately put it there, rather than letting the 
compiler re-order everything that lacks a strict data dependency and 
trying to figure out where in a thousand lines of assembler the 
compiler should have put the load for the volatile object.


So, why not do the access explicitly via an inline asm?  It
generates the same code, it's obviously correct, and it's
even *actually* correct.  Plus, you get good compiler
support (and compiler people support).

If we're going to assume that the compiler has bugs we'll never be 
able to find, we all need to find new careers.


If we cannot find the bug in finite time, we cannot observe
the bug in finite time either, so either way that's fine :-)

If we're going to assume that it has bugs we *can* find, then let's 
use code that makes it easier to do that.


And I'm saying this is a step in the wrong direction for that.

I initially proposed a patch that made all the objects volatile, on 
the grounds that this was a special case where there wasn't much room 
to have a misunderstanding that resulted in anything worse than wasted 
loads.  Linus objected, and now that I've seen all the responses to 
the new patchset, I understand exactly why.  If our compilers really 
suck as much as everyone says they do, it'll be much easier to detect 
that with volatile casts than with volatile declarations.


Except that accesses via volatile pointer casts open up a whole
new can of worms.


Segher



Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

So, why not use the well-defined alternative?

Because we don't need to, and it hurts performance.

It hurts performance by implementing 32-bit atomic reads in assembler?


No, I misunderstood the question.  Implementing 32-bit atomic reads in
assembler is redundant, because any sane compiler, *particularly* an
optimizing compiler (and we're only in this mess because of optimizing
compilers)


Oh please, don't tell me you don't want an optimising compiler.
And if you _do_ want one, well you're in this mess because you
chose C as implementation language and C has some pretty strange
rules.  Trying to use not-all-that-well-defined-and-completely-
misunderstood features of the language doesn't make things easier;
trying to use something that isn't even part of the language and
that your particular compiler originally supported by accident,
and that isn't yet an officially supported feature, and that on
top of it all has a track record of problems -- well it makes me
wonder if you're in this game for fun or what.


 will give us that automatically without the assembler.


No, it does *not* give it to you automatically; you have to do
either the asm() thing, or the not-defined-at-all *(volatile *)
thing.

Yes, it is legal for a compiler to violate this assumption.  It is 
also legal for us to refuse to maintain compatibility with compilers 
that suck this badly.


So that's rm include/linux/compiler-gcc*.h then.  Good luck with
the intel compiler, maybe it works more to your liking.


Segher



Re: [PATCH] ucc_geth.c, make PHY device optional.

2007-07-04 Thread Segher Boessenkool
How about separate autoneg to a property dumb-phy, which indicates the
PHY/switch doesn't provide MII register interface.


Something like that I suppose.  But don't call it dumb phy,
nor fake phy, nor anything similar -- there simply is _no_
phy.  If the Linux code wants to pretend there is one, that's
one thing, but there is no need to do any of this trickery in
the device tree.


Therefore, it should
use the fixed speed and duplex from device node rather than registers.


Yes.


Segher



Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes

2007-06-13 Thread Segher Boessenkool

I wish there was a git option to just make my shit look like the
remote, dammit!  The above is the easiest way I know how to do that.


git-fetch -f remote:local ?


Segher



Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Segher Boessenkool
If you're going to be paranoid, shouldn't you do something here to make
sure the value's hit the device?


I thought the whole point of paranoia is that it's inexplicable.

Here's a delusional reply: I didn't see any point to it.
1) a wmb would add overhead


A wmb() doesn't guarantee the write has reached the device.


2) the hardware is supposed to be looking at the status flag,
   anyway, and not misbehaving.


But you're paranoid, right?  Can't trust that device! :-)


Segher



Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.

2007-05-04 Thread Segher Boessenkool

Well, Segher doesn't want me to use iobarrier (because it's not I/O).
Andy doesn't want me to use wmb() (because it's sync).  I don't think
something like gfar_wmb() would be appropriate.  So the remaining
options are either eieio(),


? Just curious... the original intent of eieio was to order I/O,
such as MMIO; it has no effect on memory that isn't marked
cache-inhibited or write-through or guarded. Has this changed?


eieio orders all accesses to address space that is WIMG=1xxx
or WIMG=x1x1; separately, it orders stores to address space
that is WIMG=001x.


I guess I haven't kept up with the times ... is eieio now
being used to provide some other kind of barrier?


Nothing changed.


Is eieio providing some sort of SMP synchronization side-effect?


It orders stores to well-behaved memory yes.


Point being: if Segher doesn't let you use iobarrier (because
it's not I/O), then I don't understand why eieio would work (since
that's for io only).


iobarrier() is a kernel-level primitive, meant for ordering
I/O only, as its name indicates.  eieio is a CPU insn that
orders stores to main memory (amongst other things), not that
its name would tell you.


Segher



Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.

2007-05-03 Thread Segher Boessenkool

So what about some thing like this where we do the read only once?

- k

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index a06d8d1..9cd7d1e 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1438,31 +1438,35 @@ int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit)

 {
struct rxbd8 *bdp;
struct sk_buff *skb;
-   u16 pkt_len;
+   u16 pkt_len, status;
+   u32 bd_info;



I suggested that on IRC yesterday, and Segher was concerned that the
compiler might, in theory, optimize it into two lhz instructions.


Yes.  The same is true of the original code btw, but since
you test only one bit there, all is fine.


I'm rather skeptical that it would actually do so (even if it needs to
load twice due to register pressure, why not just use lwz both times?),


Sure.  That doesn't make this code correct though.


and there's probably many other places that would break if it did,


Most other network drivers read from an MMIO reg to see
which RX ring entries are kernel owned AFAICS.


but I wasn't up for digging around GCC to prove otherwise.


It doesn't matter what current GCC does -- simply look
at what it is *allowed* to do instead.

If you want a 32-bit read to be atomic, you should
do the read via a (volatile u32 *).  Doing this with
a cast in the places where you need the atomic access
makes sure you don't get unnecessary rereads.

Plus, that wouldn't synchronize the bd_info read with the buffer data 
reads.


Yes, you still need the rmb().
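
A kernel-context sketch of the combination being suggested; the descriptor
layout, field names, and EMPTY bit below are invented for illustration and
are not the actual gianfar structures.  The volatile cast gives one atomic
32-bit load that cannot be split or silently re-read, and rmb() orders it
before the packet-data reads it guards.

struct ex_rxbd { u16 status; u16 length; u32 bufptr; };
#define EX_RXBD_EMPTY 0x8000

static int ex_rx_ready(struct ex_rxbd *bdp)
{
	u32 bd_info = *(volatile u32 *)bdp;	/* one atomic 32-bit load */
	u16 status = bd_info >> 16;		/* big-endian layout assumed */

	if (status & EX_RXBD_EMPTY)
		return 0;			/* still owned by the hardware */
	rmb();					/* status read before data reads */
	return 1;
}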


Segher



Re: [PATCH] gianfar: Add I/O barriers when touching buffer descriptor ownership.

2007-05-02 Thread Segher Boessenkool

AFAICS you need stronger barriers though; {w,r,}mb(),
to prevent _any_ reordering of those memory accesses,
not just the compiler-generated ones.


My impression was that the eieio used by iobarrier would be sufficient 
for that, as we're not trying to synchronize between accesses to 
different types of memory.  Is sync really required here?


For accesses to main system memory, eieio only orders
writes, not reads, so iobarrier_r() doesn't do what
you want; and iobarrier_w() isn't meant to be used for
main memory access ordering either.

Also, it is better to not use powerpc-specific interfaces
in a device driver if you don't have a strong reason to.


Segher



Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.

2007-05-02 Thread Segher Boessenkool

And the driver is already ppc-specific; it uses in/out_be32.


True, but its hidden behind the gfar_read/write accessors.

Your change is a bit more blatant.


Well, Segher doesn't want me to use iobarrier (because it's not I/O).
Andy doesn't want me to use wmb() (because it's sync).


You should use wmb(), but unfortunately too strong
semantics are required for that (ordering wrt I/O)
so it's a full sync on PowerPC.  I don't believe
a priori that that would be notably slower, but if
actually is, you could use eieio() I suppose since
you say the driver is powerpc specific -- but please
put a comment in the source code then saying why you
don't use wmb() there.


Segher



Re: [PATCH] gianfar: Add I/O barriers when touching buffer descriptor ownership.

2007-05-01 Thread Segher Boessenkool
The hardware must not see that it is given ownership of a buffer until it is
completely written, and when the driver receives ownership of a buffer,
it must ensure that any other reads to the buffer reflect its final
state.  Thus, I/O barriers are added where required.

Without this patch, I have observed GCC reordering the setting of
bdp->length and bdp->status in gfar_new_skb.


The :::memory in the barriers you used prevents GCC
from reordering accesses around the barriers.

AFAICS you need stronger barriers though; {w,r,}mb(),
to prevent _any_ reordering of those memory accesses,
not just the compiler-generated ones.


Segher



Re: [RFC] div64_64 support

2007-02-26 Thread Segher Boessenkool
I thought the motivation for div64() was that a 64:32->32 divide could
be done a lot faster on a number of platforms (including the important
x86) than a generic 64:64->64 divide, but gcc doesn't handle the
devolution automatically -- there is no such libgcc function.


That there's no such function in libgcc doesn't mean GCC
cannot handle this; libgcc is a bunch of library functions
that are really needed for generated code (because you
really don't want to expand those functions inline
everywhere) -- you won't find an addsi3 in libgcc either.

There does exist a divmoddisi4, sort of.

It used to be defined in three GCC targets, but commented
out in all three.  The NS32k is long gone.  For Vax, a
comment says the machine insn for this isn't used because
it is just too slow.  And the i386 version is disabled
because it returns the wrong result on overflow (not the
truncated 64-bit result, required by the implicit cast
to 32-bit, but the i386 arch traps to the overflow handler).

The only way to express the semantics you want in (GNU) C
is to use asm() -- and that's exactly what div64() does :-)

Blame it on the C language, but not on GCC.  Not this time.
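
A hedged sketch of the kind of asm being referred to (not claimed to be the
kernel's exact div64() implementation): a 64:32->32 divide using the x86
"divl" instruction, which divides EDX:EAX by its operand.  It is only valid
when the quotient fits in 32 bits; otherwise divl faults, which is exactly
the overflow case mentioned above.

static inline unsigned int div64_32_example(unsigned long long n,
					    unsigned int base,
					    unsigned int *rem)
{
	unsigned int quot, r;

	asm("divl %4"
	    : "=a" (quot), "=d" (r)
	    : "a" ((unsigned int)n),		/* low 32 bits in EAX */
	      "d" ((unsigned int)(n >> 32)),	/* high 32 bits in EDX */
	      "rm" (base));
	*rem = r;
	return quot;
}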


Segher



Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-20 Thread Segher Boessenkool
Sure, PCI busses are little-endian.  But is readX()/writeX() for PCI
only?


Yes.

For other buses, use foo_writel(), etc.

Can this please be documented then?  Never heard this before...


You have come late to the party.


WHat do you mean here?  Could you please explain?


This has been the case for many, many years.


No, it was never documented AFAICS.


And there is no point in a massive rename to pci_writel(), either.


That would be really inconvenient, sure.  It's also inconvenient
that all the nice short names are PCI-only.


Segher



Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Segher Boessenkool
Nah. We have the basic rule that readl/writel are little endian.  PowerPC
additionally provides arch specific low level in_{be,le}32 type
accessors with explicit endianness. Or you can also use
cpu_to_le32/le32_to_cpu kind of macros to convert between native and
explicit endianness.


Sure, PCI busses are little-endian.  But is readX()/writeX() for PCI
only?  I sure hope not.

It would make a lot more sense if readX()/writeX() used the endianness
of the bus they are performed on.  PowerPC byteswaps are cheap -- for
16- and 32-bit accesses.  They're quite bad for 64-bit though; it would
be a pity to end up doing two of those for a 64-bit big-endian I/O access
(one on the access itself, one to convert the data back to CPU order).

This would happily solve the problem of the various variations of
byte-swapping bus bridges, too (natural swap, 32-bit swap, 64-bit swap,
perhaps others that I thankfully have never seen or cannot remember).

Now you can say, use readl_be() or something similar, but that's a) ugly,
b) error-prone, c) exponential interface explosion, d) ugly.


Segher



Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Segher Boessenkool

Sure, PCI busses are little-endian.  But is readX()/writeX() for PCI
only?  I sure hope not.


It's defined for PCI and possibly ISA memory. You can use it for other
things if you whish to, but other things are arch specific in any
case.


Huh?  You're saying that only PCI and ISA are standardised busses?

It would make a lot more sense if readX()/writeX() used the endianness
of the bus they are performed on.


No way ! Again, it's evil if such a simple thing starts doing different
things depending on random external factors.


That's your opinion, yes.

I'm saying it's *not* doing different things: in both cases it just does
the correct-endian access.  Also it doesn't depend on random external
factors -- they're not random factors, and not external either: it only
depends on the bus the access is done on.


Different bus - different accessor.


Then please rename readX()/writeX() to pci_readX()/pci_writeX().


Now you can say, use readl_be() or something similar, but that's a) ugly,
b) error-prone, c) exponential interface explosion, d) ugly.


I'd rather has an interface explosion than having black endian magic
happening inside of the accessors.


Any comments on a), b) and d) as well?


Segher



Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Segher Boessenkool

Sure, PCI busses are little-endian.  But is readX()/writeX() for PCI
only?


Yes.

For other buses, use foo_writel(), etc.


Can this please be documented then?  Never heard this before...


Segher



Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Segher Boessenkool

Well, I'm having trouble thinking of other busses that have as strong
a sense of the address-data style I/O as PCI. Busses like scsi and
ide are primarily command-data or data-data in style.  Only the
address-data style busses need readl/writel-style routines.


SBUS, JBUS, VMEbus, NuBus, RapidIO, eBus, various SoC busses, to name
a few.  There are many, and most are big-endian.


Segher



Re: TG3 data corruption (TSO ?)

2006-09-11 Thread Segher Boessenkool

#define tw32_rx_mbox(reg, val)  do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
#define tw32_tx_mbox(reg, val)  do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)




That should do it.

I think we need those tcpdump after all.  Can you send it to me?


Looks like adding a sync to writel does fix it though... I'm trying to
figure out which specific writel in the driver makes a difference.  I'll
then look into slicing those tcpdumps.


Adding a PowerPC sync to every writel() will slow down things by
so much that it could well just hide the problem, not solve it...

Michael, we see this problem only with TSO on, and then the failure
we see is bad data being sent out, with the correct header (but the
header is constructed by the tg3 in this case, so no surprise).

I'm theorizing that this same failure can happen with TSO off as well,
but the header sent on the wire will be bogus as well as the data,
so the receiving NIC won't pick it up.  tcpdump probably won't see
it either...  will need a low-level ethernet analyser.


Segher



Re: TG3 data corruption (TSO ?)

2006-09-08 Thread Segher Boessenkool

I've been chasing with Segher a data corruption problem lately.
Basically transferring huge amount of data (several Gb) and I get
corrupted data at the rx side. I cannot tell for sure whether what I've
been observing here is the same problem that Segher's been seeing on his
blades, he will confirm or not. He also seemed to imply that reverting
to an older kernel on the -receiver- side fixed it, which makes me
wonder, since it looks really like a sending side problem (see
explanation below), if some change in, for example, window scaling,
might hide or trigger it.


Please send me lspci and tg3 probing output so that I know what
tg3 hardware you're using.


I use a 5780 rev. A3, but the problem is not limited to this chip.


I also want to look at the tcpdump or
ethereal on the mirrored port that shows the packet being corrupted.


I don't have such, sorry.


That's all the data I have at this point. I can't guarantee 100% that
it's a TSO bug (it might be a bug that TSO renders visible due to timing
effects) but it looks like it since I've not reproduced yet with TSO
disabled.


It seems indeed to only be exposed by TSO, not actually a
bug of it /an sich/.

I've got a patch that seems to solve the problem, it needs more testing
though (maybe Ben can do this :-) ).  The problem is that there should
be quite a few wmb()'s in the code that are just not there; adding some
to tg3_set_txd() seems to fix the immediate problem but more is needed
(and I don't see why those should be needed, unless tg3_set_txd() is
updating a live ring entry in place or something like that).

More testing is needed, but the problem is definitely the lack of memory
ordering.


Segher



Re: sky2 problem on powerpc

2006-09-04 Thread Segher Boessenkool

The patch has a couple of places where I reversed 2 assignments, they
are harmless, it was before I figured out that the chip will
(apparently) not access a descriptor before it's been told to do so via
MMIO, and thus the order of the writes to the descriptors is irrelevant
(I was also adding wmb's though I removed them).


wmb() between writing the descriptors to main memory and writing
an MMIO to the device to tell it to go fetch them?  That's still
required.  The 970 you tested on won't care though ;-)
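
A kernel-context sketch of that requirement (ring layout, OWN flag and the
producer-index register offset are invented for illustration): descriptor
stores to main memory must be globally visible before the MMIO write that
makes the chip go fetch them.

struct ex_txdesc { u32 addr; u32 len_flags; };
#define EX_TX_OWN 0x80000000u

static void ex_post_tx(struct ex_txdesc *ring, u16 idx, u32 dma_addr,
		       u32 len, void __iomem *ioaddr)
{
	ring[idx].addr = dma_addr;		/* fill the descriptor */
	ring[idx].len_flags = len | EX_TX_OWN;
	wmb();					/* visible before the kick */
	writel(idx, ioaddr + 0x10);		/* hypothetical producer-index reg */
}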


There is a couple of places where we were doing a BE and not LE
conversion of a descriptor field (typically in the VLAN code). I'm not
sure what's up there but BE felt wrong. I have turned them into LE
conversions but then I haven't tested VLAN, and I might just
misudnerstand what's happening there so I'll let you decide what to do
about those.


VLAN tags are big-endian on the wire, it might well be that these
chips never swizzle that?

--- linux-work.orig/drivers/net/sky2.c	2006-09-04 17:35:20.0 +1000
+++ linux-work/drivers/net/sky2.c	2006-09-04 17:41:50.0 +1000
@@ -811,8 +811,9 @@ static void rx_set_checksum(struct sky2_
 	struct sky2_rx_le *le;

 	le = sky2_next_rx(sky2);
-	le->addr = (ETH_HLEN << 16) | ETH_HLEN;
+	le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN);
 	le->ctrl = 0;
+	le->length = 0;
 	le->opcode = OP_TCPSTART | HW_OWNER;


Is this le->length thing a separate bugfix?  Or just overly
protective coding (you write the field later again).


+   /* Don't let it byte swap descriptors in hardware. Clear the bit
+* in case a previous driver have set it
+*/
+   reg = sky2_pci_read32(hw, PCI_DEV_REG2);
+   reg &= ~PCI_REV_DESC;
+   sky2_pci_write32(hw, PCI_DEV_REG2, reg);


Probably not needed, the chip gets a full reset at startup (or so
I hope anyway, heh).


Segher
