Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Segher Boessenkool
At some point in the future, barrier() will be universally regarded as 
a hammer too big for most purposes.  Whether or not removing it now


You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained compiler barrier,
I presume?

constitutes premature optimization is arguable, but I think we should 
allow such optimization to happen (or not happen) in 
architecture-dependent code, and provide a consistent API that doesn't 
require the use of such things in arch-independent code where it might 
turn into a totally superfluous performance killer depending on what 
hardware it gets compiled for.


Explicit barrier()s won't be too hard to replace -- but what to do
about the implicit barrier()s in rmb() etc. etc. -- *those* will be
hard to get rid of, if only because it is hard enough to teach driver
authors about how to use those primitives *already*.  It is far from
clear what a good interface like that would look like, anyway.

Probably we should first start experimenting with a forget()-style
micro-barrier (but please, find a better name), and see if a nice
usage pattern shows up that can be turned into an API.
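
As a concrete sketch of the contrast being discussed (the forget() form is the one quoted later in this thread; its name is just the placeholder under discussion):

/* The existing heavy hammer: tells GCC that all of memory may have
 * changed, so every value cached in a register must be discarded. */
#define barrier()   __asm__ __volatile__("" : : : "memory")

/* The fine-grained micro-barrier idea: only the one object 'a' is
 * forced out to / reloaded from memory; everything else may stay
 * cached.  (Name is a placeholder.) */
#define forget(a)   __asm__ __volatile__("" : "=m" (a) : "m" (a))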


Segher



Re: RFC: drop support for gcc 4.0

2007-08-21 Thread Segher Boessenkool

How many people e.g. test -rc kernels compiled with gcc 3.2?


Why would that matter?  It either works or not.  If it doesn't
work, it can either be fixed, or support for that old compiler
version can be removed.

The only other policy than "only remove support if things are
badly broken" would be "only support what the GCC team supports",
which would be >= 4.1 now; and there are very good arguments for
supporting more than that with the Linux kernel.


Segher



Re: RFC: drop support for gcc 4.0

2007-08-21 Thread Segher Boessenkool

Last time I tried a mips build, it would fail the compile unless I was
using _exactly_ 3.4.4 (I didn't try older versions, but did try
3.4.6, for ex.).


If 3.4.4 works where 3.4.6 doesn't, you should report this as a
bug; either here, or to the GCC team (but please be aware that the
3.4 series isn't supported anymore), or to whoever built that
compiler for you.


So I also think the 3.4 series will still have to be
around for a while.


Huh?  3.4 doesn't work for you, so that's why it should stay
a supported compiler?


Segher



Re: RFC: drop support for gcc 4.0

2007-08-21 Thread Segher Boessenkool

How many people e.g. test -rc kernels compiled with gcc 3.2?


Why would that matter?  It either works or not.  If it doesn't
work, it can either be fixed, or support for that old compiler
version can be removed.


One bug report "kernel doesn't work / crash / ... when compiled with
gcc 3.2, but works when compiled with gcc 4.2" will most likely be lost
in the big pile of unhandled bugs, not cause the removal of gcc 3.2
support...


While that might be true, it's a separate problem.


The only other policy than "only remove support if things are
badly broken" would be "only support what the GCC team supports",
which would be >= 4.1 now; and there are very good arguments for
supporting more than that with the Linux kernel.


No, it's not about bugs in gcc, it's about kernel+gcc combinations that
are mostly untested but officially supported.


What does "officially supported" mean?  Especially the
"officially" part.  Is this documented somewhere?


E.g. how many kernel developers use kernels compiled without
unit-at-a-time? And unit-at-a-time does paper over some bugs,
e.g. about half a dozen section mismatch bugs I've fixed
recently are not present with it.


If any developer is interested in supporting some certain old
compiler version, he should be testing regularly with it.  Sounds
like that's you ;-)

If no developer is interested, we shouldn't claim to support
using that compiler version.


But as the discussions have shown gcc 4.0 is currently too high for
making a cut, and it is not yet the right time for raising the minimum
required gcc version.


Agreed.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool

And no, RMW on MMIO isn't "problematic" at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


Maybe for some CPUs, but not all.  ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.


Sure, your CPU doesn't have RMW instructions -- how to emulate
those if you don't have them is a totally different thing.


Segher



Re: [POWERPC] Fix for assembler -g

2007-08-20 Thread Segher Boessenkool

But there is no lparmap.o!  lparmap.s is the generated file.


Yeah, tell that to scripts/Makefile.lib:

_c_flags   = $(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$(basetarget).o)

What would do what a person expects is $(CFLAGS_$(@F)), I think.


Looks good to me.  Sam?  We wanted to set CFLAGS_lparmap.s .


Segher



Re: [POWERPC] Fix for assembler -g

2007-08-20 Thread Segher Boessenkool

In a way though, good thing, or mainline would have continued to be
broken :)


What do I need to do to see this bug, btw?  A special config
symbol perhaps?


CONFIG_DEBUG_INFO=y
and pass-g-to-assembler-under-config_debug_info.patch from -mm


Ah, an -mm patch.  Gotcha.

The fix most likely won't be necessary anymore for .24, since all
of lparmap.s will be gone -- see patch earlier today to linuxppc-dev.
Good riddance :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool
Such code generally doesn't care precisely when it gets the update, 
just that the update is atomic, and it doesn't loop forever.


Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.


Regardless, I'm convinced we just need to do it all in assembly.


So do you want "volatile asm" or "plain asm", for atomic_read()?
The asm version has two ways to go about it too...
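
To make the question concrete, here are hedged x86-flavoured sketches of the two asm variants being asked about; these are illustrations only, not the actual patch from this series, and atomic_t is assumed to be the usual struct with an int counter member:

typedef struct { int counter; } atomic_t;   /* the non-volatile flavour under discussion */

static inline int atomic_read_asm_volatile(const atomic_t *v)
{
        int t;
        /* volatile asm: counts as a side effect, so it is never deleted
         * and never moved across other side effects. */
        __asm__ __volatile__("movl %1,%0" : "=r" (t) : "m" (v->counter));
        return t;
}

static inline int atomic_read_plain_asm(const atomic_t *v)
{
        int t;
        /* plain asm: may be CSEd or dropped if the result is unused,
         * but still forces a real load whenever the result is needed. */
        __asm__("movl %1,%0" : "=r" (t) : "m" (v->counter));
        return t;
}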


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-20 Thread Segher Boessenkool
Right. ROTFL... volatile actually breaks atomic_t instead of making 
it safe. x++ becomes a register load, increment and a register store. 
Without volatile we can increment the memory directly. It seems that 
volatile requires that the variable is loaded into a register first 
and then operated upon. Understandable when you think about volatile 
being used to access memory mapped I/O registers where a RMW 
operation could be problematic.


So, if we want consistent behavior, we're pretty much screwed unless 
we use inline assembler everywhere?


Nah, this whole argument is flawed -- "without volatile" we still
*cannot* "increment the memory directly".  On x86, you need a lock
prefix; on other archs, some other mechanism to make the memory
increment an *atomic* memory increment.
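
As a minimal illustration of that point (a sketch only, using the "=m"/"m" constraint pair that appears elsewhere in this thread): the plain increment and the lock-prefixed one differ precisely in whether the read-modify-write is atomic.

static inline void plain_inc(int *p)
{
        /* read-modify-write, but NOT atomic against other CPUs */
        __asm__ __volatile__("incl %0" : "=m" (*p) : "m" (*p));
}

static inline void locked_inc(int *p)
{
        /* the lock prefix is what makes the RMW atomic on SMP */
        __asm__ __volatile__("lock; incl %0" : "=m" (*p) : "m" (*p));
}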

And no, RMW on MMIO isn't "problematic" at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


The advantages of asm code for atomic_{read,set} are:
1) all the other atomic ops are implemented that way already;
2) you have full control over the asm insns selected, in particular,
   you can guarantee you *do* get an atomic op;
3) you don't need to use "volatile <data>" which generates
   not-all-that-good code on archs like x86, and we want to get rid
   of it anyway since it is problematic in many ways;
4) you don't need to use *(volatile <type> *)&<data>, which a) doesn't
   exist in C; b) isn't documented or supported in GCC; c) has a recent
   history of bugginess; d) _still uses volatile objects_; e) _still_
   is problematic in almost all those same ways as in 3);
5) you can mix atomic and non-atomic accesses to the atomic_t, which
   you cannot with the other alternatives.

The only disadvantage I know of is potentially slightly worse
instruction scheduling.  This is a generic asm() problem: GCC
cannot see what actual insns are inside the asm() block.


Segher



Re: [POWERPC] Fix for assembler -g

2007-08-20 Thread Segher Boessenkool

In a way though, good thing, or mainline would have continued to be
broken :)


What do I need to do to see this bug, btw?  A special config
symbol perhaps?


Segher



Re: [POWERPC] Fix for assembler -g

2007-08-20 Thread Segher Boessenkool
Hmm.  Check the V=1 make output to see that the lparmap.c really got
the -g0.  The log says it didn't.  Oops!  It looks like the patch that
got committed is the one that sets CFLAGS_lparmap.s, but really it
needs to set CFLAGS_lparmap.o instead (go kbuild).  Did I post the
wrong one?  (It's only one letter different.)  Sorry about that!


But there is no lparmap.o!  lparmap.s is the generated file.

Maybe the best fix is

$(obj)/lparmap.s: CFLAGS += -g0

This whole, erm, "cleverness" will go away soon fwiw.


Segher





Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The documentation simply doesn't say "+m" is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).


Huh?

"If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong."

I wonder how could you even try to disagree with that.


Easy.

The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

"This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It corresponds
to GCC version 4.3.0.  The internals of the GNU compilers, including
how to port them to new targets and some information about how to write
front ends for new languages, are documented in a separate manual."

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use "+m", and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean "+m"
doesn't actually do something.  It also doesn't mean it does what
you think it should do.  It might do just that of course.  But treating
writing C code as an empirical science isn't such a smart idea.


And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of "GCC docs are often wrong",


No need to guess at what you said, even if you managed to delete
your own mail already, there are plenty of free web-based archives
around.  You said:


See, "volatile" C keyword, for all it's ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


and that to me reads as complaining that the ISO C standard "isn't
very good" and that the GCC documentation is 10**5482 times worse
even.  Which of course is hyperbole and cannot be true.  It also
isn't helpful in any way or form for anyone on this list.  I call
that whining.


which is true,


Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage.


but probably you feel saying that is "not allowed" on non-gcc lists?)


You're allowed to say whatever you want.  Let's have a quote again
shall we?  I said:


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


I read that as a friendly request, not a prohibition.  Well maybe
not actually friendly, more a bit angry.  A request, either way.


As for the "PR"


"Problem report", a bugzilla ticket.  Sorry for using terminology
unknown to you.


you're requesting me to file with GCC for this, that
gcc-patches@ thread did precisely that


Actually not -- PRs make sure issues aren't forgotten (although
they might gather dust, sure).  But yes, submitting patches is a
Great Thing(tm).


and more (submitted a patch to
said documentation -- and no, saying "documentation might get added
later" is totally bogus and nonsensical -- documentation exists to
document current behaviour, not past).


When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical
to expect everything that is *not* a supported feature to be mentioned
there.


I wouldn't have replied, really, if you weren't so provoking.


Hey, maybe that character trait is good for something, then.
Now to build a business plan around it...


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing,


I don't think so.


"asm volatile" creates a side effect.


Yeah.


Side effects aren't
allowed to be reordered wrt sequence points.


Yeah.


This is exactly
the same reason as why "volatile accesses" cannot be reordered.


No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that
volatile access cast macro, and _therefore_ we were safe from
re-ordering (behaviour guaranteed by the C standard).


And exactly the same is true for the "asm" version.

Now, one cannot fantasize that "volatile asms" are also sequence 
points.


Sure you can do that.  I don't though.


In fact such an argument would be sadly mistaken, because "sequence
points" are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about "volatile asms".


That's nonsense.  GCC can extend the C standard any way they
bloody well please -- witness the fact that they added an
extra class of side effects...


Read the relevant GCC documentation.


I did, yes.


No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the
example given there:

asm volatile("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;

The docs explicitly say the addition can be moved before the "volatile
asm". Hopefully, as you know, (x + y) is an C "expression" and hence
a "sequence point" as defined by the standard.


The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the
assignment; the C compiler is allowed to compute it _after_
the assignment even, if it could figure out how ;-)

x+y does not contain a side effect, you know.
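
The same section of the GCC manual goes on to show the suggested fix, an artificial dependency that ties the asm to the later computation; reproduced here as a sketch of that documented example:

/* Artificial dependency: 'sum' now appears as an output of the asm,
 * so the addition below cannot be scheduled before the mtfsf. */
asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
sum = x + y;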


I know there is also stuff written about "side-effects" there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts,


s/could/does/


but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...


There's nothing wrong there.


See, "volatile" C keyword, for all it's ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


Semantics of "volatile" as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


Yes, and?  GCC promises the behaviour it has documented.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the "+m" documentation, if
that is what you are talking about.  It could be extended now, to
allow "+m" -- but that takes more than just "fixing" the 
documentation.


No, there was (and is) _everything_ wrong about the "+" documentation
as applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it
possible to use "+m" (like the current kernel code does). The docs
suggest otherwise, so there's obviously a clear disconnect between the
docs and actual GCC behaviour.


The documentation simply doesn't say "+m" is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).

[ You seem to often take issue with _amazingly_ petty and pedantic
  things, by the way :-) ]


If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting you're not.

And I simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)


Segher



Re: [PATCH] powerpc: Implement atomic{,64}_{read,write}() without volatile

2007-08-17 Thread Segher Boessenkool

Instead, use asm() like all other atomic operations already do.



+static __inline__ long atomic64_read(const atomic_t *v)



+static __inline__ void atomic64_set(atomic_t *v, long i)


s/atomic_t/atomic64_t/ in both lines.  I've edited my copy of the 
patch.


Ouch, sorry about that.
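
For the record, the two lines as they read with that substitution applied:

+static __inline__ long atomic64_read(const atomic64_t *v)
+static __inline__ void atomic64_set(atomic64_t *v, long i)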


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
No it does not have any volatile semantics. atomic_dec() can be
reordered at will by the compiler within the current basic unit if you
do not add a barrier.


"volatile" has nothing to do with reordering.


If you're talking of "volatile" the type-qualifier keyword, then
http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
otherwise.


I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that "volatile" creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.


atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing, if that is
implemented by GCC in the "obvious" way.  Even a "plain" asm()
will do the same.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing,


I don't think so.


"asm volatile" creates a side effect.  Side effects aren't
allowed to be reordered wrt sequence points.  This is exactly
the same reason as why "volatile accesses" cannot be reordered.


if that is
implemented by GCC in the "obvious" way.  Even a "plain" asm()
will do the same.


Read the relevant GCC documentation.


I did, yes.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the "+m" documentation, if
that is what you are talking about.  It could be extended now, to
allow "+m" -- but that takes more than just "fixing" the documentation.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

[ This is exactly equivalent to using "+m" in the constraints, as
  recently explained on a GCC list somewhere, in response to the patch
  in my bitops series a few weeks back where I thought "+m" was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].


http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html


Ah yes, that old thread, thank you.


That's when _I_ came to know how GCC interprets "+m", but probably
this has been explained on those lists multiple times. Who cares,
anyway?


I just couldn't find the thread you meant, I thought I might have
missed it, that's all :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

[ This is exactly equivalent to using "+m" in the constraints, as
  recently explained on a GCC list somewhere, in response to the patch
  in my bitops series a few weeks back where I thought "+m" was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.
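
A side-by-side sketch of the two spellings in question (the macro names here are made up purely for illustration):

/* explicit output/input pair -- works on the older GCCs too */
#define forget_pair(a)  __asm__ __volatile__("" : "=m" (a) : "m" (a))

/* "+m" form -- accepted by current GCC, but with no documented
 * semantics, hence "use it at your own risk" above */
#define forget_plus(a)  __asm__ __volatile__("" : "+m" (a))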


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Here, I should obviously admit that the semantics of *(volatile int *)&
aren't any neater or well-defined in the _language standard_ at all.
The standard does say (verbatim) "precisely what constitutes as access
to object of volatile-qualified type is implementation-defined", but
GCC does help us out here by doing the right thing.


Where do you get that idea?


Try a testcase (experimentally verify).


That doesn't prove anything.  Experiments can only disprove
things.


GCC manual, section 6.1, "When
is a Volatile Object Accessed?" doesn't say anything of the
kind.


True, "implementation-defined" as per the C standard _is_ supposed to 
mean
"unspecified behaviour where each implementation documents how the 
choice

is made". So ok, probably GCC isn't "documenting" this
implementation-defined behaviour which it is supposed to, but can't 
really

fault them much for this, probably.


GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.
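
For reference, a sketch of the volatile-cast style of access this exchange is about (atomic_t assumed to be the plain struct-with-int-counter form):

typedef struct { int counter; } atomic_t;

/* the cast, not the declaration, is what carries the volatile here */
#define atomic_read_cast(v)  (*(volatile int *)&(v)->counter)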


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)


Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

In a reasonable world, gcc should just make that be (on x86)

        addl    $1, i(%rip)

on x86-64, which is indeed what it does without the volatile. But with
the volatile, the compiler gets really nervous, and doesn't dare do it
in one instruction, and thus generates crap like

        movl    i(%rip), %eax
        addl    $1, %eax
        movl    %eax, i(%rip)

instead. For no good reason, except that "volatile" just doesn't have
any good/clear semantics for the compiler, so most compilers will just
make it be "I will not touch this access in any way, shape, or form".
Including even trivially correct instruction optimization/combination.


It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?


but is
(again) something that gcc doesn't dare do, since "i" is volatile.


Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like "people
who hit the go-slow-in-unspecified-ways button should get what
they deserve" ;-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

(and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)


...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of "with regard to *what*
does it limit re-ordering?".

A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy compiler. So
at a minimum, it limits re-ordering wrt other volatiles (assuming
sequence points exist). It also means that the compiler cannot move it
speculatively across conditionals, but other than that it's starting to
get fuzzy.


This is actually really well-defined in C, not fuzzy at all.
"Volatile accesses" are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.

We certainly should avoid volatile whenever possible, but "because
it's fuzzy wrt reordering" is not a reason -- all alternatives have
exactly the same issues.
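
A small sketch with one of each of the four kinds of side effect listed above, each full expression ending in a sequence point:

extern volatile int status_reg;
extern int counter;
extern void consume(int);

void example(void)
{
        int s = status_reg;         /* 1) access to a volatile object   */
        counter = s;                /* 2) modification of an object     */
        __asm__ __volatile__("");   /* 3) a volatile asm()              */
        consume(s);                 /* 4) a call that may do any of 1-3 */
}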


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
atomic_dec() already has volatile behavior everywhere, so this is
semantically okay, but this code (and any like it) should be calling
cpu_relax() each iteration through the loop, unless there's a
compelling reason not to.  I'll allow that for some hardware drivers
(possibly this one) such a compelling reason may exist, but
hardware-independent core subsystems probably have no excuse.


No it does not have any volatile semantics. atomic_dec() can be
reordered at will by the compiler within the current basic unit if you
do not add a barrier.


"volatile" has nothing to do with reordering.  atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.
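
The shape of loop the quoted review comment is about, as a hypothetical example (the obj and refcnt names are made up, this is not the driver code itself):

/* waiter: spin until the last reference is dropped elsewhere */
while (atomic_read(&obj->refcnt) != 0)
        cpu_relax();    /* polite to the sibling thread, and typically
                           also a compiler barrier */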


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew 
where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.



So why is this a good tradeoff?

It certainly is better than disabling all compiler optimisations!


It's easy to be better than something really stupid :)


Sure, it wasn't me who made the comparison though.


So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text.


Which has to be investigated.  A few kB is a lot more than expected.


What you need to justify is why it is a good
tradeoff to make them volatile (which btw, is much harder to go
the other way after we let people make those assumptions).


My point is that people *already* made those assumptions.  There
are two ways to clean up this mess:

1) Have the "volatile" semantics by default, change the users
   that don't need it;
2) Have "non-volatile" semantics by default, change the users
   that do need it.

Option 2) randomly breaks stuff all over the place, option 1)
doesn't.  Yeah 1) could cause some extremely minor speed or
code size regression, but only temporarily until everything has
been audited.


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?

I look at it the other way: keeping the "volatile" semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;


Yeah, but we could add lots of things to help prevent bugs and
would never be included. I would also contend that it helps _hide_
bugs and encourages people to be lazy when thinking about these
things.


Sure.  We aren't _adding_ anything here though, not on the platforms
where it is most likely to show up, anyway.


Also, you dismiss the fact that we'd actually be *adding* volatile
semantics back to the 2 most widely tested architectures (in terms
of test time, number of testers, variety of configurations, and
coverage of driver code).


I'm not dismissing that.  x86 however is one of the few architectures
where mistakenly leaving out a "volatile" will not easily show up on
user testing, since the compiler will very often produce a memory
reference anyway because it has no registers to play with.


This is a very important difference from
just keeping volatile semantics because it is basically a one-way
API change.


That's a good point.  Maybe we should create _two_ new APIs, one
explicitly going each way.
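
A sketch of what that could look like; the two names here are invented for illustration only:

/* always performs a real load from memory on every call */
#define atomic_read_volatile(v)     (*(volatile int *)&(v)->counter)

/* plain access: the compiler may cache or combine reads */
#define atomic_read_nonvolatile(v)  ((v)->counter)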


certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


I don't know that most people would expect that behaviour.


I didn't conduct any formal poll either :-)


Is there any documentation anywhere that would suggest this?


Not really I think, no.  But not the other way around, either.
Most uses of it seem to expect it though.


[some confusion about barriers wrt atomics snipped]


What were you confused about?


Me?  Not much.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew 
where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.



So why is this a good tradeoff?

It certainly is better than disabling all compiler optimisations!


It's easy to be better than something really stupid :)


Sure, it wasn't me who made the comparison though.


So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text.


Which has to be investigated.  A few kB is a lot more than expected.


What you need to justify is why it is a good
tradeoff to make them volatile (which btw, is much harder to go
the other way after we let people make those assumptions).


My point is that people *already* made those assumptions.  There
are two ways to clean up this mess:

1) Have the volatile semantics by default, change the users
   that don't need it;
2) Have non-volatile semantics by default, change the users
   that do need it.

Option 2) randomly breaks stuff all over the place, option 1)
doesn't.  Yeah 1) could cause some extremely minor speed or
code size regression, but only temporarily until everything has
been audited.


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?

I look at it the other way: keeping the volatile semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;


Yeah, but we could add lots of things to help prevent bugs and
would never be included. I would also contend that it helps _hide_
bugs and encourages people to be lazy when thinking about these
things.


Sure.  We aren't _adding_ anything here though, not on the platforms
where it is most likely to show up, anyway.


Also, you dismiss the fact that we'd actually be *adding* volatile
semantics back to the 2 most widely tested architectures (in terms
of test time, number of testers, variety of configurations, and
coverage of driver code).


I'm not dismissing that.  x86 however is one of the few architectures
where mistakenly leaving out a volatile will not easily show up on
user testing, since the compiler will very often produce a memory
reference anyway because it has no registers to play with.


This is a very important different from
just keeping volatile semantics because it is basically a one-way
API change.


That's a good point.  Maybe we should create _two_ new APIs, one
explicitly going each way.


certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


I don't know that most people would expect that behaviour.


I didn't conduct any formal poll either :-)


Is there any documentation anywhere that would suggest this?


Not really I think, no.  But not the other way around, either.
Most uses of it seem to expect it though.


[some confusion about barriers wrt atomics snipped]


What were you confused about?


Me?  Not much.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
atomic_dec() already has volatile behavior everywhere, so this is 
semantically
okay, but this code (and any like it) should be calling cpu_relax() 
each
iteration through the loop, unless there's a compelling reason not 
to.  I'll
allow that for some hardware drivers (possibly this one) such a 
compelling
reason may exist, but hardware-independent core subsystems probably 
have no

excuse.


No it does not have any volatile semantics. atomic_dec() can be 
reordered
at will by the compiler within the current basic unit if you do not 
add a

barrier.


volatile has nothing to do with reordering.  atomic_dec() writes
to memory, so it _does_ have volatile semantics, implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of with regard to *what* 
does

it limit re-ordering?.

A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy compiler. So 
at a

minimum, it limits re-ordering wrt other volatiles (assuming sequence
points exists). It also means that the compiler cannot move it
speculatively across conditionals, but other than that it's starting to
get fuzzy.


This is actually really well-defined in C, not fuzzy at all.
Volatile accesses are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.

We certainly should avoid volatile whenever possible, but because
it's fuzzy wrt reordering is not a reason -- all alternatives have
exactly the same issues.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

(and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)


...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

In a reasonable world, gcc should just make that be (on x86)

addl $1,i(%rip)

on x86-64, which is indeed what it does without the volatile. But with 
the
volatile, the compiler gets really nervous, and doesn't dare do it in 
one

instruction, and thus generates crap like

movli(%rip), %eax
addl$1, %eax
movl%eax, i(%rip)

instead. For no good reason, except that volatile just doesn't have 
any
good/clear semantics for the compiler, so most compilers will just 
make it

be I will not touch this access in any way, shape, or form. Including
even trivially correct instruction optimization/combination.


It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?


but is
(again) something that gcc doesn't dare do, since i is volatile.


Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like people
who hit the go-slow-in-unspecified-ways button should get what
they deserve ;-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)


Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ( :=m (a) :m (a))

[ This is exactly equivalent to using +m in the constraints, as 
recently
  explained on a GCC list somewhere, in response to the patch in my 
bitops

  series a few weeks back where I thought +m was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Here, I should obviously admit that the semantics of *(volatile int 
*)
aren't any neater or well-defined in the _language standard_ at all. 
The

standard does say (verbatim) precisely what constitutes as access to
object of volatile-qualified type is implementation-defined, but GCC
does help us out here by doing the right thing.


Where do you get that idea?


Try a testcase (experimentally verify).


That doesn't prove anything.  Experiments can only disprove
things.


GCC manual, section 6.1, When
is a Volatile Object Accessed? doesn't say anything of the
kind.


True, implementation-defined as per the C standard _is_ supposed to 
mean
unspecified behaviour where each implementation documents how the 
choice

is made. So ok, probably GCC isn't documenting this
implementation-defined behaviour which it is supposed to, but can't 
really

fault them much for this, probably.


GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

#define forget(a)   __asm__ __volatile__ ( :=m (a) :m (a))

[ This is exactly equivalent to using +m in the constraints, as 
recently
  explained on a GCC list somewhere, in response to the patch in my 
bitops

  series a few weeks back where I thought +m was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].


http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html


Ah yes, that old thread, thank you.


That's when _I_ came to know how GCC interprets +m, but probably
this has been explained on those lists multiple times. Who cares,
anyway?


I just couldn't find the thread you meant, I thought I missed
have it, that's all :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

atomic_dec() writes
to memory, so it _does_ have volatile semantics, implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline asm volatile
or one that uses a forget macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing,


I don't think so.


asm volatile creates a side effect.  Side effects aren't
allowed to be reordered wrt sequence points.  This is exactly
the same reason as why volatile accesses cannot be reordered.


if that is
implemented by GCC in the obvious way.  Even a plain asm()
will do the same.


Read the relevant GCC documentation.


I did, yes.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the +m documentation, if
that is what you are talking about.  It could be extended now, to
allow +m -- but that takes more than just fixing the documentation.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
No it does not have any volatile semantics. atomic_dec() can be reordered
at will by the compiler within the current basic unit if you do not add a
barrier.


volatile has nothing to do with reordering.


If you're talking of volatile the type-qualifier keyword, then
http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
otherwise.


I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that volatile creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.


atomic_dec() writes
to memory, so it _does_ have volatile semantics, implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


I don't think an atomic_dec() implemented as an inline asm volatile
or one that uses a forget macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.


The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing, if that is
implemented by GCC in the obvious way.  Even a plain asm()
will do the same.
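
[ A rough sketch of the two implementations being compared, for
  illustration only -- the asm flavour is powerpc-style, along the lines
  of the patch posted elsewhere in this thread; the exact constraints
  here are an assumption:

typedef struct { int counter; } atomic_t;

/* volatile-cast flavour: the load is an access to a volatile object */
#define atomic_read_cast(v)  (*(volatile int *)&(v)->counter)

/* asm flavour: the volatile asm is a side effect, so it can neither
   be deleted nor moved across sequence points */
static inline int atomic_read_asm(const atomic_t *v)
{
	int t;

	__asm__ __volatile__ ("lwz%U1%X1 %0,%1" : "=r" (t) : "m" (v->counter));
	return t;
}
]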


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] powerpc: Implement atomic{,64}_{read,write}() without volatile

2007-08-17 Thread Segher Boessenkool

Instead, use asm() like all other atomic operations already do.



+static __inline__ long atomic64_read(const atomic_t *v)



+static __inline__ void atomic64_set(atomic_t *v, long i)


s/atomic_t/atomic64_t/ in both lines.  I've edited my copy of the 
patch.


Ouch, sorry about that.
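
[ For clarity, a sketch of the corrected declarations after that
  substitution:

static __inline__ long atomic64_read(const atomic64_t *v);
static __inline__ void atomic64_set(atomic64_t *v, long i);
]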


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The asm volatile implementation does have exactly the same
reordering guarantees as the volatile cast thing,


I don't think so.


asm volatile creates a side effect.


Yeah.


Side effects aren't
allowed to be reordered wrt sequence points.


Yeah.


This is exactly
the same reason as why volatile accesses cannot be reordered.


No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that volatile
access cast macro, and _therefore_ we were safe from re-ordering
(behaviour guaranteed by the C standard).


And exactly the same is true for the asm version.

Now, one cannot fantasize that volatile asms are also sequence 
points.


Sure you can do that.  I don't though.


In fact such an argument would be sadly mistaken, because sequence
points are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about volatile asms.


That's nonsense.  GCC can extend the C standard any way they
bloody well please -- witness the fact that they added an
extra class of side effects...


Read the relevant GCC documentation.


I did, yes.


No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the example
given there:

asm volatile("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;

The docs explicitly say the addition can be moved before the volatile
asm. Hopefully, as you know, (x + y) is a C expression and hence
a sequence point as defined by the standard.


The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the
assignment; the C compiler is allowed to compute it _after_
the assignment even, if it could figure out how ;-)

x+y does not contain a side effect, you know.
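
[ For completeness: the fix that same docs section goes on to suggest is
  an artificial dependency routed through the asm, along the lines of
  its own example:

asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
sum = x + y;

  Making 'sum' a (dummy) output of the asm means the later use of 'sum'
  cannot be scheduled before the asm. ]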


I know there is also stuff written about side-effects there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts,


s/could/does/


but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...


There's nothing wrong there.


See, volatile C keyword, for all its ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


Semantics of volatile as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


Yes, and?  GCC promises the behaviour it has documented.


[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


There was (and is) nothing wrong about the +m documentation, if
that is what you are talking about.  It could be extended now, to
allow +m -- but that takes more than just fixing the 
documentation.


No, there was (and is) _everything_ wrong about the "+" documentation as
applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it possible
to use "+m" (like the current kernel code does). The docs suggest
otherwise, so there's obviously a clear disconnect between the docs and
actual GCC behaviour.


The documentation simply doesn't say +m is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for +m might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).

[ You seem to often take issue with _amazingly_ petty and pedantic things,
  by the way :-) ]


If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting that you're not handwaving.

And I simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

The documentation simply doesn't say +m is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for +m might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).


Huh?

If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong.

I wonder how could you even try to disagree with that.


Easy.

The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It corresponds
to GCC version 4.3.0.  The internals of the GNU compilers, including
how to port them to new targets and some information about how to write
front ends for new languages, are documented in a separate manual.

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use +m, and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean +m
doesn't actually do something.  It also doesn't mean it does what
you think it should do.  It might do just that of course.  But treating
writing C code as an empirical science isn't such a smart idea.


And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of GCC docs are often wrong,


No need to guess at what you said, even if you managed to delete
your own mail already, there are plenty of free web-based archives
around.  You said:


See, volatile C keyword, for all its ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs).


and that to me reads as complaining that the ISO C standard isn't
very good and that the GCC documentation is 10**5482 times worse
even.  Which of course is hyperbole and cannot be true.  It also
isn't helpful in any way or form for anyone on this list.  I call
that whining.


which is true,


Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage./end-of-marketing-blurb


but probably you feel saying that is not allowed on non-gcc lists?)


You're allowed to say whatever you want.  Let's have a quote again
shall we?  I said:


If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.


I read that as a friendly request, not a prohibition.  Well maybe
not actually friendly, more a bit angry.  A request, either way.


As for the PR


Problem report, a bugzilla ticket.  Sorry for using terminology
unknown to you.


you're requesting me to file with GCC for this, that
gcc-patches@ thread did precisely that


Actually not -- PRs make sure issues aren't forgotten (although
they might gather dust, sure).  But yes, submitting patches is a
Great Thing(tm).


and more (submitted a patch to
said documentation -- and no, saying documentation might get added
later is totally bogus and nonsensical -- documentation exists to
document current behaviour, not past).


When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical
to expect everything that is *not* a supported feature to be mentioned
there.


I wouldn't have replied, really, if you weren't so provoking.


Hey, maybe that character trait is good for something, then.
Now to build a business plan around it...


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/23] document preferred use of volatile with atomic_t

2007-08-16 Thread Segher Boessenkool

But barriers force a flush of *everything* in scope,


Nonsense, whatever "flush" is supposed to mean here.

If you really, *really* distrust the compiler that much, you shouldn't 
be using barrier, since that uses volatile under the hood too.  You 
should just go ahead and implement the atomic operations in assembler, 
like Segher Boessenkool did for powerpc in response to my previous 
patchset.


Puh-lease.  I DO NOT DISTRUST THE COMPILER, I just don't assume
it will do whatever I would like it to do without telling it.
It's a machine you know, and it is very well documented.

(And most barriers don't (need to) use volatile).

Implementing the atomic ops in asm loses exactly *no* semantics,
and it doesn't add restrictions either; it does allow you however
to access an atomic_t with normal loads/stores independently, if
you so choose.  The only valid issue brought up so far is Russell
King's comment that GCC cannot schedule the machine instructions
in and around an asm() perfectly, it doesn't look inside the asm()
after all; but the situation isn't quite as severe as he suggests
(GCC _does_ know you are performing loads/stores, etc.); more about
that later, when I've finished researching the current state of
that stuff.
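
[ For reference, a sketch of what such an all-asm atomic op looks like
  on powerpc -- illustrative only, not the exact patch referred to
  above:

typedef struct { int counter; } atomic_t;

static inline void atomic_add(int a, atomic_t *v)
{
	int t;

	__asm__ __volatile__(
"1:	lwarx	%0,0,%3\n"	/* load and reserve           */
"	add	%0,%2,%0\n"	/* add the increment          */
"	stwcx.	%0,0,%3\n"	/* store iff reservation held */
"	bne-	1b"		/* otherwise retry            */
	: "=&r" (t), "+m" (v->counter)
	: "r" (a), "r" (&v->counter)
	: "cc");
}
]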


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Here, I should obviously admit that the semantics of *(volatile int *)&
aren't any neater or well-defined in the _language standard_ at all. 
The

standard does say (verbatim) "precisely what constitutes an access to an
object of volatile-qualified type is implementation-defined", but GCC
does help us out here by doing the right thing.


Where do you get that idea?  GCC manual, section 6.1, "When
is a Volatile Object Accessed?" doesn't say anything of the
kind.  PR33053 and some others.


Honestly, given such confusion, and the propensity of the "volatile"
type-qualifier keyword to be ill-defined (or at least poorly understood,
often inconsistently implemented), I'd (again) express my opinion that it
would be best to avoid its usage, given other alternatives do exist.


Yeah.  Or we can have an email thread like this every time
someone proposes a patch that uses an atomic variable ;-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Note that "volatile"
is a type-qualifier, not a type itself, so a cast of the _object_ 
itself
to a qualified-type i.e. (volatile int) would not make the access 
itself

volatile-qualified.


There is no such thing as "volatile-qualified access" defined
anywhere; there only is the concept of a "volatile-qualified
*object*".

To serve our purposes, it is necessary for us to take the address of this
(non-volatile) object, cast the resulting _pointer_ to the corresponding
volatile-qualified pointer-type, and then dereference it. This makes that
particular _access_ be volatile-qualified, without the object itself being
such. Also note that the (dereferenced) result is also a valid lvalue and
hence can be used in "*(volatile int *)&a = b;" kind of construction
(which we use for the atomic_set case).


There is a quite convincing argument that such an access _is_ an
access to a volatile object; see GCC PR21568 comment #9.  This
probably isn't the last word on the matter though...


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if the compiler kept the value of that atomic object in a register.


If there is a higher-level locking scheme then there is no point to
using atomic_t variables.  Atomic_t is specifically for the situation
where multiple CPUs are updating a variable without locking.


And don't forget about the case where it is an I/O device that is
updating the memory (in buffer descriptors or similar).  The driver
needs to do a "volatile" atomic read to get at the most recent version
of that data, which can be important for optimising latency (or throughput
even).  There is no other way the kernel can get that info -- doing an
MMIO read is way way too expensive.
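
[ A sketch of the kind of driver code meant here; names are
  illustrative, not from any particular driver:

struct rx_desc {
	unsigned int status;	/* written by the device via DMA */
	unsigned int length;
};

static int rx_desc_ready(struct rx_desc *d)
{
	/* volatile read: fetch the current in-memory value rather than
	   a copy the compiler cached in a register earlier */
	return (*(volatile unsigned int *)&d->status) & 1;
}
]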


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

I'd go so far as to say that anywhere where you want a non-"volatile"
atomic_read, either your code is buggy, or else an int would work just
as well.


Even, the only way to implement a "non-volatile" atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the result and stuff like that, but that's not the issue
here).

So if that would be the behaviour we wanted, just get rid of that
whole atomic_read() thing, so no one can misuse it anymore.
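
[ To illustrate, a sketch with hypothetical names -- a "non-volatile"
  atomic_read() is just a plain load, and plain loads may legally be
  hoisted and reused:

typedef struct { int counter; } atomic_t;

#define atomic_read_plain(v)	((v)->counter)

extern atomic_t flag;

void wait_for_flag(void)
{
	/* the compiler may hoist the load out of the loop and spin on a
	   register copy forever -- exactly the misuse meant above */
	while (!atomic_read_plain(&flag))
		;
}
]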


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Yep.  And the reason it is a bug is that it fails to disable
the relevant compiler optimizations.  So I suspect that we might
actually be saying the same thing here.


We're not saying the same thing, but we do agree :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where 
they



By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.

Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


So why is this a good tradeoff?


It certainly is better than disabling all compiler optimisations!


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?


I look at it the other way: keeping the "volatile" semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


[some confusion about barriers wrt atomics snipped]


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.


Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.


An atomic_read() implemented as a "normal" C variable read would
allow that read to be combined with another "normal" read from
that variable.  This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.

With an asm() implementation, the compiler can not do this; with
a "volatile" implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).
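
[ A sketch of the combining being talked about, with hypothetical names:

typedef struct { int counter; } atomic_t;

#define atomic_read_plain(v)	((v)->counter)

int both_equal(atomic_t *v, int old)
{
	int a = atomic_read_plain(v);
	int b = atomic_read_plain(v);	/* may be merged with the load above */

	return a == old && b == old;
}
]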


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re:

2007-08-15 Thread Segher Boessenkool

"compilation unit" is a C standard term.  It typically boils down
to "single .c file".


As you mentioned later, "single .c file with all the other files (headers
or other .c files) that it pulls in via #include" is actually "translation
unit", both in the C standard as well as gcc docs.


Yeah.  "single .c file after preprocessing".  Same thing :-)


"Compilation unit"
doesn't seem to be nearly as standard a term, though in most places it
is indeed meant to be same as "translation unit", but with the new gcc
inter-module-analysis stuff that you referred to above, I suspect one 
may
reasonably want to call a "compilation unit" as all that the compiler 
sees

at a given instant.


That would be a bit confusing, would it not?  They'd better find
some better name for that if they want to name it at all (remember,
none of these optimisations should have any effect on the semantics
of the program, you just get fewer .o files etc.).


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where 
they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.


The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where 
they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


Sounds exciting...  ;-)


Yeah, the breakage is *quite* spectacular :-)


In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can "escape" from that translation unit, nothing outside
the translation unit can write to the variable.


But there is usually at least one externally callable function in
a .c file.


Of course, but often none of those will (indirectly) write a certain
static variable.
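
[ A small sketch of that situation, with illustrative names:

/* 'counter' is local to this translation unit, its address is never
   taken, and no function here writes it -- so the external call below
   cannot modify it behind the compiler's back, and the subtraction
   may legally be folded to 0. */
static int counter;

void some_external_function(void);

int counter_delta(void)
{
	int before = counter;

	some_external_function();

	return counter - before;
}
]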


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Please check the definition of "cache coherence".


Which of the twelve thousand such definitions?  :-)

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.


I'm still not sure if that applies to all architectures.
Doesn't matter anyway, let's kill this thread :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Possibly these were too trivial to expose any potential problems that you
may have been referring to, so would be helpful if you could write a more
concrete example / sample code.


The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.


You can use -ffixed-XXX to keep the testcase simple.


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


In many cases, the compiler also has to assume that
msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can "escape" from that translation unit, nothing outside
the translation unit can write to the variable.

At least given that gcc doesn't know about multiple threads of 
execution!


Heh, only about the threads it creates itself (not relevant to
the kernel, for sure :-) )


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
something else, say "atomic_read_nonvolatile" for all I care.


Do we really need another set of APIs?


Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


But since there currently is only one such API, and there are
users expecting the stronger behaviour, the only sane thing to
do is let the API provide that behaviour.  You can always add
a new API with weaker behaviour later, and move users that are
okay with it over to that new API.


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)


This is indeed my understanding -- "compilation unit" is whatever the
compiler looks at in one go.  I have heard the word "module" used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.


Yes, that's what's called "compilation unit" :-)

[/me double checks]

Erm, the C standard actually calls it "translation unit".

To be exact, to avoid any more confusion:

5.1.1.1/1:
A C program need not all be translated at the same time. The
text of the program is kept in units called source files, (or
preprocessing files) in this International Standard. A source
file together with all the headers and source files included
via the preprocessing directive #include is known as a
preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.



Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.


As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.


Yes, accesses to volatile objects are never reordered with
respect to each other.


BTW:

#define atomic_read(a)  (*(volatile int *)&(a))
#define atomic_set(a,i) (*(volatile int *)&(a) = (i))

int a;

void func(void)
{
int b;

b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}

gives:

func:
pushl   %ebp
movla, %eax
movl%esp, %ebp
movl$20, a
movla, %eax
popl%ebp
ret

so the first atomic_read() wasn't optimized away.


Of course.  It is executed by the abstract machine, so
it will be executed by the actual machine.  On the other
hand, try

b = 0;
if (b)
b = atomic_read(a);

or similar.
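
[ That is, using the atomic_read() macro quoted above -- a sketch:

void func2(void)
{
	int b = 0;

	if (b)
		b = atomic_read(a);	/* never executed by the abstract
					   machine, so no load need be
					   emitted at all */
}
]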


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)


I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",


It is not; try  gcc -combine  or the upcoming link-time optimisation
stuff, for example.


and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)


"compilation unit" is a C standard term.  It typically boils down
to "single .c file".


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

How does the compiler know that msleep() has got barrier()s?


Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)

In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use
it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


A CPU that did not provide this property ("cache coherence") would be
most emphatically reviled.


That doesn't have anything to do with coherency as far as I can see.

It's just about the order in which a CPU (speculatively) performs the loads
(which isn't necessarily the same as the order in which it executes the
corresponding instructions, even).


So we are pretty safe assuming that CPUs
will provide it.


Yeah, pretty safe.  I just don't like undocumented assumptions :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use 
it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool
Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order.  AFAIK
it will not control the visibility of stores coming from other CPUs (that
is up to the cache coherency).


The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.
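
[ A sketch of such a pairing, using the kernel's wmb()/rmb(); the
  structure and names are illustrative:

struct msg {
	int payload;
	int ready;
};

/* writer side */
void publish(struct msg *m, int val)
{
	m->payload = val;
	wmb();			/* make payload visible before ready */
	m->ready = 1;
}

/* reader side */
int consume(struct msg *m, int *out)
{
	if (!m->ready)
		return 0;
	rmb();			/* order the ready read before payload */
	*out = m->payload;
	return 1;
}
]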


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),


It should be consistent across platforms; if not, file a bug please.


but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.


You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.
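
[ A small sketch of point (b) and its limits, illustrative only:

int x;			/* plain object    */
volatile int v;		/* volatile object */

void f(void)
{
	/* the store to 'v' itself cannot be removed or moved across a
	   sequence point, but the plain stores to 'x' may still be
	   reordered around it, or collapsed into one */
	x = 1;
	v = 2;
	x = 3;
}
]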


Segher

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool
Well if there is only one memory location involved, then smp_rmb() 
isn't

going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order. 
AFAIK
it will not control the visibility of stores coming from other CPUs 
(that

is up to the cache coherency).


The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Volatile behaviour itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),


It should be consistent across platforms; if not, file a bug please.


but it is /expected/ to mean something like: ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these. The last as far as compiler can take care disclaimer
comes about due to CPUs doing their own re-ordering nowadays.


You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use 
it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use
it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


A CPU that did not provide this property (cache coherence) would be
most emphatically reviled.


That doesn't have anything to do with coherency as far as I can see.

It's just about the order in which a CPU (speculatively) performs the 
loads

(which isn't necessarily the same as the order in which it executes the
corresponding instructions, even).


So we are pretty safe assuming that CPUs
will provide it.


Yeah, pretty safe.  I just don't like undocumented assumptions :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

How does the compiler know that msleep() has got barrier()s?


Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)

In many cases, the compiler also has to assume that 
msleep_interruptible()

might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
I think this was just terminology confusion here again. Isn't any 
code

that it cannot currently see the same as another compilation unit,
and wouldn't the compilation unit itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
definition for compilation unit (in gcc lingo, possibly?)


This is indeed my understanding -- compilation unit is whatever the
compiler looks at in one go.  I have heard the word module used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.


Yes, that's what's called compilation unit :-)

[/me double checks]

Erm, the C standard actually calls it translation unit.

To be exact, to avoid any more confusion:

5.1.1.1/1:
A C program need not all be translated at the same time. The
text of the program is kept in units called source files, (or
preprocessing files) in this International Standard. A source
file together with all the headers and source files included
via the preprocessing directive #include is known as a
preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.



Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)


I think this was just terminology confusion here again. Isn't any code
that it cannot currently see the same as another compilation unit,


It is not; try  gcc -combine  or the upcoming link-time optimisation
stuff, for example.


and wouldn't the compilation unit itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
definition for compilation unit (in gcc lingo, possibly?)


compilation unit is a C standard term.  It typically boils down
to single .c file.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.


As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.


Yes, accesses to volatile objects are never reordered with
respect to each other.


BTW:

#define atomic_read(a)  (*(volatile int *)(a))
#define atomic_set(a,i) (*(volatile int *)(a) = (i))

int a;

void func(void)
{
	int b;

	b = atomic_read(&a);
	atomic_set(&a, 20);
	b = atomic_read(&a);
}

gives:

func:
pushl   %ebp
movla, %eax
movl%esp, %ebp
movl$20, a
movla, %eax
popl%ebp
ret

so the first atomic_read() wasn't optimized away.


Of course.  It is executed by the abstract machine, so
it will be executed by the actual machine.  On the other
hand, try

b = 0;
if (b)
	b = atomic_read(&a);

or similar.
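
At the same optimisation level as above, a likely (though of course not
guaranteed) result for

void func2(void)
{
	int b;

	b = 0;
	if (b)
		b = atomic_read(&a);
}

is simply

func2:
	pushl	%ebp
	movl	%esp, %ebp
	popl	%ebp
	ret

with no load from a at all: the abstract machine never performs that
volatile access, so the compiler need not perform it either.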


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
something else, say atomic_read_nonvolatile for all I care.


Do we really need another set of APIs?


Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


But since there currently is only one such API, and there are
users expecting the stronger behaviour, the only sane thing to
do is let the API provide that behaviour.  You can always add
a new API with weaker behaviour later, and move users that are
okay with it over to that new API.
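
A minimal sketch of what those two flavours might look like; the
_nonvolatile name is the hypothetical one floated above, and the struct
layout is just for illustration:

typedef struct { int counter; } atomic_t;

/* Current, stronger flavour: every call performs a real load. */
#define atomic_read(v)			(*(volatile int *)&(v)->counter)

/* Hypothetical weaker flavour: a plain load the compiler may merge with,
 * or reuse from, a neighbouring read. */
#define atomic_read_nonvolatile(v)	((v)->counter)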


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


In many cases, the compiler also has to assume that
msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can escape from that translation unit, nothing outside
the translation unit can write to the variable.

At least given that gcc doesn't know about multiple threads of 
execution!


Heh, only about the threads it creates itself (not relevant to
the kernel, for sure :-) )
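
A minimal sketch of that escape rule (all names made up):

static int cached;		/* address never taken in this file */
int shared;			/* visible to, and writable by, other units */

extern void helper(void);	/* defined in some other translation unit */

int snapshot(void)
{
	cached = 42;
	shared = 42;

	helper();

	/* shared must be reloaded: helper() may have written it.  cached's
	 * address never escapes this translation unit, so the only writes
	 * the compiler needs to consider are the ones it can see right
	 * here, and it may well keep the value 42 in a register across
	 * the call. */
	return cached + shared;
}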


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Possibly these were too trivial to expose any potential problems that you
may have been referring to, so it would be helpful if you could write a
more concrete example / sample code.


The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.


You can use -ffixed-XXX to keep the testcase simple.
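
A minimal sketch of such a test case (names made up); building it with
something like -ffixed-ebx -ffixed-esi on i386 takes registers away so
even a small expression forces spills:

#define atomic_read(a)  (*(volatile int *)(a))

int a, x1, x2, x3, x4, x5, x6, x7;

int pressure_test(void)
{
	int t = atomic_read(&a);

	/* Enough simultaneously live values that some must be spilled. */
	return (t + x1) * (t + x2) * (t + x3) * (t + x4)
	     * (t + x5) * (t + x6) * (t + x7)
	     + (t ^ x1) + (t ^ x2) + (t ^ x3) + (t ^ x4);
}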


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Please check the definition of cache coherence.


Which of the twelve thousand such definitions?  :-)

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.


I'm still not sure if that applies to all architectures.
Doesn't matter anyway, let's kill this thread :-)


Segher



Re:

2007-08-15 Thread Segher Boessenkool

compilation unit is a C standard term.  It typically boils down
to single .c file.


As you mentioned later, single .c file with all the other files (headers
or other .c files) that it pulls in via #include is actually translation
unit, both in the C standard as well as gcc docs.


Yeah.  single .c file after preprocessing.  Same thing :-)


Compilation unit doesn't seem to be nearly as standard a term, though in
most places it is indeed meant to be same as translation unit, but with
the new gcc inter-module-analysis stuff that you referred to above, I
suspect one may reasonably want to call a compilation unit as all that
the compiler sees at a given instant.


That would be a bit confusing, would it not?  They'd better find
some better name for that if they want to name it at all (remember,
none of these optimisations should have any effect on the semantics
of the program, you just get fewer .o files etc.).


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


Sounds exciting...  ;-)


Yeah, the breakage is *quite* spectacular :-)


In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can escape from that translation unit, nothing outside
the translation unit can write to the variable.


But there is usually at least one externally callable function in
a .c file.


Of course, but often none of those will (indirectly) write a certain
static variable.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.


The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler do not
remove this asm even if you don't need any of its outputs.

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)
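
A minimal sketch of that side-effect-only meaning (read_tsc is made up;
rdtsc is just a convenient example of an asm whose outputs might go
unused):

static inline unsigned long long read_tsc(void)
{
	unsigned int lo, hi;

	/* volatile only gives this asm a side effect, so it is kept even
	 * if lo and hi end up unused and is never merged with an identical
	 * asm; it does not by itself order the surrounding code. */
	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));

	return ((unsigned long long)hi << 32) | lo;
}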


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.


Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.


An atomic_read() implemented as a normal C variable read would
allow that read to be combined with another normal read from
that variable.  This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.

With an asm() implementation, the compiler can not do this; with
a volatile implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).
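
A minimal i386-flavoured sketch of such an asm() implementation (the
mnemonic is per-architecture, and the struct layout is just for
illustration):

typedef struct { int counter; } atomic_t;

static inline int atomic_read(const atomic_t *v)
{
	int ret;

	/* The load is spelled out in the asm template, so the compiler can
	 * neither drop it nor combine it with a nearby plain read of the
	 * same location, and no volatile-object semantics are involved. */
	asm volatile("movl %1, %0" : "=r" (ret) : "m" (v->counter));
	return ret;
}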


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Yeah.  Compiler errors are more annoying though I dare say ;-)


Actually, compile-time errors are fine,


Yes, they don't cause data corruption or anything like that,
but I still don't think the 390 people want to ship a kernel
that doesn't build -- and it seems they still need to support
GCC versions before 4.0.  Up until the day they can finally
drop 3.x, they'll have to...


and easy to work around.


...use the simple workaround, annoying as it may be, at least
it works.  That's all I'm saying.


*Much*
more annoying is when gcc actively generates subtly bad code.


Quite so.


We've had use-after-free issues due to incorrect gcc liveness calculations
etc, and inline asm has been one of the more common causes - exactly
because the kernel is one of the few users (along with glibc) that uses it
at all.


The good news is that things are getting better since more and
more of the RTL transformations are ripped out.


Now *those* are hard to find - the code works most of the time, but the
compiler has inserted a really subtle race condition into the code
(deallocated a local stack entry before last use). We had that with our
semaphore code at some point.


Yeah, magnifying glass + lots of time kind of bug.  Lovely :-/


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

"+m" works. We use it. It's better than the alternatives. Pointing to
stale documentation doesn't change anything.


Well, perhaps on i386. I've seen some older versions of the s390 gcc die
with an ICE because I have used "+m" in some kernel inline assembly. I'm
happy to hear that this issue is fixed in recent gcc. Now I'll have to
find out if this is already true with gcc 3.x.


It was fixed (that is, "+m" is translated into a separate read
and write by GCC itself) in GCC-4.0.0, I just learnt.

The duplication "=m" and "m" with the same constraint is rather 
annoying.


Yeah.  Compiler errors are more annoying though I dare say ;-)
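
For reference, a minimal sketch of the two spellings (a plain memory
increment, no lock prefix, i386 syntax; the names are made up):

static inline void mem_inc_plus_m(int *v)
{
	/* Read-write memory operand; internally split into a read and a
	 * write, which some older compilers mishandled. */
	asm volatile("incl %0" : "+m" (*v));
}

static inline void mem_inc_split(int *v)
{
	/* Equivalent spelling that sidesteps "+m" on those compilers. */
	asm volatile("incl %0" : "=m" (*v) : "m" (*v));
}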


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Yes, though I would use "=m" on the output list and "m" on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to "+m". The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits "+m" to an input operand
and an output operand. Now it can happen that the compiler chooses two
different registers to access the same memory location. "+m" requires
that the two memory references are identical which causes the ICE if
they are not.


The problem is very nicely described here, last paragraph:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html

It's not a problem anymore in (very) recent GCC, although
that of course won't help you in the kernel (yet).


So you are saying that gcc 3.x still has this problem ?


Yes.  A warning ("read-write constraint does not allow a register")
was added for GCC-3.4, but the fix/workaround is more recent
(4.2 I believe, perhaps it was backported to the 4.1 branch).


I do not know if the current compilers still do this. Has
anyone else seen this happen ?


In recent GCC, it's actually documented:

	The ordinary output operands must be write-only; GCC will assume that
	the values in these operands before the instruction are dead and need
	not be generated.  Extended asm supports input-output or read-write
	operands.  Use the constraint character `+' to indicate such an operand
	and list it with the output operands.  You should only use read-write
	operands when the constraints for the operand (or the operand in which
	only some of the bits are to be changed) allow a register.

Note that last line.


I see, thanks for the info.


My pleasure.


Segher



Re: [PATCH] make atomic_t volatile on all architectures

2007-08-12 Thread Segher Boessenkool

Note that last line.


Segher, how about you just accept that Linux uses gcc as per reality, and
that sometimes the reality is different from your expectations?

"+m" works.


It works _most of the time_.  Ask Martin.  Oh you don't even have to,
he told you two mails ago.  My last mail simply pointed out that this
isn't a GCC bug, but merely documented behaviour.  Well okay, it was
documented only after people found problems with it; it is an edge
case and pretty hard to trigger (impossible to trigger on RISC
architectures, and not very likely on x86 either, for different
reasons).

Since "realistic" people keep using "+m" anyhow, current GCC has added
a workaround that splits "+m" into an "=m" and an "m" constraint.  It
likely will be accepted as a permanent feature.  However, that 
workaround

doesn't yet exist on some of the compilers Linux still allows building
with (but the problem does).


We use it. It's better than the alternatives.


How is this better than simply writing the slightly more verbose
asm("..." : "=m"(XX) : "m"(XX)) ?  It's a few more characters.
asm() is hard to write (correctly) anyway.  I don't see the problem.


Pointing to stale documentation doesn't change anything.


It's not "stale" documentation, it's freshly built from GCC mainline.
It's also not "stale" in the sense that it isn't updated; in fact,
the email I pointed to is from a thread where a change in this exact
paragraph in the documentation was suggested (a couple of weeks ago).


The same is true of accesses through a volatile pointer. The kernel has
done that for a *loong* time (all MMIO IO is done that way),


(All MMIO on certain architectures).  This however is a completely
different case; there _is_ no underlying C object declared anywhere,
so no way can GCC prove that that object isn't volatile, so it _has_
to assume it is.

Also, in case of e.g. readb(), if the compiler can prove the resulting
value isn't used it doesn't have to perform the access at all -- the
documentation (again, not "stale" documentation) points this out quite
literally.
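
A minimal sketch of that kind of access (the macro name, offset and bit
are made up; this is not the kernel's actual readb()):

#define mmio_read8(addr)  (*(volatile unsigned char *)(addr))

unsigned char wait_not_busy(unsigned long base)
{
	unsigned char s;

	/* There is no C object declared for the device register, only an
	 * address, so the compiler cannot prove anything about it and has
	 * to perform a real load on every iteration. */
	do {
		s = mmio_read8(base + 0x10);	/* 0x10: made-up status offset */
	} while (s & 0x80);			/* 0x80: made-up busy bit */

	return s;
}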


and it's
totally pointless to say that "volatile" isn't guaranteed to do what it
does.


I actually asked a couple of other GCC developers last night, and
they all agreed with me.  If you look at historic GCC problem reports
(I pointed at a few earlier in these threads) you would see that the
situation is far from clear.


It works, and quite frankly, if it didn't work, it would be a gcc
BUG.


How can it be a bug, if the semantics you think are guaranteed are
guaranteed in your (and others') minds only?

Of course, GCC likes to cater to its users, and tries to make things
work the way the programmer probably intended.  However it a) cannot
*actually* read your mind, it just looks that way; b) it _also_ has
to implement the _correct_ semantics of volatile; c) since this isn't
part of any C version (no, not GNU99 or similar either), sometimes
GCC developers do not think about what some people expect the compiler
should do with such code, but just implement the requirements they
*know* exist.  And that's when "BUGs" happen.

If you want GCC to do what you want it to do instead of what it does,
why not file an enhancement request?  http://gcc.gnu.org/bugzilla.

To save you the trouble, I just did: gcc.gnu.org/PR33053.  Let's
see what comes from it.

And again, this is not a "C standard" issue. It's a "sane implementation"
issue.


Maybe what you consider sane isn't as clear-cut as you think?  I'm
not alone with this opinion, as the mailing lists archives readily
show.


Linux expects the compiler to be sane.


And the compiler expects the code it is asked to translate to
be reasonably sane.  Nothing new here.  The compiler is more
forgiving than the Linux code, IMHO; that's why I suggested
using the obviously correct asm implementation of the atomic
get/set, rather than using the not-so-obviously-correct and
error-prone volatile thing.  The semantics of the asm() are
exactly the semantics you expect from an atomic get/set, while
that of the volatile implementation are *not*; and on top of
that it is very well-tested well-understood technology.


If it isn't, that's not our problem.


What, if the compiler doesn't work as you expect, like a silent
miscompilation, or even only the compiler ICEs, that is not a
problem for users and/or developers of Linux?  Interesting
opinion.  I'd rather stay clear of known pitfalls.


gcc *is* sane,


Yes, it is.


and I don't see why you constantly act as if it wasn't.


I guess I didn't express myself clearly enough then.  Hopefully
some other people _did_ understand what I meant.  In very harsh
words: "not all (proposed) kernel code is sane".


Segher
