At some point in the future, barrier() will be universally regarded as
a hammer too big for most purposes. Whether or not removing it now
You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained compiler barrier,
I presume?
How many people e.g. test -rc kernels compiled with gcc 3.2?
Why would that matter? It either works or not. If it doesn't
work, it can either be fixed, or support for that old compiler
version can be removed.
The only other policy than "only remove support if things are
badly broken" would be
Last time I tried a mips build, it would fail the compile unless I was
using _exactly_ 3.4.4 (I didn't try older versions, but did try
3.4.6, for example).
If 3.4.4 works where 3.4.6 doesn't, you should report this as a
bug; either here, or to the GCC team (but please be aware that the
3.4 series
One bug report "kernel doesn't work / crashes / ... when compiled with
gcc 3.2", but
And no, RMW on MMIO isn't "problematic" at all, either.
An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode. But three actual operations.
Maybe for some CPUs, but not all. ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.
But there is no lparmap.o! lparmap.s is the generated file.
Yeah, tell that to scripts/Makefile.lib:
_c_flags = $(CFLAGS) $(EXTRA_CFLAGS) $(CFLAGS_$(basetarget).o)
What would do what a person expects is $(CFLAGS_$(@F)), I think.
Looks good to me. Sam? We wanted to set
In a way though, good thing, or mainline would have continued to be
broken :)
What do I need to do to see this bug, btw? A special config
symbol perhaps?
CONFIG_DEBUG_INFO=y
and pass-g-to-assembler-under-config_debug_info.patch from -mm
Ah, an -mm patch. Gotcha.
The fix most likely won't
Such code generally doesn't care precisely when it gets the update,
just that the update is atomic, and it doesn't loop forever.
Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.
Regardless, I'm convinced we just need to do it all in assembly.
So do
Right. ROTFL... volatile actually breaks atomic_t instead of making
it safe. x++ becomes a register load, increment and a register store.
Without volatile we can increment the memory directly. It seems that
volatile requires that the variable is loaded into a register first
and then operated
Segher
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More
Hmm. Check the V=1 make output to see that the lparmap.c really got
the -g0.
The log says it didn't. Oops! It looks like the patch that got
committed is
the one that sets CFLAGS_lparmap.s, but really it needs to set
CFLAGS_lparmap.o instead (go kbuild). Did I post the wrong one?
(It's
The documentation simply doesn't say "+m" is allowed. The code to
allow it was added for the benefit of people who do not read the
documentation. Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly
The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing,
I don't think so.
"asm volatile" creates a side effect.
Yeah.
Side effects aren't
allowed to be reordered wrt sequence points.
Yeah.
This is exactly
the same reason as why
Instead, use asm() like all other atomic operations already do.
+static __inline__ long atomic64_read(const atomic_t *v)
+static __inline__ void atomic64_set(atomic_t *v, long i)
s/atomic_t/atomic64_t/ in both lines. I've edited my copy of the
patch.
Ouch, sorry about that.
Segher
No it does not have any volatile semantics. atomic_dec() can be
reordered
at will by the compiler within the current basic unit if you do not
add a
barrier.
"volatile" has nothing to do with reordering.
If you're talking of "volatile" the type-qualifier keyword, then
atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.
I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
[ This is exactly equivalent to using "+m" in the constraints, as
recently
explained on a GCC list somewhere, in response to the patch in my
bitops
series a few weeks back where I thought "+m" was bogus. ]
[It wasn't
Here, I should obviously admit that the semantics of *(volatile int
*)&
aren't any neater or well-defined in the _language standard_ at all.
The
standard does say (verbatim) "precisely what constitutes as access to
object of volatile-qualified type is implementation-defined", but GCC
does help
Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)
Well you were talking pretty formal (and
In a reasonable world, gcc should just make that be (on x86)
addl $1,i(%rip)
on x86-64, which is indeed what it does without the volatile. But with
the
volatile, the compiler gets really nervous, and doesn't dare do it in
one
instruction, and thus generates crap like
movl
(and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)
...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.
Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of "with regard to *what*
does
it limit re-ordering?".
A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy
atomic_dec() already has volatile behavior everywhere, so this is
semantically
okay, but this code (and any like it) should be calling cpu_relax()
each
iteration through the loop, unless there's a compelling reason not
to. I'll
allow that for some hardware drivers (possibly this one) such a
Part of the motivation here is to fix heisenbugs. If I knew
where they
By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.
Almost everything is a tradeoff; and so is this. I don't
believe most people would find disabling all compiler
the atomic operations in assembler,
like Segher Boessenkool did for powerpc in response to my previous
patchset.
Puh-lease. I DO NOT DISTRUST THE COMPILER, I just don't assume
it will do whatever I would like it to do without telling it.
It's a machine you know, and it is very well documented.
Note that "volatile"
is a type-qualifier, not a type itself, so a cast of the _object_
itself
to a qualified-type i.e. (volatile int) would not make the access
itself
volatile-qualified.
There is no such thing as "volatile-qualified access" defined
anywhere; there only is the concept of a
I can't speak for this particular case, but there could be similar
code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind
of
problem you're referring to here. It would be useful for such or
similar
code if
I'd go so far as to say that anywhere where you want a non-"volatile"
atomic_read, either your code is buggy, or else an int would work just
as well.
Even, the only way to implement a "non-volatile" atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the
The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".
It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on
A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.
Name one such case.
An atomic_read should do a load from memory. If the programmer puts
an
"compilation unit" is a C standard term. It typically boils down
to "single .c file".
As you mentioned later, "single .c file with all the other files
(headers
or other .c files) that it pulls in via #include" is actually
"translation
unit", both in the C standard as well as gcc docs.
Part of the motivation here is to fix heisenbugs. If I knew where
they
By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.
Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those
No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.
Last I checked, the Linux kernel build system did compile each .c
file
as a separate compilation unit.
I have
Please check the definition of "cache coherence".
Which of the twelve thousand such definitions? :-)
Summary: the CPU is indeed within its rights to execute loads and
stores
to a single variable out of order, -but- only if it gets the same
result
that it would have obtained by executing them
Possibly these were too trivial to expose any potential problems that
you
may have been referring to, so would be helpful if you could write a
more
concrete example / sample code.
The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.
You
Of course, if we find there are more callers in the kernel who want
the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions
to
somethine else, say "atomic_read_nonvolatile" for all I care.
Do we really need
I think this was just terminology confusion here again. Isn't "any
code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for
What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point. This does not mean other accesses cannot be
reordered wrt the volatile access.
If the
What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)
I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation
How does the compiler know that msleep() has got barrier()s?
Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.
No; compilation units have nothing to do with it, GCC can optimise
across compilation unit
Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use
it.
rmb() orders *any* two reads; that includes two reads from the same
location.
If the two reads are to the same location, all CPUs I am aware of
will
Well if there is only one memory location involved, then smp_rmb()
isn't
going to really do anything anyway, so it would be incorrect to use it.
rmb() orders *any* two reads; that includes two reads from the same
location.
Consider that smp_rmb basically will do anything from flushing the
"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),
It should be consistent across platforms; if not, file a bug please.
but it is /expected/ to mean something like: "ensure that
every such access
Yeah. Compiler errors are more annoying though I dare say ;-)
Actually, compile-time errors are fine,
Yes, they don't cause data corruption or anything like that,
but I still don't think the 390 people want to ship a kernel
that doesn't build -- and it seems they still need to support
GCC
"+m" works. We use it. It's better than the alternatives. Pointing to
stale documentation doesn't change anything.
Well, perhaps on i386. I've seen some older versions of the s390 gcc
die
with an ICE because I have used "+m" in some kernel inline assembly.
I'm
happy to hear that this issue
Yes, though I would use "=m" on the output list and "m" on the input
list. The reason is that I've seen gcc fall on its face with an ICE
on
s390 due to "+m". The explanation I've got from our compiler people
was
quite esoteric; as far as I remember, gcc splits "+m" into an input
operand and an output
Note that last line.
Segher, how about you just accept that Linux uses gcc as per reality,
and
that sometimes the reality is different from your expectations?
"+m" works.
It works _most of the time_. Ask Martin. Oh you don't even have to,
he told you two mails ago. My last mail simply