Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-27 Thread Will Deacon
Hi Dan,

On Tue, Jun 27, 2017 at 12:07:20AM +, Daniel Lustig wrote:
> > > https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user
> > > -2.2/riscv-spec-v2.2.pdf
> > 
> > That's the most up to date spec.
> 
> Yes, that's the most up to date public spec.  Internally, the RISC-V memory
> model task group has been working on fixing the memory model spec for the
> past couple of months now.  We're aiming to release it for public review
> well before the end of the year.  Hopefully in the coming weeks even.

Excellent, cheers for the update.

> > > which, on the one hand is reassuring (because ignoring dependency
> > > ordering is plain broken), but on the other it doesn't go quite far
> > > enough in defining exactly what constitutes a "syntactic data
> > > dependency". The cumulativity of your fences also needs defining,
> > > because I think this was up in the air at some point and the document
> > > above doesn't seem to tackle it (it doesn't seem to describe what
> > > constitutes being a memory of the predecessor or successor sets)
> 
> That will all covered in the new spec.
> 
> > > P.S. You should also totally get your architects to write a formal
> > > model ;)
> 
> Also in progress :)

3/3 :)

> Were there any more specific questions I can answer in the meantime?  Or
> any specific concern you'd like to point me to?

Nothing specific, but we won't be able to review the
memory-ordering/atomics/locking parts of this patch series until we have
a spec.

Will


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-27 Thread Will Deacon
Hi Dan,

On Tue, Jun 27, 2017 at 12:07:20AM +, Daniel Lustig wrote:
> > > https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user
> > > -2.2/riscv-spec-v2.2.pdf
> > 
> > That's the most up to date spec.
> 
> Yes, that's the most up to date public spec.  Internally, the RISC-V memory
> model task group has been working on fixing the memory model spec for the
> past couple of months now.  We're aiming to release it for public review
> well before the end of the year.  Hopefully in the coming weeks even.

Excellent, cheers for the update.

> > > which, on the one hand is reassuring (because ignoring dependency
> > > ordering is plain broken), but on the other it doesn't go quite far
> > > enough in defining exactly what constitutes a "syntactic data
> > > dependency". The cumulativity of your fences also needs defining,
> > > because I think this was up in the air at some point and the document
> > > above doesn't seem to tackle it (it doesn't seem to describe what
> > > constitutes being a memory of the predecessor or successor sets)
> 
> That will all covered in the new spec.
> 
> > > P.S. You should also totally get your architects to write a formal
> > > model ;)
> 
> Also in progress :)

3/3 :)

> Were there any more specific questions I can answer in the meantime?  Or
> any specific concern you'd like to point me to?

Nothing specific, but we won't be able to review the
memory-ordering/atomics/locking parts of this patch series until we have
a spec.

Will


RE: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Daniel Lustig
> > https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user
> > -2.2/riscv-spec-v2.2.pdf
> 
> That's the most up to date spec.

Yes, that's the most up to date public spec.  Internally, the RISC-V memory
model task group has been working on fixing the memory model spec for the
past couple of months now.  We're aiming to release it for public review
well before the end of the year.  Hopefully in the coming weeks even.

> > which, on the one hand is reassuring (because ignoring dependency
> > ordering is plain broken), but on the other it doesn't go quite far
> > enough in defining exactly what constitutes a "syntactic data
> > dependency". The cumulativity of your fences also needs defining,
> > because I think this was up in the air at some point and the document
> > above doesn't seem to tackle it (it doesn't seem to describe what
> > constitutes being a memory of the predecessor or successor sets)

That will all covered in the new spec.

> > P.S. You should also totally get your architects to write a formal
> > model ;)

Also in progress :)

Were there any more specific questions I can answer in the meantime?  Or
any specific concern you'd like to point me to?

Dan


RE: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Daniel Lustig
> > https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user
> > -2.2/riscv-spec-v2.2.pdf
> 
> That's the most up to date spec.

Yes, that's the most up to date public spec.  Internally, the RISC-V memory
model task group has been working on fixing the memory model spec for the
past couple of months now.  We're aiming to release it for public review
well before the end of the year.  Hopefully in the coming weeks even.

> > which, on the one hand is reassuring (because ignoring dependency
> > ordering is plain broken), but on the other it doesn't go quite far
> > enough in defining exactly what constitutes a "syntactic data
> > dependency". The cumulativity of your fences also needs defining,
> > because I think this was up in the air at some point and the document
> > above doesn't seem to tackle it (it doesn't seem to describe what
> > constitutes being a memory of the predecessor or successor sets)

That will all covered in the new spec.

> > P.S. You should also totally get your architects to write a formal
> > model ;)

Also in progress :)

Were there any more specific questions I can answer in the meantime?  Or
any specific concern you'd like to point me to?

Dan


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 06:16:11 PDT (-0700), will.dea...@arm.com wrote:
> [sorry, jumping in here because it's the only mail I have relating to
>  patch 13]
>
> On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
>> > Which (pending the sub confusion) will generate the entire set of:
>> >
>> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
>> > atomic_fetch_add{_relaxed,_acquire,_release,}
>> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
>> > atomic_fetch_sub{_relaxed,_acquire,_release,}
>> >
>> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
>> >
>>
>> Another approach would be to override __atomic_op_{acquire,release} and
>> use things like:
>>
>>  "FENCE r,rw" -- (load) ACQUIRE
>>  "FENCE rw,w" -- (store) RELEASE
>>
>> And then you only need to provide _relaxed atomics.
>>
>> Also, and I didn't check for that, you need to provide:
>>
>> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
>> atomic_store_release().
>
> Is there an up-to-date specification for the RISC-V memory model? I looked
> at:
>
> https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user-2.2/riscv-spec-v2.2.pdf

That's the most up to date spec.

> but it says:
>
> | 2.7 Memory Model
> | This section is out of date as the RISC-V memory model is
> | currently under revision to ensure it can efficiently support current
> | programming language memory models. The revised base mem- ory model will
> | contain further ordering constraints, including at least that loads to the
> | same address from the same hart cannot be reordered, and that syntactic data
> | dependencies between instructions are respected
>
> which, on the one hand is reassuring (because ignoring dependency ordering is
> plain broken), but on the other it doesn't go quite far enough in defining
> exactly what constitutes a "syntactic data dependency". The cumulativity of
> your fences also needs defining, because I think this was up in the air at 
> some
> point and the document above doesn't seem to tackle it (it doesn't seem to
> describe what constitutes being a memory of the predecessor or successor sets)

Unfortunately I'm not really a formal memory model guy, so I probably know a
lot less about what a "syntactic data dependency" is than you do.  I believe
the plan (like for most of RISC-V) is to avoid doing anything weird, to support
existing software systems, and to allow for implementation flexibility where
possible.

> Could you shed some light on this please? We've started relying on RW control
> dependencies in semi-recent history, so it's important to get this nailed 
> down.

This has been a sticking point for a while.  The result of lacking a proper
memory model has been that implementations have been extremely conservative and
as a result there aren't any issues in practice, but that itself is a bad
thing.

> Thanks,
>
> Will
>
> P.S. You should also totally get your architects to write a formal model ;)

The RISC-V organization has a working group defining a formal memory model.
Here's the original posting about the working group

  https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/Oxm_IvfYItY

To the best of my understanding we hope to have a formal memory model defined
by the end of the year.  I've added Daniel Lustig, the chair of the working
group, to the thread.  He knows a lot more about this than I do.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 06:16:11 PDT (-0700), will.dea...@arm.com wrote:
> [sorry, jumping in here because it's the only mail I have relating to
>  patch 13]
>
> On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
>> > Which (pending the sub confusion) will generate the entire set of:
>> >
>> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
>> > atomic_fetch_add{_relaxed,_acquire,_release,}
>> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
>> > atomic_fetch_sub{_relaxed,_acquire,_release,}
>> >
>> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
>> >
>>
>> Another approach would be to override __atomic_op_{acquire,release} and
>> use things like:
>>
>>  "FENCE r,rw" -- (load) ACQUIRE
>>  "FENCE rw,w" -- (store) RELEASE
>>
>> And then you only need to provide _relaxed atomics.
>>
>> Also, and I didn't check for that, you need to provide:
>>
>> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
>> atomic_store_release().
>
> Is there an up-to-date specification for the RISC-V memory model? I looked
> at:
>
> https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user-2.2/riscv-spec-v2.2.pdf

That's the most up to date spec.

> but it says:
>
> | 2.7 Memory Model
> | This section is out of date as the RISC-V memory model is
> | currently under revision to ensure it can efficiently support current
> | programming language memory models. The revised base mem- ory model will
> | contain further ordering constraints, including at least that loads to the
> | same address from the same hart cannot be reordered, and that syntactic data
> | dependencies between instructions are respected
>
> which, on the one hand is reassuring (because ignoring dependency ordering is
> plain broken), but on the other it doesn't go quite far enough in defining
> exactly what constitutes a "syntactic data dependency". The cumulativity of
> your fences also needs defining, because I think this was up in the air at 
> some
> point and the document above doesn't seem to tackle it (it doesn't seem to
> describe what constitutes being a memory of the predecessor or successor sets)

Unfortunately I'm not really a formal memory model guy, so I probably know a
lot less about what a "syntactic data dependency" is than you do.  I believe
the plan (like for most of RISC-V) is to avoid doing anything weird, to support
existing software systems, and to allow for implementation flexibility where
possible.

> Could you shed some light on this please? We've started relying on RW control
> dependencies in semi-recent history, so it's important to get this nailed 
> down.

This has been a sticking point for a while.  The result of lacking a proper
memory model has been that implementations have been extremely conservative and
as a result there aren't any issues in practice, but that itself is a bad
thing.

> Thanks,
>
> Will
>
> P.S. You should also totally get your architects to write a formal model ;)

The RISC-V organization has a working group defining a formal memory model.
Here's the original posting about the working group

  https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/Oxm_IvfYItY

To the best of my understanding we hope to have a formal memory model defined
by the end of the year.  I've added Daniel Lustig, the chair of the working
group, to the thread.  He knows a lot more about this than I do.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 05:58:50 PDT (-0700), pet...@infradead.org wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
>> Which (pending the sub confusion) will generate the entire set of:
>>
>>  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
>> atomic_fetch_add{_relaxed,_acquire,_release,}
>>  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
>> atomic_fetch_sub{_relaxed,_acquire,_release,}
>>
>>  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>>  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>>  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
>>
>
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
>
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
>
> And then you only need to provide _relaxed atomics.
>
> Also, and I didn't check for that, you need to provide:
>
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

OK, thanks for looking so deeply into this.  Sorry it was such a mess, I
thought I included a note somewhere that this all needed to be redone -- I just
wanted to get a v2 out first as that split all the drivers out.  I've went
ahead and completely rewrote atomic.h using your suggestions in a slightly
modified way.  It includes

 * _relaxed, _acquire, and _release versions of everything via a bunch of 
macros.
 * What I believe to be correct aqrl bits on every op.
 * 64-bit and 32-bit atomics (as opposed to just copying everything)

I didn't implement try_cmpxchg yet.  I'm going to go ahead and sort through our
memory barriers, look at the few remaining CR comments from our v2, and then
submit a v3 patch set.

I'm only replying to this message, but I believe I'll have taken into account
all your comments for the v3.

Thanks, again, for your time!


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 05:58:50 PDT (-0700), pet...@infradead.org wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
>> Which (pending the sub confusion) will generate the entire set of:
>>
>>  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
>> atomic_fetch_add{_relaxed,_acquire,_release,}
>>  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
>> atomic_fetch_sub{_relaxed,_acquire,_release,}
>>
>>  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>>  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>>  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
>>
>
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
>
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
>
> And then you only need to provide _relaxed atomics.
>
> Also, and I didn't check for that, you need to provide:
>
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

OK, thanks for looking so deeply into this.  Sorry it was such a mess, I
thought I included a note somewhere that this all needed to be redone -- I just
wanted to get a v2 out first as that split all the drivers out.  I've went
ahead and completely rewrote atomic.h using your suggestions in a slightly
modified way.  It includes

 * _relaxed, _acquire, and _release versions of everything via a bunch of 
macros.
 * What I believe to be correct aqrl bits on every op.
 * 64-bit and 32-bit atomics (as opposed to just copying everything)

I didn't implement try_cmpxchg yet.  I'm going to go ahead and sort through our
memory barriers, look at the few remaining CR comments from our v2, and then
submit a v3 patch set.

I'm only replying to this message, but I believe I'll have taken into account
all your comments for the v3.

Thanks, again, for your time!


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 06:17:27 PDT (-0700), pet...@infradead.org wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
>> diff --git a/arch/riscv/include/asm/spinlock.h 
>> b/arch/riscv/include/asm/spinlock.h
>> new file mode 100644
>> index ..9736f5714e54
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/spinlock.h
>> @@ -0,0 +1,155 @@
>> +/*
>> + * Copyright (C) 2015 Regents of the University of California
>> + *
>> + *   This program is free software; you can redistribute it and/or
>> + *   modify it under the terms of the GNU General Public License
>> + *   as published by the Free Software Foundation, version 2.
>> + *
>> + *   This program is distributed in the hope that it will be useful,
>> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + *   GNU General Public License for more details.
>> + */
>> +
>> +#ifndef _ASM_RISCV_SPINLOCK_H
>> +#define _ASM_RISCV_SPINLOCK_H
>> +
>> +#include 
>> +#include 
>> +
>> +/*
>> + * Simple spin lock operations.  These provide no fairness guarantees.
>> + */
>
> Any reason to use a test-and-set spinlock at all?

Just simplicity.  I looked at the MIPS ticket lock and I think we can implement
that while still maintaining the forward progress constraints on our LR/SC
sequences.  I added a FIXME about it, which I'll try to get around to

  
https://github.com/riscv/riscv-linux/commit/a75d28c849e695639b7909ffa88ce571abfb0c76

but I'm going to try and get a v3 patch set out first.

>> +
>> +#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
>> +#define arch_spin_is_locked(x)  ((x)->lock != 0)
>> +#define arch_spin_unlock_wait(x) \
>> +do { cpu_relax(); } while ((x)->lock)
>
> Hehe, yeah, no ;-) There are ordering constraints on that.

OK.  I copied the ordering guarntees from MIPS here

  
https://github.com/riscv/riscv-linux/commit/7d16920452c6bd14c847aade2d51c56d2a1ae457

>> +
>> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> +__asm__ __volatile__ (
>> +"amoswap.w.rl x0, x0, %0"
>> +: "=A" (lock->lock)
>> +:: "memory");
>> +}
>> +
>> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> +int tmp = 1, busy;
>> +
>> +__asm__ __volatile__ (
>> +"amoswap.w.aq %0, %2, %1"
>> +: "=r" (busy), "+A" (lock->lock)
>> +: "r" (tmp)
>> +: "memory");
>> +
>> +return !busy;
>> +}
>> +
>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +while (1) {
>> +if (arch_spin_is_locked(lock))
>> +continue;
>> +
>> +if (arch_spin_trylock(lock))
>> +break;
>> +}
>> +}

Thanks for all the comments!


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-26 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 06:17:27 PDT (-0700), pet...@infradead.org wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
>> diff --git a/arch/riscv/include/asm/spinlock.h 
>> b/arch/riscv/include/asm/spinlock.h
>> new file mode 100644
>> index ..9736f5714e54
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/spinlock.h
>> @@ -0,0 +1,155 @@
>> +/*
>> + * Copyright (C) 2015 Regents of the University of California
>> + *
>> + *   This program is free software; you can redistribute it and/or
>> + *   modify it under the terms of the GNU General Public License
>> + *   as published by the Free Software Foundation, version 2.
>> + *
>> + *   This program is distributed in the hope that it will be useful,
>> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + *   GNU General Public License for more details.
>> + */
>> +
>> +#ifndef _ASM_RISCV_SPINLOCK_H
>> +#define _ASM_RISCV_SPINLOCK_H
>> +
>> +#include 
>> +#include 
>> +
>> +/*
>> + * Simple spin lock operations.  These provide no fairness guarantees.
>> + */
>
> Any reason to use a test-and-set spinlock at all?

Just simplicity.  I looked at the MIPS ticket lock and I think we can implement
that while still maintaining the forward progress constraints on our LR/SC
sequences.  I added a FIXME about it, which I'll try to get around to

  
https://github.com/riscv/riscv-linux/commit/a75d28c849e695639b7909ffa88ce571abfb0c76

but I'm going to try and get a v3 patch set out first.

>> +
>> +#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
>> +#define arch_spin_is_locked(x)  ((x)->lock != 0)
>> +#define arch_spin_unlock_wait(x) \
>> +do { cpu_relax(); } while ((x)->lock)
>
> Hehe, yeah, no ;-) There are ordering constraints on that.

OK.  I copied the ordering guarntees from MIPS here

  
https://github.com/riscv/riscv-linux/commit/7d16920452c6bd14c847aade2d51c56d2a1ae457

>> +
>> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> +__asm__ __volatile__ (
>> +"amoswap.w.rl x0, x0, %0"
>> +: "=A" (lock->lock)
>> +:: "memory");
>> +}
>> +
>> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> +int tmp = 1, busy;
>> +
>> +__asm__ __volatile__ (
>> +"amoswap.w.aq %0, %2, %1"
>> +: "=r" (busy), "+A" (lock->lock)
>> +: "r" (tmp)
>> +: "memory");
>> +
>> +return !busy;
>> +}
>> +
>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +while (1) {
>> +if (arch_spin_is_locked(lock))
>> +continue;
>> +
>> +if (arch_spin_trylock(lock))
>> +break;
>> +}
>> +}

Thanks for all the comments!


Re: [patches] Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Benjamin Herrenschmidt
On Sat, 2017-06-24 at 14:32 -0700, Palmer Dabbelt wrote:
> On Sat, 24 Jun 2017 08:42:05 PDT (-0700), b...@kernel.crashing.org wrote:
> > On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
> > > > > +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : 
> > > > > "memory");
> > 
> > I forgot if we already mentioned that but mmiowb is primarily intended
> > to order MMIO stores vs. a subsequent spin_unlock.
> > 
> > I'm not sure an IO only fence is sufficient here.
> > 
> > Note that I've never trusted drivers to get that right, it's a rather
> > bad abstraction to begin with, so on powerpc, instead, I just set a
> > per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
> > in order to "beef up" the barrier in there if necessary.
> 
> Sorry about that -- I thought I'd included a note somewhere that the atomics
> and barriers weren't ready to go yet, as we'd found a bunch of problems with
> them in the first review and I needed to go through them all.  Arnd suggested
> copying the PowerPC approach to mmiowb and I like that better, so we're going
> to use it.

Ah yes, I did see your note, I just wasn't sure we had clarified the
mmiowb case and thought it was worth mentioning.

Cheers,
Ben,



Re: [patches] Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Benjamin Herrenschmidt
On Sat, 2017-06-24 at 14:32 -0700, Palmer Dabbelt wrote:
> On Sat, 24 Jun 2017 08:42:05 PDT (-0700), b...@kernel.crashing.org wrote:
> > On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
> > > > > +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : 
> > > > > "memory");
> > 
> > I forgot if we already mentioned that but mmiowb is primarily intended
> > to order MMIO stores vs. a subsequent spin_unlock.
> > 
> > I'm not sure an IO only fence is sufficient here.
> > 
> > Note that I've never trusted drivers to get that right, it's a rather
> > bad abstraction to begin with, so on powerpc, instead, I just set a
> > per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
> > in order to "beef up" the barrier in there if necessary.
> 
> Sorry about that -- I thought I'd included a note somewhere that the atomics
> and barriers weren't ready to go yet, as we'd found a bunch of problems with
> them in the first review and I needed to go through them all.  Arnd suggested
> copying the PowerPC approach to mmiowb and I like that better, so we're going
> to use it.

Ah yes, I did see your note, I just wasn't sure we had clarified the
mmiowb case and thought it was worth mentioning.

Cheers,
Ben,



Re: [patches] Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Palmer Dabbelt
On Sat, 24 Jun 2017 08:42:05 PDT (-0700), b...@kernel.crashing.org wrote:
> On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
>> > > +#define mmiowb()       __asm__ __volatile__ ("fence io,io" : : : 
>> > > "memory");
>
> I forgot if we already mentioned that but mmiowb is primarily intended
> to order MMIO stores vs. a subsequent spin_unlock.
>
> I'm not sure an IO only fence is sufficient here.
>
> Note that I've never trusted drivers to get that right, it's a rather
> bad abstraction to begin with, so on powerpc, instead, I just set a
> per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
> in order to "beef up" the barrier in there if necessary.

Sorry about that -- I thought I'd included a note somewhere that the atomics
and barriers weren't ready to go yet, as we'd found a bunch of problems with
them in the first review and I needed to go through them all.  Arnd suggested
copying the PowerPC approach to mmiowb and I like that better, so we're going
to use it.

Thanks!


Re: [patches] Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Palmer Dabbelt
On Sat, 24 Jun 2017 08:42:05 PDT (-0700), b...@kernel.crashing.org wrote:
> On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
>> > > +#define mmiowb()       __asm__ __volatile__ ("fence io,io" : : : 
>> > > "memory");
>
> I forgot if we already mentioned that but mmiowb is primarily intended
> to order MMIO stores vs. a subsequent spin_unlock.
>
> I'm not sure an IO only fence is sufficient here.
>
> Note that I've never trusted drivers to get that right, it's a rather
> bad abstraction to begin with, so on powerpc, instead, I just set a
> per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
> in order to "beef up" the barrier in there if necessary.

Sorry about that -- I thought I'd included a note somewhere that the atomics
and barriers weren't ready to go yet, as we'd found a bunch of problems with
them in the first review and I needed to go through them all.  Arnd suggested
copying the PowerPC approach to mmiowb and I like that better, so we're going
to use it.

Thanks!


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Benjamin Herrenschmidt
On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
> > > +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : 
> > > "memory");

I forgot if we already mentioned that but mmiowb is primarily intended
to order MMIO stores vs. a subsequent spin_unlock.

I'm not sure an IO only fence is sufficient here.

Note that I've never trusted drivers to get that right, it's a rather
bad abstraction to begin with, so on powerpc, instead, I just set a
per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
in order to "beef up" the barrier in there if necessary.

Cheers,
Ben.



Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-24 Thread Benjamin Herrenschmidt
On Fri, 2017-06-23 at 19:01 -0700, Palmer Dabbelt wrote:
> > > +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : 
> > > "memory");

I forgot if we already mentioned that but mmiowb is primarily intended
to order MMIO stores vs. a subsequent spin_unlock.

I'm not sure an IO only fence is sufficient here.

Note that I've never trusted drivers to get that right, it's a rather
bad abstraction to begin with, so on powerpc, instead, I just set a
per-cpu flag on every non-relaxed MMIO write and test it in spin_unlock
in order to "beef up" the barrier in there if necessary.

Cheers,
Ben.



Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-23 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 01:12:00 PDT (-0700), Arnd Bergmann wrote:
> On Wed, Jun 7, 2017 at 1:00 AM, Palmer Dabbelt  wrote:
>> This patch adds the include files for the RISC-V port.  These are mostly
>> based on the score port, but there are a lot of arm64-based files as
>> well.
>>
>> Signed-off-by: Palmer Dabbelt 
>
> It might be better to split this up into several parts, as the patch
> is longer than
> most people are willing to review at once.
>
> The uapi should definitely be a separate patch, as it includes the parts that
> cannot be changed any more later. memory management (pgtable, mmu,
> uaccess) would be another part to split out, and possibly all the atomics
> in one separate patch (along with spinlocks and bitops).

OK, we'll do this for the v3.

>
>> +
>> +/* IO barriers.  These only fence on the IO bits because they're only 
>> required
>> + * to order device access.  We're defining mmiowb because our AMO 
>> instructions
>> + * (which are used to implement locks) don't specify ordering.  From 
>> Chapter 7
>> + * of v2.2 of the user ISA:
>> + * "The bits order accesses to one of the two address domains, memory or 
>> I/O,
>> + * depending on which address domain the atomic instruction is accessing. No
>> + * ordering constraint is implied to accesses to the other domain, and a 
>> FENCE
>> + * instruction should be used to order across both domains."
>> + */
>> +
>> +#define __iormb()  __asm__ __volatile__ ("fence i,io" : : : "memory");
>> +#define __iowmb()  __asm__ __volatile__ ("fence io,o" : : : "memory");
>> +
>> +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : "memory");
>> +
>> +/*
>> + * Relaxed I/O memory access primitives. These follow the Device memory
>> + * ordering rules but do not guarantee any ordering relative to Normal 
>> memory
>> + * accesses.
>> + */
>> +#define readb_relaxed(c)   ({ u8  __r = __raw_readb(c); __r; })
>> +#define readw_relaxed(c)   ({ u16 __r = le16_to_cpu((__force 
>> __le16)__raw_readw(c)); __r; })
>> +#define readl_relaxed(c)   ({ u32 __r = le32_to_cpu((__force 
>> __le32)__raw_readl(c)); __r; })
>> +#define readq_relaxed(c)   ({ u64 __r = le64_to_cpu((__force 
>> __le64)__raw_readq(c)); __r; })
>> +
>> +#define writeb_relaxed(v,c)((void)__raw_writeb((v),(c)))
>> +#define writew_relaxed(v,c)((void)__raw_writew((__force 
>> u16)cpu_to_le16(v),(c)))
>> +#define writel_relaxed(v,c)((void)__raw_writel((__force 
>> u32)cpu_to_le32(v),(c)))
>> +#define writeq_relaxed(v,c)((void)__raw_writeq((__force 
>> u64)cpu_to_le64(v),(c)))
>> +
>> +/*
>> + * I/O memory access primitives. Reads are ordered relative to any
>> + * following Normal memory access. Writes are ordered relative to any prior
>> + * Normal memory access.
>> + */
>> +#define readb(c)   ({ u8  __v = readb_relaxed(c); __iormb(); 
>> __v; })
>> +#define readw(c)   ({ u16 __v = readw_relaxed(c); __iormb(); 
>> __v; })
>> +#define readl(c)   ({ u32 __v = readl_relaxed(c); __iormb(); 
>> __v; })
>> +#define readq(c)   ({ u64 __v = readq_relaxed(c); __iormb(); 
>> __v; })
>> +
>> +#define writeb(v,c)({ __iowmb(); writeb_relaxed((v),(c)); })
>> +#define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); })
>> +#define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); })
>> +#define writeq(v,c)({ __iowmb(); writeq_relaxed((v),(c)); })
>> +
>> +#include 
>
> These do not yet contain all the changes we discussed: the relaxed operations
> don't seem to be ordered against one another and the regular accessors
> are not ordered against DMA.

Sorry, I must have forgotten to write this -- I just wanted to push out a v3
patch set without the changes to the atomics so everything else could be looked
at.  I wanted to just go through the atomics completely and fix them, as I
found a handful of problems (everything was missing the AQ and RL bits, for
example) and figured it would be best to just get them done right.

I think that's not something for after dinner on a Friday, but hopefully I'll
get to it tomorrow morning.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-23 Thread Palmer Dabbelt
On Wed, 07 Jun 2017 01:12:00 PDT (-0700), Arnd Bergmann wrote:
> On Wed, Jun 7, 2017 at 1:00 AM, Palmer Dabbelt  wrote:
>> This patch adds the include files for the RISC-V port.  These are mostly
>> based on the score port, but there are a lot of arm64-based files as
>> well.
>>
>> Signed-off-by: Palmer Dabbelt 
>
> It might be better to split this up into several parts, as the patch
> is longer than
> most people are willing to review at once.
>
> The uapi should definitely be a separate patch, as it includes the parts that
> cannot be changed any more later. memory management (pgtable, mmu,
> uaccess) would be another part to split out, and possibly all the atomics
> in one separate patch (along with spinlocks and bitops).

OK, we'll do this for the v3.

>
>> +
>> +/* IO barriers.  These only fence on the IO bits because they're only 
>> required
>> + * to order device access.  We're defining mmiowb because our AMO 
>> instructions
>> + * (which are used to implement locks) don't specify ordering.  From 
>> Chapter 7
>> + * of v2.2 of the user ISA:
>> + * "The bits order accesses to one of the two address domains, memory or 
>> I/O,
>> + * depending on which address domain the atomic instruction is accessing. No
>> + * ordering constraint is implied to accesses to the other domain, and a 
>> FENCE
>> + * instruction should be used to order across both domains."
>> + */
>> +
>> +#define __iormb()  __asm__ __volatile__ ("fence i,io" : : : "memory");
>> +#define __iowmb()  __asm__ __volatile__ ("fence io,o" : : : "memory");
>> +
>> +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : "memory");
>> +
>> +/*
>> + * Relaxed I/O memory access primitives. These follow the Device memory
>> + * ordering rules but do not guarantee any ordering relative to Normal 
>> memory
>> + * accesses.
>> + */
>> +#define readb_relaxed(c)   ({ u8  __r = __raw_readb(c); __r; })
>> +#define readw_relaxed(c)   ({ u16 __r = le16_to_cpu((__force 
>> __le16)__raw_readw(c)); __r; })
>> +#define readl_relaxed(c)   ({ u32 __r = le32_to_cpu((__force 
>> __le32)__raw_readl(c)); __r; })
>> +#define readq_relaxed(c)   ({ u64 __r = le64_to_cpu((__force 
>> __le64)__raw_readq(c)); __r; })
>> +
>> +#define writeb_relaxed(v,c)((void)__raw_writeb((v),(c)))
>> +#define writew_relaxed(v,c)((void)__raw_writew((__force 
>> u16)cpu_to_le16(v),(c)))
>> +#define writel_relaxed(v,c)((void)__raw_writel((__force 
>> u32)cpu_to_le32(v),(c)))
>> +#define writeq_relaxed(v,c)((void)__raw_writeq((__force 
>> u64)cpu_to_le64(v),(c)))
>> +
>> +/*
>> + * I/O memory access primitives. Reads are ordered relative to any
>> + * following Normal memory access. Writes are ordered relative to any prior
>> + * Normal memory access.
>> + */
>> +#define readb(c)   ({ u8  __v = readb_relaxed(c); __iormb(); 
>> __v; })
>> +#define readw(c)   ({ u16 __v = readw_relaxed(c); __iormb(); 
>> __v; })
>> +#define readl(c)   ({ u32 __v = readl_relaxed(c); __iormb(); 
>> __v; })
>> +#define readq(c)   ({ u64 __v = readq_relaxed(c); __iormb(); 
>> __v; })
>> +
>> +#define writeb(v,c)({ __iowmb(); writeb_relaxed((v),(c)); })
>> +#define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); })
>> +#define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); })
>> +#define writeq(v,c)({ __iowmb(); writeq_relaxed((v),(c)); })
>> +
>> +#include 
>
> These do not yet contain all the changes we discussed: the relaxed operations
> don't seem to be ordered against one another and the regular accessors
> are not ordered against DMA.

Sorry, I must have forgotten to write this -- I just wanted to push out a v3
patch set without the changes to the atomics so everything else could be looked
at.  I wanted to just go through the atomics completely and fix them, as I
found a handful of problems (everything was missing the AQ and RL bits, for
example) and figured it would be best to just get them done right.

I think that's not something for after dinner on a Friday, but hopefully I'll
get to it tomorrow morning.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-09 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 03:17:27PM +0200, Peter Zijlstra wrote:

> > +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > +{
> > +   __asm__ __volatile__ (
> > +   "amoswap.w.rl x0, x0, %0"
> > +   : "=A" (lock->lock)
> > +   :: "memory");
> > +}
> > +
> > +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> > +{
> > +   int tmp = 1, busy;
> > +
> > +   __asm__ __volatile__ (
> > +   "amoswap.w.aq %0, %2, %1"
> > +   : "=r" (busy), "+A" (lock->lock)
> > +   : "r" (tmp)
> > +   : "memory");
> > +
> > +   return !busy;
> > +}

One other thing, you need to describe the acquire/release semantics for
your platform. Is the above lock RCpc or RCsc ? If RCpc, you need to
look into adding smp_mb__after_unlock_lock().


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-09 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 03:17:27PM +0200, Peter Zijlstra wrote:

> > +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > +{
> > +   __asm__ __volatile__ (
> > +   "amoswap.w.rl x0, x0, %0"
> > +   : "=A" (lock->lock)
> > +   :: "memory");
> > +}
> > +
> > +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> > +{
> > +   int tmp = 1, busy;
> > +
> > +   __asm__ __volatile__ (
> > +   "amoswap.w.aq %0, %2, %1"
> > +   : "=r" (busy), "+A" (lock->lock)
> > +   : "r" (tmp)
> > +   : "memory");
> > +
> > +   return !busy;
> > +}

One other thing, you need to describe the acquire/release semantics for
your platform. Is the above lock RCpc or RCsc ? If RCpc, you need to
look into adding smp_mb__after_unlock_lock().


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> > Which (pending the sub confusion) will generate the entire set of:
> > 
> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_add{_relaxed,_acquire,_release,}
> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_sub{_relaxed,_acquire,_release,}
> > 
> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> > 
> 
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
> 
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
> 
> And then you only need to provide _relaxed atomics.
> 
> Also, and I didn't check for that, you need to provide:
> 
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

Also, you probably need to provide smp_mb__before_spinlock(), but also
see:

  https://lkml.kernel.org/r/20170607161501.819948...@infradead.org


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> > Which (pending the sub confusion) will generate the entire set of:
> > 
> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_add{_relaxed,_acquire,_release,}
> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_sub{_relaxed,_acquire,_release,}
> > 
> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> > 
> 
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
> 
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
> 
> And then you only need to provide _relaxed atomics.
> 
> Also, and I didn't check for that, you need to provide:
> 
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

Also, you probably need to provide smp_mb__before_spinlock(), but also
see:

  https://lkml.kernel.org/r/20170607161501.819948...@infradead.org


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> diff --git a/arch/riscv/include/asm/spinlock.h 
> b/arch/riscv/include/asm/spinlock.h
> new file mode 100644
> index ..9736f5714e54
> --- /dev/null
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright (C) 2015 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *   GNU General Public License for more details.
> + */
> +
> +#ifndef _ASM_RISCV_SPINLOCK_H
> +#define _ASM_RISCV_SPINLOCK_H
> +
> +#include 
> +#include 
> +
> +/*
> + * Simple spin lock operations.  These provide no fairness guarantees.
> + */

Any reason to use a test-and-set spinlock at all?

> +
> +#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
> +#define arch_spin_is_locked(x)   ((x)->lock != 0)
> +#define arch_spin_unlock_wait(x) \
> + do { cpu_relax(); } while ((x)->lock)

Hehe, yeah, no ;-) There are ordering constraints on that.

> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + __asm__ __volatile__ (
> + "amoswap.w.rl x0, x0, %0"
> + : "=A" (lock->lock)
> + :: "memory");
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + int tmp = 1, busy;
> +
> + __asm__ __volatile__ (
> + "amoswap.w.aq %0, %2, %1"
> + : "=r" (busy), "+A" (lock->lock)
> + : "r" (tmp)
> + : "memory");
> +
> + return !busy;
> +}
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + while (1) {
> + if (arch_spin_is_locked(lock))
> + continue;
> +
> + if (arch_spin_trylock(lock))
> + break;
> + }
> +}


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> diff --git a/arch/riscv/include/asm/spinlock.h 
> b/arch/riscv/include/asm/spinlock.h
> new file mode 100644
> index ..9736f5714e54
> --- /dev/null
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright (C) 2015 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *   GNU General Public License for more details.
> + */
> +
> +#ifndef _ASM_RISCV_SPINLOCK_H
> +#define _ASM_RISCV_SPINLOCK_H
> +
> +#include 
> +#include 
> +
> +/*
> + * Simple spin lock operations.  These provide no fairness guarantees.
> + */

Any reason to use a test-and-set spinlock at all?

> +
> +#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
> +#define arch_spin_is_locked(x)   ((x)->lock != 0)
> +#define arch_spin_unlock_wait(x) \
> + do { cpu_relax(); } while ((x)->lock)

Hehe, yeah, no ;-) There are ordering constraints on that.

> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + __asm__ __volatile__ (
> + "amoswap.w.rl x0, x0, %0"
> + : "=A" (lock->lock)
> + :: "memory");
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + int tmp = 1, busy;
> +
> + __asm__ __volatile__ (
> + "amoswap.w.aq %0, %2, %1"
> + : "=r" (busy), "+A" (lock->lock)
> + : "r" (tmp)
> + : "memory");
> +
> + return !busy;
> +}
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + while (1) {
> + if (arch_spin_is_locked(lock))
> + continue;
> +
> + if (arch_spin_trylock(lock))
> + break;
> + }
> +}


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Will Deacon
[sorry, jumping in here because it's the only mail I have relating to
 patch 13]

On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> > Which (pending the sub confusion) will generate the entire set of:
> > 
> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_add{_relaxed,_acquire,_release,}
> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_sub{_relaxed,_acquire,_release,}
> > 
> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> > 
> 
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
> 
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
> 
> And then you only need to provide _relaxed atomics.
> 
> Also, and I didn't check for that, you need to provide:
> 
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

Is there an up-to-date specification for the RISC-V memory model? I looked
at:

https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user-2.2/riscv-spec-v2.2.pdf

but it says:

| 2.7 Memory Model
| This section is out of date as the RISC-V memory model is
| currently under revision to ensure it can efficiently support current
| programming language memory models. The revised base mem- ory model will
| contain further ordering constraints, including at least that loads to the
| same address from the same hart cannot be reordered, and that syntactic data
| dependencies between instructions are respected

which, on the one hand is reassuring (because ignoring dependency ordering is
plain broken), but on the other it doesn't go quite far enough in defining
exactly what constitutes a "syntactic data dependency". The cumulativity of
your fences also needs defining, because I think this was up in the air at some
point and the document above doesn't seem to tackle it (it doesn't seem to
describe what constitutes being a memory of the predecessor or successor sets)

Could you shed some light on this please? We've started relying on RW control
dependencies in semi-recent history, so it's important to get this nailed down.

Thanks,

Will

P.S. You should also totally get your architects to write a formal model ;)


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Will Deacon
[sorry, jumping in here because it's the only mail I have relating to
 patch 13]

On Wed, Jun 07, 2017 at 02:58:50PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> > Which (pending the sub confusion) will generate the entire set of:
> > 
> >  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_add{_relaxed,_acquire,_release,}
> >  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> > atomic_fetch_sub{_relaxed,_acquire,_release,}
> > 
> >  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
> >  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
> >  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> > 
> 
> Another approach would be to override __atomic_op_{acquire,release} and
> use things like:
> 
>   "FENCE r,rw" -- (load) ACQUIRE
>   "FENCE rw,w" -- (store) RELEASE
> 
> And then you only need to provide _relaxed atomics.
> 
> Also, and I didn't check for that, you need to provide:
> 
> smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
> atomic_store_release().

Is there an up-to-date specification for the RISC-V memory model? I looked
at:

https://github.com/riscv/riscv-isa-manual/releases/download/riscv-user-2.2/riscv-spec-v2.2.pdf

but it says:

| 2.7 Memory Model
| This section is out of date as the RISC-V memory model is
| currently under revision to ensure it can efficiently support current
| programming language memory models. The revised base mem- ory model will
| contain further ordering constraints, including at least that loads to the
| same address from the same hart cannot be reordered, and that syntactic data
| dependencies between instructions are respected

which, on the one hand is reassuring (because ignoring dependency ordering is
plain broken), but on the other it doesn't go quite far enough in defining
exactly what constitutes a "syntactic data dependency". The cumulativity of
your fences also needs defining, because I think this was up in the air at some
point and the document above doesn't seem to tackle it (it doesn't seem to
describe what constitutes being a memory of the predecessor or successor sets)

Could you shed some light on this please? We've started relying on RW control
dependencies in semi-recent history, so it's important to get this nailed down.

Thanks,

Will

P.S. You should also totally get your architects to write a formal model ;)


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> Which (pending the sub confusion) will generate the entire set of:
> 
>  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> atomic_fetch_add{_relaxed,_acquire,_release,}
>  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> atomic_fetch_sub{_relaxed,_acquire,_release,}
> 
>  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> 

Another approach would be to override __atomic_op_{acquire,release} and
use things like:

"FENCE r,rw" -- (load) ACQUIRE
"FENCE rw,w" -- (store) RELEASE

And then you only need to provide _relaxed atomics.

Also, and I didn't check for that, you need to provide:

smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
atomic_store_release().


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:36:27PM +0200, Peter Zijlstra wrote:
> Which (pending the sub confusion) will generate the entire set of:
> 
>  atomic_add, atomic_add_return{_relaxed,_acquire,_release,} 
> atomic_fetch_add{_relaxed,_acquire,_release,}
>  atomic_sub, atomic_sub_return{_relaxed,_acquire,_release,} 
> atomic_fetch_sub{_relaxed,_acquire,_release,}
> 
>  atomic_and, atomic_fetch_and{_relaxed,_acquire,_release,}
>  atomic_or,  atomic_fetch_or{_relaxed,_acquire,_release,}
>  atomic_xor, atomic_fetch_xor{_relaxed,_acquire,_release,}
> 

Another approach would be to override __atomic_op_{acquire,release} and
use things like:

"FENCE r,rw" -- (load) ACQUIRE
"FENCE rw,w" -- (store) RELEASE

And then you only need to provide _relaxed atomics.

Also, and I didn't check for that, you need to provide:

smp_load_acquire(), smp_store_release(), atomic_read_acquire(),
atomic_store_release().


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> diff --git a/arch/riscv/include/asm/cmpxchg.h 
> b/arch/riscv/include/asm/cmpxchg.h
> new file mode 100644
> index ..c7ee1321ac18
> --- /dev/null
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -0,0 +1,124 @@
> +/*
> + * Copyright (C) 2014 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *   GNU General Public License for more details.
> + */
> +
> +#ifndef _ASM_RISCV_CMPXCHG_H
> +#define _ASM_RISCV_CMPXCHG_H
> +
> +#include 
> +
> +#ifdef CONFIG_ISA_A
> +
> +#include 
> +
> +#define __xchg(new, ptr, size)   \
> +({   \
> + __typeof__(ptr) __ptr = (ptr);  \
> + __typeof__(new) __new = (new);  \
> + __typeof__(*(ptr)) __ret;   \
> + switch (size) { \
> + case 4: \
> + __asm__ __volatile__ (  \
> + "amoswap.w %0, %2, %1"  \
> + : "=r" (__ret), "+A" (*__ptr)   \
> + : "r" (__new)); \
> + break;  \
> + case 8: \
> + __asm__ __volatile__ (  \
> + "amoswap.d %0, %2, %1"  \
> + : "=r" (__ret), "+A" (*__ptr)   \
> + : "r" (__new)); \
> + break;  \
> + default:\
> + BUILD_BUG();\
> + }   \
> + __ret;  \
> +})
> +
> +#define xchg(ptr, x)(__xchg((x), (ptr), sizeof(*(ptr

our xchg() is fully ordered, and thus you need to use "amoswap.aq.rl"

> +
> +
> +/*
> + * Atomic compare and exchange.  Compare OLD with MEM, if identical,
> + * store NEW in MEM.  Return the initial value in MEM.  Success is
> + * indicated by comparing RETURN with OLD.
> + */
> +#define __cmpxchg(ptr, old, new, size)   
> \
> +({   \
> + __typeof__(ptr) __ptr = (ptr);  \
> + __typeof__(old) __old = (old);  \
> + __typeof__(new) __new = (new);  \
> + __typeof__(*(ptr)) __ret;   \
> + register unsigned int __rc; \
> + switch (size) { \
> + case 4: \
> + __asm__ __volatile__ (  \
> + "0:"\
> + "lr.w %0, %2\n" \
> + "bne  %0, %z3, 1f\n"\
> + "sc.w %1, %z4, %2\n"\
> + "bnez %1, 0b\n" \
> + "1:"\
> + : "=" (__ret), "=" (__rc), "+A" (*__ptr)\
> + : "rJ" (__old), "rJ" (__new));  \
> + break;  \
> + case 8: \
> + __asm__ __volatile__ (  \
> + "0:"\
> + "lr.d %0, %2\n" \
> + "bne  %0, %z3, 1f\n"\
> + "sc.d %1, %z4, %2\n"\
> + "bnez %1, 0b\n" \
> + "1:"\
> + : "=" (__ret), "=" (__rc), "+A" (*__ptr)\
> + : "rJ" (__old), "rJ" (__new));  \
> + break;

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> diff --git a/arch/riscv/include/asm/cmpxchg.h 
> b/arch/riscv/include/asm/cmpxchg.h
> new file mode 100644
> index ..c7ee1321ac18
> --- /dev/null
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -0,0 +1,124 @@
> +/*
> + * Copyright (C) 2014 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *   GNU General Public License for more details.
> + */
> +
> +#ifndef _ASM_RISCV_CMPXCHG_H
> +#define _ASM_RISCV_CMPXCHG_H
> +
> +#include 
> +
> +#ifdef CONFIG_ISA_A
> +
> +#include 
> +
> +#define __xchg(new, ptr, size)   \
> +({   \
> + __typeof__(ptr) __ptr = (ptr);  \
> + __typeof__(new) __new = (new);  \
> + __typeof__(*(ptr)) __ret;   \
> + switch (size) { \
> + case 4: \
> + __asm__ __volatile__ (  \
> + "amoswap.w %0, %2, %1"  \
> + : "=r" (__ret), "+A" (*__ptr)   \
> + : "r" (__new)); \
> + break;  \
> + case 8: \
> + __asm__ __volatile__ (  \
> + "amoswap.d %0, %2, %1"  \
> + : "=r" (__ret), "+A" (*__ptr)   \
> + : "r" (__new)); \
> + break;  \
> + default:\
> + BUILD_BUG();\
> + }   \
> + __ret;  \
> +})
> +
> +#define xchg(ptr, x)(__xchg((x), (ptr), sizeof(*(ptr

our xchg() is fully ordered, and thus you need to use "amoswap.aq.rl"

> +
> +
> +/*
> + * Atomic compare and exchange.  Compare OLD with MEM, if identical,
> + * store NEW in MEM.  Return the initial value in MEM.  Success is
> + * indicated by comparing RETURN with OLD.
> + */
> +#define __cmpxchg(ptr, old, new, size)   
> \
> +({   \
> + __typeof__(ptr) __ptr = (ptr);  \
> + __typeof__(old) __old = (old);  \
> + __typeof__(new) __new = (new);  \
> + __typeof__(*(ptr)) __ret;   \
> + register unsigned int __rc; \
> + switch (size) { \
> + case 4: \
> + __asm__ __volatile__ (  \
> + "0:"\
> + "lr.w %0, %2\n" \
> + "bne  %0, %z3, 1f\n"\
> + "sc.w %1, %z4, %2\n"\
> + "bnez %1, 0b\n" \
> + "1:"\
> + : "=" (__ret), "=" (__rc), "+A" (*__ptr)\
> + : "rJ" (__old), "rJ" (__new));  \
> + break;  \
> + case 8: \
> + __asm__ __volatile__ (  \
> + "0:"\
> + "lr.d %0, %2\n" \
> + "bne  %0, %z3, 1f\n"\
> + "sc.d %1, %z4, %2\n"\
> + "bnez %1, 0b\n" \
> + "1:"\
> + : "=" (__ret), "=" (__rc), "+A" (*__ptr)\
> + : "rJ" (__old), "rJ" (__new));  \
> + break;

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:06:13PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> What pretty much all the other architectures do is something like:
> 
> #define ATOMIC_OP(op, asm_op, c_op)   \
> static __always_inline void atomic_##op(int i, atomic_t *v)   \
> { \
>   __asm__ __volatile__ (  \
>   "amo" #asm_op ".w zero, %1, %0" \
>   : "+A" (v->counter) \
>   : "r" (i)); \
> }
> 
> #define ATOMIC_FETCH_OP(op, asm_op, c_op) \
> static __always_inline int atomic_fetch_##op(int i, atomic_t *v)\
> { \
>   register int ret;   \
>   __asm__ __volatile__ (  \
>   "amo" #asm_op ".w %2, %1, %0"   \
>   : "+A" (v->counter), "=r" (ret) \
>   : "r" (mask));  \
>   return ret; \
> }
> 
> #define ATOMIC_OP_RETURN(op, asm_op, c_op)\
> static __always_inline int atomic_##op##_return(int i, atomic_t *v) \
> { \
>   return atomic_fetch_##op(i, v) c_op i;  \
> }
> 
> #define ATOMIC_OPS(op, asm_op, c_op)  \
>   ATOMIC_OP(op, asm_op, c_op) \
>   ATOMIC_OP_RETURN(op, asm_op, c_op)  \
>   ATOMIC_FETCH_OP(op, asm_op, c_op)
> 
> ATOMIC_OPS(add, add, +)
> ATOMIC_OPS(sub, sub, -)
> 
> #undef ATOMIC_OPS
> 
> #define ATOMIC_OPS(op, asm_op, c_op)  \
>   ATOMIC_OP(op, asm_op, c_op) \
>   ATOMIC_FETCH_OP(op, asm_op, c_op)
> 
> ATOMIC_OPS(and, and, &)
> ATOMIC_OPS(or, or, |)
> ATOMIC_OPS(xor, xor, ^)
> 
> #undef ATOMIC_OPS
> 
> Which is much simpler no?

In fact, after having read your manual you'd want something like:


#define ATOMIC_OP(op, asm_op, c_op) \
static __always_inline void atomic_##op(int i, atomic_t *v) \
{   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w zero, %1, %0" \
: "+A" (v->counter) \
: "r" (i)); \
}

#define ATOMIC_FETCH_OP(op, asm_op, c_op, asm_or, order)\
static __always_inline int atomic_fetch_##op##order(int i, atomic_t *v)\
{   \
register int ret;   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w" #asm_or " %2, %1, %0"\
: "+A" (v->counter), "=r" (ret) \
: "r" (mask));  \
return ret; \
}

#define ATOMIC_OP_RETURN(op, asm_op, c_op, asm_or, order)   \
static __always_inline int atomic_##op##_return##order(int i, atomic_t *v) \
{   \
return atomic_fetch_##op##order(i, v) c_op i;   \
}

#define ATOMIC_OPS(op, asm_op, c_op)\
ATOMIC_OP(op, asm_op, c_op, , _relaxed) \
ATOMIC_OP_RETURN(op, asm_op, c_op, , _relaxed)  \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(add, add, +)
ATOMIC_OPS(sub, sub, -)

#undef ATOMIC_OPS

#define ATOMIC_OPS(op, asm_op, c_op, asm_or, order) \
ATOMIC_OP_RETURN(op, asm_op, c_op, , _relaxed)  \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(add, add, +, ".aq", _acquire)
ATOMIC_OPS(add, add, +, ".rl", _release)
ATOMIC_OPS(add, add, +, ".aq.rl", )

ATOMIC_OPS(sub, sub, -, ".aq", _acquire)
ATOMIC_OPS(sub, sub, -, ".rl", _release)
ATOMIC_OPS(sub, sub, -, ".aq.rl", )

#undef ATOMIC_OPS

#define ATOMIC_OPS(op, asm_op, c_op)\
ATOMIC_OP(op, asm_op, c_op) \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(and, and, &)
ATOMIC_OPS(or, or, |)
ATOMIC_OPS(xor, xor, ^)

#undef ATOMIC_OPS

ATOMIC_FETCH_OP(and, and, &, ".aq", _acquire)
ATOMIC_FETCH_OP(and, and, &, ".rl", _release)
ATOMIC_FETCH_OP(and, and, &, ".aq.rl", )

ATOMIC_FETCH_OP(or, or, |, ".aq", _acquire)
ATOMIC_FETCH_OP(or, or, |, ".rl", _release)
ATOMIC_FETCH_OP(or, or, |, ".aq.rl", )

ATOMIC_FETCH_OP(xor, xor, ^, ".aq", _acquire)
ATOMIC_FETCH_OP(xor, xor, ^, ".rl", _release)

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:06:13PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> What pretty much all the other architectures do is something like:
> 
> #define ATOMIC_OP(op, asm_op, c_op)   \
> static __always_inline void atomic_##op(int i, atomic_t *v)   \
> { \
>   __asm__ __volatile__ (  \
>   "amo" #asm_op ".w zero, %1, %0" \
>   : "+A" (v->counter) \
>   : "r" (i)); \
> }
> 
> #define ATOMIC_FETCH_OP(op, asm_op, c_op) \
> static __always_inline int atomic_fetch_##op(int i, atomic_t *v)\
> { \
>   register int ret;   \
>   __asm__ __volatile__ (  \
>   "amo" #asm_op ".w %2, %1, %0"   \
>   : "+A" (v->counter), "=r" (ret) \
>   : "r" (mask));  \
>   return ret; \
> }
> 
> #define ATOMIC_OP_RETURN(op, asm_op, c_op)\
> static __always_inline int atomic_##op##_return(int i, atomic_t *v) \
> { \
>   return atomic_fetch_##op(i, v) c_op i;  \
> }
> 
> #define ATOMIC_OPS(op, asm_op, c_op)  \
>   ATOMIC_OP(op, asm_op, c_op) \
>   ATOMIC_OP_RETURN(op, asm_op, c_op)  \
>   ATOMIC_FETCH_OP(op, asm_op, c_op)
> 
> ATOMIC_OPS(add, add, +)
> ATOMIC_OPS(sub, sub, -)
> 
> #undef ATOMIC_OPS
> 
> #define ATOMIC_OPS(op, asm_op, c_op)  \
>   ATOMIC_OP(op, asm_op, c_op) \
>   ATOMIC_FETCH_OP(op, asm_op, c_op)
> 
> ATOMIC_OPS(and, and, &)
> ATOMIC_OPS(or, or, |)
> ATOMIC_OPS(xor, xor, ^)
> 
> #undef ATOMIC_OPS
> 
> Which is much simpler no?

In fact, after having read your manual you'd want something like:


#define ATOMIC_OP(op, asm_op, c_op) \
static __always_inline void atomic_##op(int i, atomic_t *v) \
{   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w zero, %1, %0" \
: "+A" (v->counter) \
: "r" (i)); \
}

#define ATOMIC_FETCH_OP(op, asm_op, c_op, asm_or, order)\
static __always_inline int atomic_fetch_##op##order(int i, atomic_t *v)\
{   \
register int ret;   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w" #asm_or " %2, %1, %0"\
: "+A" (v->counter), "=r" (ret) \
: "r" (mask));  \
return ret; \
}

#define ATOMIC_OP_RETURN(op, asm_op, c_op, asm_or, order)   \
static __always_inline int atomic_##op##_return##order(int i, atomic_t *v) \
{   \
return atomic_fetch_##op##order(i, v) c_op i;   \
}

#define ATOMIC_OPS(op, asm_op, c_op)\
ATOMIC_OP(op, asm_op, c_op, , _relaxed) \
ATOMIC_OP_RETURN(op, asm_op, c_op, , _relaxed)  \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(add, add, +)
ATOMIC_OPS(sub, sub, -)

#undef ATOMIC_OPS

#define ATOMIC_OPS(op, asm_op, c_op, asm_or, order) \
ATOMIC_OP_RETURN(op, asm_op, c_op, , _relaxed)  \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(add, add, +, ".aq", _acquire)
ATOMIC_OPS(add, add, +, ".rl", _release)
ATOMIC_OPS(add, add, +, ".aq.rl", )

ATOMIC_OPS(sub, sub, -, ".aq", _acquire)
ATOMIC_OPS(sub, sub, -, ".rl", _release)
ATOMIC_OPS(sub, sub, -, ".aq.rl", )

#undef ATOMIC_OPS

#define ATOMIC_OPS(op, asm_op, c_op)\
ATOMIC_OP(op, asm_op, c_op) \
ATOMIC_FETCH_OP(op, asm_op, c_op, , _relaxed)

ATOMIC_OPS(and, and, &)
ATOMIC_OPS(or, or, |)
ATOMIC_OPS(xor, xor, ^)

#undef ATOMIC_OPS

ATOMIC_FETCH_OP(and, and, &, ".aq", _acquire)
ATOMIC_FETCH_OP(and, and, &, ".rl", _release)
ATOMIC_FETCH_OP(and, and, &, ".aq.rl", )

ATOMIC_FETCH_OP(or, or, |, ".aq", _acquire)
ATOMIC_FETCH_OP(or, or, |, ".rl", _release)
ATOMIC_FETCH_OP(or, or, |, ".aq.rl", )

ATOMIC_FETCH_OP(xor, xor, ^, ".aq", _acquire)
ATOMIC_FETCH_OP(xor, xor, ^, ".rl", _release)

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 01:54:23PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> > +/* Assume that atomic operations are already serializing */
> > +#define smp_mb__before_atomic_dec()barrier()
> > +#define smp_mb__after_atomic_dec() barrier()
> > +#define smp_mb__before_atomic_inc()barrier()
> > +#define smp_mb__after_atomic_inc() barrier()
> 
> > +#define smp_mb__before_clear_bit()  smp_mb()
> > +#define smp_mb__after_clear_bit()   smp_mb()
> 
> These no longer exist.. Also how can they be different? bitops would use
> the same atomic primitives as regular atomic ops would and would thus
> have the very same implicit ordering.

Your manual states that each atomic instruction (be it AMO or LR/SC)
have two ordering bits: AQ and RL. If neither are set the instruction is
unordered, if either one is set, its either an ACQUIRE or a RELEASE and
if both are set its SC.

So you can in fact make the above happen, however your atomic
implementation does not appear to use ".aq.rl" mnemonics so would be
entirely unordered, just like your bitops.

Therefore the above is just plain wrong and broken.





Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 01:54:23PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> > +/* Assume that atomic operations are already serializing */
> > +#define smp_mb__before_atomic_dec()barrier()
> > +#define smp_mb__after_atomic_dec() barrier()
> > +#define smp_mb__before_atomic_inc()barrier()
> > +#define smp_mb__after_atomic_inc() barrier()
> 
> > +#define smp_mb__before_clear_bit()  smp_mb()
> > +#define smp_mb__after_clear_bit()   smp_mb()
> 
> These no longer exist.. Also how can they be different? bitops would use
> the same atomic primitives as regular atomic ops would and would thus
> have the very same implicit ordering.

Your manual states that each atomic instruction (be it AMO or LR/SC)
have two ordering bits: AQ and RL. If neither are set the instruction is
unordered, if either one is set, its either an ACQUIRE or a RELEASE and
if both are set its SC.

So you can in fact make the above happen, however your atomic
implementation does not appear to use ".aq.rl" mnemonics so would be
entirely unordered, just like your bitops.

Therefore the above is just plain wrong and broken.





Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:06:13PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:

> > +static inline int atomic_fetch_sub(unsigned int mask, atomic_t *v)
> > +{
> > +   int out;
> > +
> > +   __asm__ __volatile__ (
> > +   "amosub.w %2, %1, %0"
> > +   : "+A" (v->counter), "=r" (out)
> > +   : "r" (mask));
> > +   return out;
> > +}

Your instruction manual does not list AMOSUB as a valid instruction. So
either it is wrong or this code never compiled. Please clarify.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Wed, Jun 07, 2017 at 02:06:13PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:

> > +static inline int atomic_fetch_sub(unsigned int mask, atomic_t *v)
> > +{
> > +   int out;
> > +
> > +   __asm__ __volatile__ (
> > +   "amosub.w %2, %1, %0"
> > +   : "+A" (v->counter), "=r" (out)
> > +   : "r" (mask));
> > +   return out;
> > +}

Your instruction manual does not list AMOSUB as a valid instruction. So
either it is wrong or this code never compiled. Please clarify.


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:

> + * atomic_add - add integer to atomic variable
> + * @i: integer value to add
> + * @v: pointer of type atomic_t
> + *
> + * Atomically adds @i to @v.
> + */
> +static inline void atomic_add(int i, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoadd.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (i));
> +}
> +
> +#define atomic_fetch_add atomic_fetch_add
> +static inline int atomic_fetch_add(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoadd.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_sub - subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @v: pointer of type atomic_t
> + *
> + * Atomically subtracts @i from @v.
> + */
> +static inline void atomic_sub(int i, atomic_t *v)
> +{
> + atomic_add(-i, v);
> +}
> +
> +#define atomic_fetch_sub atomic_fetch_sub
> +static inline int atomic_fetch_sub(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amosub.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_add_return - add integer to atomic variable
> + * @i: integer value to add
> + * @v: pointer of type atomic_t
> + *
> + * Atomically adds @i to @v and returns the result
> + */
> +static inline int atomic_add_return(int i, atomic_t *v)
> +{
> + register int c;
> +
> + __asm__ __volatile__ (
> + "amoadd.w %0, %2, %1"
> + : "=r" (c), "+A" (v->counter)
> + : "r" (i));
> + return (c + i);
> +}
> +
> +/**
> + * atomic_sub_return - subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @v: pointer of type atomic_t
> + *
> + * Atomically subtracts @i from @v and returns the result
> + */
> +static inline int atomic_sub_return(int i, atomic_t *v)
> +{
> + return atomic_add_return(-i, v);
> +}
> +

> +/**
> + * atomic_and - Atomically clear bits in atomic variable
> + * @mask: Mask of the bits to be retained
> + * @v: pointer of type atomic_t
> + *
> + * Atomically retains the bits set in @mask from @v
> + */
> +static inline void atomic_and(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoand.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_and atomic_fetch_and
> +static inline int atomic_fetch_and(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoand.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_or - Atomically set bits in atomic variable
> + * @mask: Mask of the bits to be set
> + * @v: pointer of type atomic_t
> + *
> + * Atomically sets the bits set in @mask in @v
> + */
> +static inline void atomic_or(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoor.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_or atomic_fetch_or
> +static inline int atomic_fetch_or(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoor.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_xor - Atomically flips bits in atomic variable
> + * @mask: Mask of the bits to be flipped
> + * @v: pointer of type atomic_t
> + *
> + * Atomically flips the bits set in @mask in @v
> + */
> +static inline void atomic_xor(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoxor.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_xor atomic_fetch_xor
> +static inline int atomic_fetch_xor(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoxor.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}

What pretty much all the other architectures do is something like:

#define ATOMIC_OP(op, asm_op, c_op) \
static __always_inline void atomic_##op(int i, atomic_t *v) \
{   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w zero, %1, %0" \
: "+A" (v->counter) \
: "r" (i)); \
}

#define ATOMIC_FETCH_OP(op, asm_op, c_op)   \
static __always_inline int 

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:

> + * atomic_add - add integer to atomic variable
> + * @i: integer value to add
> + * @v: pointer of type atomic_t
> + *
> + * Atomically adds @i to @v.
> + */
> +static inline void atomic_add(int i, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoadd.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (i));
> +}
> +
> +#define atomic_fetch_add atomic_fetch_add
> +static inline int atomic_fetch_add(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoadd.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_sub - subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @v: pointer of type atomic_t
> + *
> + * Atomically subtracts @i from @v.
> + */
> +static inline void atomic_sub(int i, atomic_t *v)
> +{
> + atomic_add(-i, v);
> +}
> +
> +#define atomic_fetch_sub atomic_fetch_sub
> +static inline int atomic_fetch_sub(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amosub.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_add_return - add integer to atomic variable
> + * @i: integer value to add
> + * @v: pointer of type atomic_t
> + *
> + * Atomically adds @i to @v and returns the result
> + */
> +static inline int atomic_add_return(int i, atomic_t *v)
> +{
> + register int c;
> +
> + __asm__ __volatile__ (
> + "amoadd.w %0, %2, %1"
> + : "=r" (c), "+A" (v->counter)
> + : "r" (i));
> + return (c + i);
> +}
> +
> +/**
> + * atomic_sub_return - subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @v: pointer of type atomic_t
> + *
> + * Atomically subtracts @i from @v and returns the result
> + */
> +static inline int atomic_sub_return(int i, atomic_t *v)
> +{
> + return atomic_add_return(-i, v);
> +}
> +

> +/**
> + * atomic_and - Atomically clear bits in atomic variable
> + * @mask: Mask of the bits to be retained
> + * @v: pointer of type atomic_t
> + *
> + * Atomically retains the bits set in @mask from @v
> + */
> +static inline void atomic_and(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoand.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_and atomic_fetch_and
> +static inline int atomic_fetch_and(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoand.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_or - Atomically set bits in atomic variable
> + * @mask: Mask of the bits to be set
> + * @v: pointer of type atomic_t
> + *
> + * Atomically sets the bits set in @mask in @v
> + */
> +static inline void atomic_or(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoor.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_or atomic_fetch_or
> +static inline int atomic_fetch_or(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoor.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}
> +
> +/**
> + * atomic_xor - Atomically flips bits in atomic variable
> + * @mask: Mask of the bits to be flipped
> + * @v: pointer of type atomic_t
> + *
> + * Atomically flips the bits set in @mask in @v
> + */
> +static inline void atomic_xor(unsigned int mask, atomic_t *v)
> +{
> + __asm__ __volatile__ (
> + "amoxor.w zero, %1, %0"
> + : "+A" (v->counter)
> + : "r" (mask));
> +}
> +
> +#define atomic_fetch_xor atomic_fetch_xor
> +static inline int atomic_fetch_xor(unsigned int mask, atomic_t *v)
> +{
> + int out;
> +
> + __asm__ __volatile__ (
> + "amoxor.w %2, %1, %0"
> + : "+A" (v->counter), "=r" (out)
> + : "r" (mask));
> + return out;
> +}

What pretty much all the other architectures do is something like:

#define ATOMIC_OP(op, asm_op, c_op) \
static __always_inline void atomic_##op(int i, atomic_t *v) \
{   \
__asm__ __volatile__ (  \
"amo" #asm_op ".w zero, %1, %0" \
: "+A" (v->counter) \
: "r" (i)); \
}

#define ATOMIC_FETCH_OP(op, asm_op, c_op)   \
static __always_inline int 

Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> +/* Assume that atomic operations are already serializing */
> +#define smp_mb__before_atomic_dec()  barrier()
> +#define smp_mb__after_atomic_dec()   barrier()
> +#define smp_mb__before_atomic_inc()  barrier()
> +#define smp_mb__after_atomic_inc()   barrier()

> +#define smp_mb__before_clear_bit()  smp_mb()
> +#define smp_mb__after_clear_bit()   smp_mb()

These no longer exist.. Also how can they be different? bitops would use
the same atomic primitives as regular atomic ops would and would thus
have the very same implicit ordering.




Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Peter Zijlstra
On Tue, Jun 06, 2017 at 04:00:03PM -0700, Palmer Dabbelt wrote:
> +/* Assume that atomic operations are already serializing */
> +#define smp_mb__before_atomic_dec()  barrier()
> +#define smp_mb__after_atomic_dec()   barrier()
> +#define smp_mb__before_atomic_inc()  barrier()
> +#define smp_mb__after_atomic_inc()   barrier()

> +#define smp_mb__before_clear_bit()  smp_mb()
> +#define smp_mb__after_clear_bit()   smp_mb()

These no longer exist.. Also how can they be different? bitops would use
the same atomic primitives as regular atomic ops would and would thus
have the very same implicit ordering.




Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Arnd Bergmann
On Wed, Jun 7, 2017 at 1:00 AM, Palmer Dabbelt  wrote:
> This patch adds the include files for the RISC-V port.  These are mostly
> based on the score port, but there are a lot of arm64-based files as
> well.
>
> Signed-off-by: Palmer Dabbelt 

It might be better to split this up into several parts, as the patch
is longer than
most people are willing to review at once.

The uapi should definitely be a separate patch, as it includes the parts that
cannot be changed any more later. memory management (pgtable, mmu,
uaccess) would be another part to split out, and possibly all the atomics
in one separate patch (along with spinlocks and bitops).

> +
> +/* IO barriers.  These only fence on the IO bits because they're only 
> required
> + * to order device access.  We're defining mmiowb because our AMO 
> instructions
> + * (which are used to implement locks) don't specify ordering.  From Chapter 
> 7
> + * of v2.2 of the user ISA:
> + * "The bits order accesses to one of the two address domains, memory or I/O,
> + * depending on which address domain the atomic instruction is accessing. No
> + * ordering constraint is implied to accesses to the other domain, and a 
> FENCE
> + * instruction should be used to order across both domains."
> + */
> +
> +#define __iormb()  __asm__ __volatile__ ("fence i,io" : : : "memory");
> +#define __iowmb()  __asm__ __volatile__ ("fence io,o" : : : "memory");
> +
> +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : "memory");
> +
> +/*
> + * Relaxed I/O memory access primitives. These follow the Device memory
> + * ordering rules but do not guarantee any ordering relative to Normal memory
> + * accesses.
> + */
> +#define readb_relaxed(c)   ({ u8  __r = __raw_readb(c); __r; })
> +#define readw_relaxed(c)   ({ u16 __r = le16_to_cpu((__force 
> __le16)__raw_readw(c)); __r; })
> +#define readl_relaxed(c)   ({ u32 __r = le32_to_cpu((__force 
> __le32)__raw_readl(c)); __r; })
> +#define readq_relaxed(c)   ({ u64 __r = le64_to_cpu((__force 
> __le64)__raw_readq(c)); __r; })
> +
> +#define writeb_relaxed(v,c)((void)__raw_writeb((v),(c)))
> +#define writew_relaxed(v,c)((void)__raw_writew((__force 
> u16)cpu_to_le16(v),(c)))
> +#define writel_relaxed(v,c)((void)__raw_writel((__force 
> u32)cpu_to_le32(v),(c)))
> +#define writeq_relaxed(v,c)((void)__raw_writeq((__force 
> u64)cpu_to_le64(v),(c)))
> +
> +/*
> + * I/O memory access primitives. Reads are ordered relative to any
> + * following Normal memory access. Writes are ordered relative to any prior
> + * Normal memory access.
> + */
> +#define readb(c)   ({ u8  __v = readb_relaxed(c); __iormb(); 
> __v; })
> +#define readw(c)   ({ u16 __v = readw_relaxed(c); __iormb(); 
> __v; })
> +#define readl(c)   ({ u32 __v = readl_relaxed(c); __iormb(); 
> __v; })
> +#define readq(c)   ({ u64 __v = readq_relaxed(c); __iormb(); 
> __v; })
> +
> +#define writeb(v,c)({ __iowmb(); writeb_relaxed((v),(c)); })
> +#define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); })
> +#define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); })
> +#define writeq(v,c)({ __iowmb(); writeq_relaxed((v),(c)); })
> +
> +#include 

These do not yet contain all the changes we discussed: the relaxed operations
don't seem to be ordered against one another and the regular accessors
are not ordered against DMA.

 Arnd


Re: [PATCH 13/17] RISC-V: Add include subdirectory

2017-06-07 Thread Arnd Bergmann
On Wed, Jun 7, 2017 at 1:00 AM, Palmer Dabbelt  wrote:
> This patch adds the include files for the RISC-V port.  These are mostly
> based on the score port, but there are a lot of arm64-based files as
> well.
>
> Signed-off-by: Palmer Dabbelt 

It might be better to split this up into several parts, as the patch
is longer than
most people are willing to review at once.

The uapi should definitely be a separate patch, as it includes the parts that
cannot be changed any more later. memory management (pgtable, mmu,
uaccess) would be another part to split out, and possibly all the atomics
in one separate patch (along with spinlocks and bitops).

> +
> +/* IO barriers.  These only fence on the IO bits because they're only 
> required
> + * to order device access.  We're defining mmiowb because our AMO 
> instructions
> + * (which are used to implement locks) don't specify ordering.  From Chapter 
> 7
> + * of v2.2 of the user ISA:
> + * "The bits order accesses to one of the two address domains, memory or I/O,
> + * depending on which address domain the atomic instruction is accessing. No
> + * ordering constraint is implied to accesses to the other domain, and a 
> FENCE
> + * instruction should be used to order across both domains."
> + */
> +
> +#define __iormb()  __asm__ __volatile__ ("fence i,io" : : : "memory");
> +#define __iowmb()  __asm__ __volatile__ ("fence io,o" : : : "memory");
> +
> +#define mmiowb()   __asm__ __volatile__ ("fence io,io" : : : "memory");
> +
> +/*
> + * Relaxed I/O memory access primitives. These follow the Device memory
> + * ordering rules but do not guarantee any ordering relative to Normal memory
> + * accesses.
> + */
> +#define readb_relaxed(c)   ({ u8  __r = __raw_readb(c); __r; })
> +#define readw_relaxed(c)   ({ u16 __r = le16_to_cpu((__force 
> __le16)__raw_readw(c)); __r; })
> +#define readl_relaxed(c)   ({ u32 __r = le32_to_cpu((__force 
> __le32)__raw_readl(c)); __r; })
> +#define readq_relaxed(c)   ({ u64 __r = le64_to_cpu((__force 
> __le64)__raw_readq(c)); __r; })
> +
> +#define writeb_relaxed(v,c)((void)__raw_writeb((v),(c)))
> +#define writew_relaxed(v,c)((void)__raw_writew((__force 
> u16)cpu_to_le16(v),(c)))
> +#define writel_relaxed(v,c)((void)__raw_writel((__force 
> u32)cpu_to_le32(v),(c)))
> +#define writeq_relaxed(v,c)((void)__raw_writeq((__force 
> u64)cpu_to_le64(v),(c)))
> +
> +/*
> + * I/O memory access primitives. Reads are ordered relative to any
> + * following Normal memory access. Writes are ordered relative to any prior
> + * Normal memory access.
> + */
> +#define readb(c)   ({ u8  __v = readb_relaxed(c); __iormb(); 
> __v; })
> +#define readw(c)   ({ u16 __v = readw_relaxed(c); __iormb(); 
> __v; })
> +#define readl(c)   ({ u32 __v = readl_relaxed(c); __iormb(); 
> __v; })
> +#define readq(c)   ({ u64 __v = readq_relaxed(c); __iormb(); 
> __v; })
> +
> +#define writeb(v,c)({ __iowmb(); writeb_relaxed((v),(c)); })
> +#define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); })
> +#define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); })
> +#define writeq(v,c)({ __iowmb(); writeq_relaxed((v),(c)); })
> +
> +#include 

These do not yet contain all the changes we discussed: the relaxed operations
don't seem to be ordered against one another and the regular accessors
are not ordered against DMA.

 Arnd