Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-27 Thread Leonid Yegoshin

On 01/27/2016 03:26 AM, Maciej W. Rozycki wrote:

On Fri, 15 Jan 2016, Leonid Yegoshin wrote:


So you need to build a different kernel for some types of MIPS systems?
Or do you do boot-time rewriting, like a number of other arches do?

I don't know. I would like to have responses. Ralf asked Maciej about old
systems and that went nowhere. Even with a rewrite I don't know what to do:
no lightweight SYNC, or no SYNC at all? It is still possible that SYNC on
some systems is too heavy or even harmful; nobody has tested that.

  I don't recall being asked; mind that I might not get to messages I have
not been cc-ed in a timely manner and I may miss some altogether.  With
the amount of mailing list traffic that passes by me my scanner may fail
to trigger.  Sorry if this causes anybody trouble, but such is life.

  Coincidentally, I have just posted some notes on SYNC in a different
thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>.
There's a reference to an older message of mine there too.  I hope this
answers your questions.

   Maciej

In http://patchwork.linux-mips.org/patch/10505/ the very last message
exchange is:


Maciej,

do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
test this?
...
  Ralf

Maciej W. Rozycki - June 5, 2015, 9:18 p.m.

On Fri, 5 Jun 2015, Ralf Baechle wrote:


do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
test this?


 I should be able to check R4400 (that is virtually the same as R4000)
next week or so.  As to SiByte -- not before next month I'm afraid.  I
don't have access to any of the other processors you named.  You may
want to find a better person if you want to accept this change soon.

  Maciej

... and that stops forever...

- Leonid.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-15 Thread Leonid Yegoshin

On 01/15/2016 01:57 AM, Will Deacon wrote:

Paul,


I think you figured this out while I was sleeping, but just to confirm:

  1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
 to memory accesses appearing in *program-order* before the SYNC

  2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
 also capture the store in P0 as being "before" the barrier. Leonid
 reckons it works, but his explanation [2] focussed on the address
 dependency in P2 as to why this works. If that is the case (i.e.
 address dependency provides global transitivity), then WRC+addr+addr
 should also work (even though its not required).


No, that is not correct. There is one old design in which a thread's
loads can read from the core's write buffers (shared by thread0 and
thread1) before the data is visible to other cores. This means that
WRC+sync+addr passes, because of the SYNC in the writing thread and the
register dependency inside the other thread, but WRC+addr+addr may fail
because another core may get stale data.




  3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
 about WRC+sync+addr, because neither the architecture document or
 Leonid's explanation tell me that it should be forbidden.

Will

[1] https://imgtec.com/?do-download=4302
[2] http://lkml.kernel.org/r/569565da.2010...@imgtec.com (scroll to the end)



Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 02:24 PM, Paul E. McKenney wrote:
Actually, the Linux kernel doesn't have an acquire barrier, just an
smp_load_acquire(). Or did someone sneak one in while I wasn't looking?

That was exactly the starting point for this discussion. This patch just
removes smp_load_acquire() and smp_store_release() from the MIPS files.
However, half a year ago I posted to LMO the patch
http://patchwork.linux-mips.org/patch/10506/ which replaces the generic
smp_mb with MIPS-specific smp_release/acquire in those functions. That
patch also fixes the use of SYNC barriers in spin_locks/atomics/bitops
for Imagination MIPS CPUs; they are simply absent now for all
Imagination MIPS CPUs!


Michael later pointed out to me that it can be brought back with his
series of patches, but the discussion was already here.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 02:55 PM, Paul E. McKenney wrote:

OK, so it looks like Will was asking not about WRC+addr+addr, but instead
about WRC+sync+addr.

(He actually asked about both this and that, twice, but let's skip this.)


I am guessing that the manual's "Older instructions which must be globally
performed when the SYNC instruction completes" provides the equivalent
of ARM/Power A-cumulativity, which can be thought of as transitivity
backwards in time.  This leads me to believe that your smp_mb() needs
to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
discussion in this thread.


Don't be fooled here by the words "ordered" and "completed" - these are
HW design terms and are actually poorly written.
Just assume that SYNC_MB is exactly the same as SYNC for any CPU and
coherent device (apart from performance). The difference can show up
with non-coherent devices, because SYNC actually tries to act as a
barrier for them too. In some SoCs the two are just the same because
there is no need for a barrier against a non-coherent device (device
register accesses are usually strictly ordered... if there is no bridge
in between).




Suppose you have something like this:
...
Does your hardware guarantee that it is not possible for all of r0,
r1, r2, and r3 to be equal to zero at the end of the test, assuming
that a, b, c, and d are all initially zero, and the four functions
above run concurrently?


From the Arch point of view it is assumed to be so. HW bugs are
possible, of course.



Another (more academic) case is this one, with x and y initially zero:

...
Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0?


From the Arch point of view it is assumed to be so. HW bugs are
possible, of course.


Note: I am not sure about ANY past MIPS R2 CPU, because that stuff was
implemented some time ago but nobody used it in the Linux kernel (it was
used by some vendor for a non-Linux system). For that reason my patch
for lightweight SYNCs has an option: use them, or use a generic SYNC. It
is possible that some vendor did it in a different way, but nobody knows
or has tested it. But as a minimum, SYNC must be implemented in
spinlocks/atomics/bitops; on the recent P5600 it is proven that a read
can pass a write in atomics.


MIPS R6 is a different story: I verified lightweight SYNCs from the
beginning, and it also should use SYNCs.


- Leonid.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 08:16 AM, Paul E. McKenney wrote:

On Thu, Jan 14, 2016 at 12:04:45PM +, Will Deacon wrote:

On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:

On 01/13/2016 12:48 PM, Peter Zijlstra wrote:

On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:


I am asking the HW team about it, but I have a question: does it have any
relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc.)?

Of course. If you cannot explain the semantics of the primitives you
introduce, how can we judge the patch.



You missed the point - this is a question about replacing SYNC with
lightweight primitives. It is NOT a question about multithreaded system
behavior without any SYNC. The answer to Will's latest question lies in
a different area.

What Will said!

Yes, you can cut corners within MIPS architecture-specific code,
but primitives that are used in the core kernel really do need to
work as expected.

Thanx, Paul



Absolutely! Please use SYNC - right now it is not used.

And the only point - please use the appropriate SYNC_* barriers instead
of a big heavy hammer. That stuff was designed explicitly to support the
requirements of Documentation/memory-barriers.txt


It is easy - just use smp_acquire instead of plain smp_mb
in smp_load_acquire, at least for MIPS.


- Leonid.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 04:47 PM, Paul E. McKenney wrote:

On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:

Don't be fooled here by the words "ordered" and "completed" - these are
HW design terms and are actually poorly written.
Just assume that SYNC_MB is exactly the same as SYNC for any CPU and
coherent device (apart from performance). The difference can show up
with non-coherent devices, because SYNC actually tries to act as a
barrier for them too. In some SoCs the two are just the same because
there is no need for a barrier against a non-coherent device (device
register accesses are usually strictly ordered... if there is no bridge
in between).

So smp_mb() can be SYNC_MB.  However, mb() needs to be SYNC for MMIO
purposes, correct?


Absolutely. For MIPS R2 which is not Octeon.


Note: I am not sure about ANY past MIPS R2 CPU, because that stuff was
implemented some time ago but nobody used it in the Linux kernel (it was
used by some vendor for a non-Linux system). For that reason my patch
for lightweight SYNCs has an option: use them, or use a generic SYNC. It
is possible that some vendor did it in a different way, but nobody knows
or has tested it. But as a minimum, SYNC must be implemented in
spinlocks/atomics/bitops; on the recent P5600 it is proven that a read
can pass a write in atomics.

MIPS R6 is a different story: I verified lightweight SYNCs from the
beginning, and it also should use SYNCs.

So you need to build a different kernel for some types of MIPS systems?
Or do you do boot-time rewriting, like a number of other arches do?


I don't know. I would like to have responses. Ralf asked Maciej about
old systems and that went nowhere. Even with a rewrite I don't know what
to do: no lightweight SYNC, or no SYNC at all? It is still possible that
SYNC on some systems is too heavy or even harmful; nobody has tested
that.


- Leonid.



Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 04:04 AM, Will Deacon wrote:
Consequently, it's important that the architecture back-ends implement 
these portable primitives (e.g. smp_mb()) in a way that satisfies the 
kernel memory model so that core code doesn't need to worry about the 
underlying architecture for synchronisation purposes.


It seems you are not listening to me. I have said multiple times: the
MIPS implementation of the
SYNC_RMB/SYNC_WMB/SYNC_MB/SYNC_ACQUIRE/SYNC_RELEASE instructions matches
the description of smp_rmb/smp_wmb/smp_mb/sync_acquire/sync_release in
the Documentation/memory-barriers.txt file.


What else do you want from me - RTL or microArch design for that?

- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 12:48 PM, Paul E. McKenney wrote:


So SYNC_RMB is intended to implement smp_rmb(), correct?

Yes.


You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.


If smp_read_barrier_depends() is used to separate not only two reads but
also a pointer read and a WRITE based on that pointer (example below) -
yes. I just don't see any example of this in the famous
Documentation/memory-barriers.txt, and I had no way to know that you use
it in this way too.



The reason for this is that smp_read_barrier_depends() must order the
pointer load against any subsequent read or write through a dereference
of that pointer.


I can't see that requirement anywhere in the Documentation directory. I
mean the words "write through a dereference of that pointer" or similar
for smp_read_barrier_depends.



   For example:

p = READ_ONCE(gp);
smp_rmb();
r1 = p->a; /* ordered by smp_rmb(). */
p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */

In contrast:

p = READ_ONCE(gp);
smp_read_barrier_depends();
r1 = p->a; /* ordered by smp_read_barrier_depends(). */
p->b = 42; /* ordered by smp_read_barrier_depends(). */
r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */

Again, if your hardware maintains local ordering for address
and data dependencies, you can have read_barrier_depends() and
smp_read_barrier_depends() be no-ops like they are for most
architectures.
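
The contrast drawn in the two snippets above can be sketched in portable
userspace C11. This is only an illustration, not kernel code: the names
struct item, publish() and consume() are hypothetical, gp merely mirrors the
example, and an acquire load is used as a conservative stand-in for
smp_read_barrier_depends() (compilers commonly treat memory_order_consume as
acquire anyway).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload, standing in for the structure behind "gp". */
struct item {
    int a;
    int b;
};

/* Shared pointer; static storage makes it start out as NULL. */
static _Atomic(struct item *) gp;

/* Writer: initialise the item, then publish it with release ordering. */
static void publish(struct item *it, int a)
{
    assert(it != NULL);
    it->a = a;
    it->b = 0;
    atomic_store_explicit(&gp, it, memory_order_release);
}

/*
 * Reader: load the pointer, then read AND write through it.
 * The acquire load orders the pointer load before both the later load
 * of p->a and the later store to p->b, which a read-only barrier such
 * as smp_rmb() would not promise for the store.
 * Returns p->a, or -1 if nothing has been published yet.
 */
static int consume(int new_b)
{
    struct item *p = atomic_load_explicit(&gp, memory_order_acquire);

    if (p == NULL)
        return -1;
    int r1 = p->a;   /* dependent load  */
    p->b = new_b;    /* dependent store */
    return r1;
}
```

A caller would invoke publish() from the writer side and consume() from the
reader side; before anything is published, consume() simply returns -1.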


It is not so simple, I mean "local ordering for address and data
dependencies". Local ordering is NOT enough. It happens that the current
MIPS R6 doesn't require smp_read_barrier_depends() in your example, but
it came out in discussion that this may not always hold. Without
smp_read_barrier_depends(), your example can be part of Will's
WRC+addr+addr, and we found a design which can easily fail this test.
And that design actually performs "local ordering for address and data
dependencies" too.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 01:29 PM, Paul E. McKenney wrote:



On 01/14/2016 12:34 PM, Paul E. McKenney wrote:


The WRC+addr+addr is OK because data dependencies are not required to be
transitive, in other words, they are not required to flow from one CPU to
another without the help of an explicit memory barrier.

I don't see any reliable way to reconcile WRC+addr+addr with the
recommendation in the "DATA DEPENDENCY BARRIERS" section to have a data
dependency barrier between the read of a shared pointer/index and the
read of the shared data based on that pointer. If you have these two
reads, the rest of the scenario doesn't matter; you should put the
dependency barrier in the code anyway. If you don't do it in the
WRC+addr+addr scenario, then years later the code can easily change
into a different scenario which matches one of those in the "DATA
DEPENDENCY BARRIERS" section, and fail.

The trick is that lockless_dereference() contains an
smp_read_barrier_depends():

#define lockless_dereference(p) \
({ \
typeof(p) _p1 = READ_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
(_p1); \
})

Or am I missing your point?


WRC+addr+addr has no barrier at all. lockless_dereference() has a barrier.
I don't see what the two have in common in your answer, sorry.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 12:15 PM, Peter Zijlstra wrote:

On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:

And the only point - please use the appropriate SYNC_* barriers instead
of a big heavy hammer. That stuff was designed explicitly to support the
requirements of Documentation/memory-barriers.txt

That's madness. That document changes from version to version as to what
we _think_ the actual hardware does. It is _NOT_ a specification.

You cannot design hardware from that. It's incomplete and fails to
specify a bunch of things. It is not a mathematically sound definition
of a memory model.

Please stop referring to that document for what a particular barrier
_should_ do.  Explain what MIPS does, so we can attempt to integrate
this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
upon our understanding of hardware and improve the Linux memory model.


I am afraid I can't help you here. It is very complicated stuff, and a
model actually doesn't fit your assumptions about CPUs well without
some simplifications based on what you want to have.


I say that SYNC_ACQUIRE etc. follow what you expect for smp_acquire etc.
(based on that document). At least two CPU models were tested with my
patches (see them on LMO) for that last year, and those instructions are
now implemented in the engineering kernel.


If you have something else in mind, you can ask me. But I prefer not to
deviate too much from Documentation/memory-barriers.txt; for example, if
it asks for a memory barrier somewhere, then I assume the code should
have it. And please don't ask me about a test which violates the
recommendations of the current version of the document.


At the moment I don't see any significant changes in this document
relevant to the MIPS Arch for at least 1.5 years, and the only
significant point is that the MIPS CPU Arch doesn't yet have
smp_read_barrier_depends(), so smp_rmb() should be used instead.


- Leonid.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

I need some time to understand your test examples. However,

On 01/14/2016 12:34 PM, Paul E. McKenney wrote:



The WRC+addr+addr is OK because data dependencies are not required to be
transitive, in other words, they are not required to flow from one CPU to
another without the help of an explicit memory barrier.


I don't see any reliable way to reconcile WRC+addr+addr with the
recommendation in the "DATA DEPENDENCY BARRIERS" section to have a data
dependency barrier between the read of a shared pointer/index and the
read of the shared data based on that pointer. If you have these two
reads, the rest of the scenario doesn't matter; you should put the
dependency barrier in the code anyway. If you don't do it in the
WRC+addr+addr scenario, then years later the code can easily change
into a different scenario which matches one of those in the "DATA
DEPENDENCY BARRIERS" section, and fail.



   Transitivity is


Peter Zijlstra recently wrote: "In particular we're very much all
'confused' about the various notions of transitivity". I am confused
too, so please explain this in simpler terms. Sorry, but we need a
common ground first.


- Leonid.

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 04:14 AM, Will Deacon wrote:

On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:


Moreover, there are voices against guaranteeing that this will hold in the
future, and those voices point me to the examples in the "DATA DEPENDENCY
BARRIERS" section of Documentation/memory-barriers.txt, which require
SYNC_RMB between loading an address/index and using it to load shared data
based on that address or index (look at the CPU 2 pseudo-code):

To deal with this, a data dependency barrier or better must be inserted
between the address load and the data load:

        CPU 1                 CPU 2
        ===============       ===============
        { A == 1, B == 2, C = 3, P == &A, Q == &C }
        B = 4;
        <write barrier>
        WRITE_ONCE(P, &B);
                              Q = READ_ONCE(P);
                              <data dependency barrier>   <--- SYNC_RMB is here
                              D = *Q;

...

Another example of where data dependency barriers might be required is
where a number is read from memory and then used to calculate the index
for an array access:

        CPU 1                 CPU 2
        ===============       ===============
        { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
        M[1] = 4;
        <write barrier>
        WRITE_ONCE(P, 1);
                              Q = READ_ONCE(P);
                              <data dependency barrier>   <--- SYNC_RMB is here
                              D = M[Q];

Those voices say that there is a legitimate reason to relax the HW here for
performance if SYNC_RMB is needed anyway to work with this sequence of
shared-data accesses.

Are you saying that MIPS needs to implement [smp_]read_barrier_depends?


It is not me, it is Documentation/memory-barriers.txt from the kernel
sources.

The HW team can't work from spoken statements; it has to work from
written documents. If that is written (see the lines above which I
marked with "SYNC_RMB"), then everybody should follow it, no matter how
many CPUs/threads are in play. These examples explicitly require
inserting a "data dependency barrier" between reading a shared
pointer/index and using it to fetch shared data. So your WRC+addr+addr
test is a violation of that recommendation.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-14 Thread Leonid Yegoshin

On 01/14/2016 01:34 PM, Paul E. McKenney wrote:

On Thu, Jan 14, 2016 at 12:46:43PM -0800, Leonid Yegoshin wrote:

On 01/14/2016 12:15 PM, Peter Zijlstra wrote:

On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:

And the only point - please use the appropriate SYNC_* barriers instead
of a big heavy hammer. That stuff was designed explicitly to support the
requirements of Documentation/memory-barriers.txt

That's madness. That document changes from version to version as to what
we _think_ the actual hardware does. It is _NOT_ a specification.

You cannot design hardware from that. It's incomplete and fails to
specify a bunch of things. It is not a mathematically sound definition
of a memory model.

Please stop referring to that document for what a particular barrier
_should_ do.  Explain what MIPS does, so we can attempt to integrate
this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
upon our understanding of hardware and improve the Linux memory model.

I am afraid I can't help you here. It is very complicated stuff, and a
model actually doesn't fit your assumptions about CPUs well without
some simplifications based on what you want to have.

I say that SYNC_ACQUIRE etc. follow what you expect for smp_acquire etc.
(based on that document). At least two CPU models were tested with my
patches (see them on LMO) for that last year, and those instructions are
now implemented in the engineering kernel.

If you have something else in mind, you can ask me. But I prefer not to
deviate too much from Documentation/memory-barriers.txt; for example, if
it asks for a memory barrier somewhere, then I assume the code should
have it. And please don't ask me about a test which violates the
recommendations of the current version of the document.

At the moment I don't see any significant changes in this document
relevant to the MIPS Arch for at least 1.5 years, and the only
significant point is that the MIPS CPU Arch doesn't yet have
smp_read_barrier_depends(), so smp_rmb() should be used instead.

Is SYNC_ACQUIRE a memory-barrier instruction that orders prior loads
against later loads and stores?


Yes, it is in MD00087 (table 6.6 of document Ver 6.04) - 
https://imgtec.com/?do-download=4302



   If so, and if MIPS does not do
ordering based on address and data dependencies, I suggest making
read_barrier_depends() be a SYNC_ACQUIRE rather than SYNC_RMB.


I understood that after I saw the example of its use.
Please consider adding that to Documentation/memory-barriers.txt (it is
not easy to find out that this barrier is also used for a shared WRITE
based on a shared pointer); it would be helpful.


- Leonid.



Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-13 Thread Leonid Yegoshin

On 01/13/2016 02:45 AM, Will Deacon wrote:

On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:



I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:


P0:
Wx = 1

P1:
Rx == 1
<address dep>
Wy = 1

P2:
Ry == 1
<address dep>
Rx = 0


So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?
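
As a baseline for the litmus test quoted above, the sketch below enumerates
every interleaving of the three programs against one shared memory, i.e.
under plain sequential consistency. It is a toy model, not a description of
any MIPS core, and all names in it (wrc_state, wrc_step, wrc_explore,
wrc_forbidden_under_sc) are hypothetical. Under that model the outcome
"P1 sees x == 1, P2 sees y == 1, P2 then sees x == 0" is never reachable,
so observing it on real hardware would have to come from something like the
shared store buffer mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Machine state for one run of the WRC litmus test. */
struct wrc_state {
    int x, y;        /* shared memory, both initially 0 */
    int r1, r2, r3;  /* P1's read of x; P2's reads of y and of x */
};

/* Number of steps in P0, P1 and P2. */
static const int wrc_len[3] = { 1, 2, 2 };

/* Execute step `pc` of processor `p` against the single shared memory. */
static void wrc_step(struct wrc_state *s, int p, int pc)
{
    assert(p >= 0 && p < 3 && pc >= 0 && pc < wrc_len[p]);
    if (p == 0) {
        s->x = 1;                         /* P0: Wx = 1 */
    } else if (p == 1) {
        if (pc == 0) s->r1 = s->x;        /* P1: Rx     */
        else         s->y = 1;            /* P1: Wy = 1 */
    } else {
        if (pc == 0) s->r2 = s->y;        /* P2: Ry     */
        else         s->r3 = s->x;        /* P2: Rx     */
    }
}

/* Recursively try every interleaving that respects program order. */
static void wrc_explore(struct wrc_state s, int pc[3],
                        int *total, int *forbidden)
{
    bool done = true;

    for (int p = 0; p < 3; p++) {
        if (pc[p] == wrc_len[p])
            continue;
        done = false;
        struct wrc_state next = s;        /* copy, so siblings are independent */
        wrc_step(&next, p, pc[p]);
        pc[p]++;
        wrc_explore(next, pc, total, forbidden);
        pc[p]--;
    }
    if (done) {
        (*total)++;
        if (s.r1 == 1 && s.r2 == 1 && s.r3 == 0)
            (*forbidden)++;
    }
}

/*
 * Returns how many sequentially consistent interleavings end in the
 * outcome r1 == 1, r2 == 1, r3 == 0; optionally reports the total
 * number of interleavings explored.
 */
static int wrc_forbidden_under_sc(int *total_out)
{
    struct wrc_state s = { 0, 0, 0, 0, 0 };
    int pc[3] = { 0, 0, 0 };
    int total = 0, forbidden = 0;

    wrc_explore(s, pc, &total, &forbidden);
    if (total_out != NULL)
        *total_out = total;
    return forbidden;
}
```

The model deliberately has no barriers or dependencies in it: it only shows
what "no reordering and one copy of memory" would give, which is the
reference point the WRC+addr+addr and WRC+sync+addr questions are measured
against.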



I am asking the HW team about it, but I have a question: does it have
any relationship with replacing MIPS SYNC with lightweight SYNCs
(SYNC_WMB etc.)? Whether or not you use a barrier, I am just voicing an
intention to use a more efficient instruction instead of a big hammer
(the SYNC instruction). If you don't use any barrier here, then it is a
different issue.


Maybe it makes sense to return to the original issue?

- Leonid

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-13 Thread Leonid Yegoshin

On 01/13/2016 12:48 PM, Peter Zijlstra wrote:

On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:


I am asking the HW team about it, but I have a question: does it have any
relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc.)?

Of course. If you cannot explain the semantics of the primitives you
introduce, how can we judge the patch.


You missed the point - this is a question about replacing SYNC with
lightweight primitives. It is NOT a question about multithreaded system
behavior without any SYNC. The answer to Will's latest question lies in
a different area.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-13 Thread Leonid Yegoshin

On 01/13/2016 02:45 AM, Will Deacon wrote:



I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:


P0:
Wx = 1

P1:
Rx == 1
<address dep>
Wy = 1

P2:
Ry == 1
<address dep>
Rx = 0


So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?


OK, I collected the answers, and here they are:

In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read
as 1. By design.


However, it is unclear what happens in MIPS R2 1004K.

Moreover, there are voices against guaranteeing that this will hold in
the future, and those voices point me to the examples in the "DATA
DEPENDENCY BARRIERS" section of Documentation/memory-barriers.txt, which
require SYNC_RMB between loading an address/index and using it to load
shared data based on that address or index (look at the CPU 2
pseudo-code):

To deal with this, a data dependency barrier or better must be inserted
between the address load and the data load:

        CPU 1                 CPU 2
        ===============       ===============
        { A == 1, B == 2, C = 3, P == &A, Q == &C }
        B = 4;
        <write barrier>
        WRITE_ONCE(P, &B);
                              Q = READ_ONCE(P);
                              <data dependency barrier>   <--- SYNC_RMB is here
                              D = *Q;

...
Another example of where data dependency barriers might be required is
where a number is read from memory and then used to calculate the index
for an array access:

        CPU 1                 CPU 2
        ===============       ===============
        { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
        M[1] = 4;
        <write barrier>
        WRITE_ONCE(P, 1);
                              Q = READ_ONCE(P);
                              <data dependency barrier>   <--- SYNC_RMB is here
                              D = M[Q];


Those voices say that there is a legitimate reason to relax the HW here
for performance if SYNC_RMB is needed anyway to work with this sequence
of shared-data accesses.



And all of that is off-topic here, in my mind. I just want to be sure
that this patchset still allows the use of specific lightweight SYNCs
on MIPS, rather than the big and heavy generalized "SYNC 0" in every
case.


- Leonid.


Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Leonid Yegoshin

(I am trying to answer multiple mails in one.)

First of all, it seems like some generic notes should be given here:

1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on
some CPUs. On those CPUs it basically kills the pipelines in each CPU,
can issue a special memory/IO bus transaction (similar to a "fence") and
holds the system until all R/W is completed. It is like the Big Kernel
Lock but worse. So the move to SMP_*-type barriers is needed to improve
performance, especially on the newest CPUs with long pipelines.


2. The MIPS Arch document may be misleading because the words "ordering"
and "completion" mean something different than in Linux; the SYNC
instruction description is written for HW engineers. I wrote about that
in a separate patch of the same patchset -
http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight
SYNC instruction in smp_* memory barriers":



These instructions were specifically designed to work for smp_*() sort of
memory barriers in MIPS R2/R3/R5 and R6.

Unfortunately, their description is very cryptic and is written in HW
engineering style, which prevents its use by SW.


3. I bothered the MIPS Arch team for a long time until I completely
understood that MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and
SYNC_ACQUIRE do exactly what is required in
Documentation/memory-barriers.txt
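
To make the mapping in point 3 concrete, here is a minimal userspace sketch
of how a barrier kind could be turned into a SYNC stype and an instruction
word. It is an illustration under assumptions, not the patch itself: the
stype values and the instruction encoding are quoted from my reading of the
MIPS instruction set manual (MD00087) and should be checked against it, and
the names barrier_kind, pick_stype() and encode_sync() are hypothetical.
The "lightweight" flag stands in for the selectable option discussed in this
thread, and mb() is kept on the full SYNC because of the MMIO concern raised
elsewhere in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* SYNC stype codes (assumed from the MIPS ISA manual; verify there). */
enum mips_sync_stype {
    STYPE_SYNC    = 0x00,   /* full, heavy "SYNC 0" */
    STYPE_WMB     = 0x04,
    STYPE_MB      = 0x10,
    STYPE_ACQUIRE = 0x11,
    STYPE_RELEASE = 0x12,
    STYPE_RMB     = 0x13,
};

/* Hypothetical labels for the Linux barrier being implemented. */
enum barrier_kind {
    BAR_MB,            /* mb(): must also cover MMIO */
    BAR_SMP_MB,
    BAR_SMP_RMB,
    BAR_SMP_WMB,
    BAR_SMP_ACQUIRE,
    BAR_SMP_RELEASE,
};

/* Pick the stype; fall back to the full SYNC when lightweight is off. */
static enum mips_sync_stype pick_stype(enum barrier_kind kind, int lightweight)
{
    if (!lightweight)
        return STYPE_SYNC;
    switch (kind) {
    case BAR_SMP_MB:      return STYPE_MB;
    case BAR_SMP_RMB:     return STYPE_RMB;
    case BAR_SMP_WMB:     return STYPE_WMB;
    case BAR_SMP_ACQUIRE: return STYPE_ACQUIRE;
    case BAR_SMP_RELEASE: return STYPE_RELEASE;
    case BAR_MB:
    default:              return STYPE_SYNC;
    }
}

/*
 * Encode "sync stype": SPECIAL opcode (0), stype in bits 10..6,
 * function field 0x0f.
 */
static uint32_t encode_sync(enum mips_sync_stype stype)
{
    assert((unsigned)stype < 32u);   /* stype is a 5-bit field */
    return ((uint32_t)stype << 6) | 0x0fu;
}
```

For example, encode_sync(pick_stype(BAR_SMP_MB, 1)) gives the word for
"sync 0x10", while passing lightweight == 0 always yields the plain SYNC.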



In Peter Zijlstra's mail:


1) you do not make such things selectable; either the hardware needs
them or it doesn't. If it does you _must_ use them, however unlikely.

It is selectable only for MIPS R2 but not MIPS R6. The reason is that
most MIPS R2 CPUs have a short pipeline, and that SYNC is just a waste
of CPU resources, especially taking into account that "lightweight
syncs" are converted to a heavy "SYNC 0" in many of those CPUs. However,
the latest MIPS/Imagination CPUs have a pipeline long enough to hit a
problem: the absence of SYNC at LL/SC inside atomics, barriers etc.



And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
are _NOT_ transitive and therefore cannot be used to implement the
smp_mb__{before,after} stuff.

That is, in MIPS speak, those SYNC types are Ordering Barriers, not
Completion Barriers.


Please see above, point 2.


That is, currently all architectures -- with exception of PPC -- have
RCsc locks, but using these non-transitive things will get you RCpc
locks.

So yes, MIPS can go RCpc for its locks and share the burden of pain with
PPC, but that needs to be a very conscious decision.


I don't understand that - I tried hard but I can't find any word like
"RCsc" or "RCpc" in the Documentation/ directory. A web search goes
nowhere, of course.



In Will Deacon's mail:


The issue I have with the SYNC description in the text above is that it
describes the single CPU (program order) and the dual-CPU (confusingly
named global order) cases, but then doesn't generalise any further. That
means we can't sensibly reason about transitivity properties when a third
agent is involved. For example, the WRC+sync+addr test:


P0:
Wx = 1

P1:
Rx == 1
SYNC
Wy = 1

P2:
Ry == 1
<address dep>
Rx = 0


I can't find anything to forbid that, given the text. The main problem
is having the SYNC on P1 affect the write by P0.


As I understand that test, the visibility of P0: W[x] = 1 is identical
for P1 and P2 here. If P1 read X before the SYNC and wrote Y after the
SYNC, then instruction source register dependency tracking in P2 prevents
a speculative load of X before P2 obtains Y from the same place as P0/P1
and calculates the address of X. If some load of X in P2 happens before
the address dependency calculation, its result is discarded.


Yes, you can't find that in the MIPS SYNC instruction description; it is
more likely in the CM (Coherence Manager) area. I have just pointed this
out to our arch team member responsible for documents, and he will think
about how to explain that.
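
The WRC argument above can be checked mechanically with a toy model.
This is only a sketch under stated assumptions, not the MIPS memory
model: every event name below is invented for illustration, the SYNC on
P1 is assumed to order P1's read before its write, the address
dependency is assumed to order P2's two reads, and the flag
same_visibility stands for the claim that P0's write becomes visible to
P1 and P2 at the same moment.

```python
from itertools import permutations

# Events: P0's write, its arrival at P1 and at P2, P1's read and write,
# the arrival of P1's write at P2, and P2's two reads.
EVENTS = ("Wx", "x_at_P1", "x_at_P2", "R1x", "W1y", "y_at_P2", "R2y", "R2x")

# (earlier, later) pairs assumed to hold in every execution of the test.
ORDER = (
    ("Wx", "x_at_P1"),   # a write is issued before it arrives anywhere
    ("Wx", "x_at_P2"),
    ("R1x", "W1y"),      # SYNC on P1 orders its read before its write
    ("W1y", "y_at_P2"),
    ("R2y", "R2x"),      # address dependency on P2
)


def wrc_outcomes(same_visibility):
    """Return the set of reachable (P1:Rx, P2:Ry, P2:Rx) results."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in ORDER):
            continue
        # With same_visibility, x reaches P2 at the moment it reaches P1.
        x_seen_by_p2 = "x_at_P1" if same_visibility else "x_at_P2"
        results.add((
            int(pos["x_at_P1"] < pos["R1x"]),
            int(pos["y_at_P2"] < pos["R2y"]),
            int(pos[x_seen_by_p2] < pos["R2x"]),
        ))
    return results


if __name__ == "__main__":
    forbidden = (1, 1, 0)   # Rx == 1 on P1, Ry == 1 and Rx == 0 on P2
    print("independent visibility:", forbidden in wrc_outcomes(False))
    print("same visibility:       ", forbidden in wrc_outcomes(True))
```

In this model the outcome is reachable only when the write may reach P1
and P2 at different times, which is why the question turns on what the
coherence fabric guarantees rather than on the SYNC text alone.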


- Leonid.




Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Leonid Yegoshin

On 01/12/2016 01:40 PM, Peter Zijlstra wrote:



It is selectable only for MIPS R2 but not MIPS R6. The reason is that most
MIPS R2 CPUs have a short pipeline, so that SYNC is just a waste of CPU
resources, especially taking into account that "lightweight syncs" are
converted to a heavy "SYNC 0" in many of those CPUs. However, the latest
MIPS/Imagination CPUs have a pipeline long enough to hit a problem: the
absence of SYNC at LL/SC inside atomics, barriers etc.

What ?! Are you saying that because R2 has short pipelines it's unlikely
to hit the reordering issues and we can omit barriers?


That was my guess at explaining why barriers were not included
originally. You can check with Ralf; he knows more about the MIPS Linux
code of that time.


I have been dealing with this for more than 2 years and I am just trying
to solve that issue: in recent CPUs a load after an LL/SC synchronization
instruction loop can definitely get ahead of the SC; this was tested.





And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
are _NOT_ transitive and therefore cannot be used to implement the
smp_mb__{before,after} stuff.

That is, in MIPS speak, those SYNC types are Ordering Barriers, not
Completion Barriers.

Please see above, point 2.

That did not in fact enlighten things. Are they transitive/multi-copy
atomic or not?


Peter Zijlstra recently wrote: "In particular we're very much all 
'confused' about the various notions of transitivity". I am actually 
confused too and need some examples here.




(and here Will will go into great detail on the differences between the
two and make our collective brains explode :-)


That is, currently all architectures -- with exception of PPC -- have
RCsc locks, but using these non-transitive things will get you RCpc
locks.

So yes, MIPS can go RCpc for its locks and share the burden of pain with
PPC, but that needs to be a very conscious decision.

I don't understand that - I tried hard but I can't find any word like
"RCsc" or "RCpc" in the Documentation/ directory. A web search goes
nowhere, of course.

From: lkml.kernel.org/r/20150828153921.gf19...@twins.programming.kicks-ass.net

Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.


The MIPS architecture, starting from R2, requires that. If some CPU can't
do it, it should execute a full "SYNC 0" instead, which is a full memory
barrier.




Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.


Yes, that was the goal for SYNC_ACQUIRE and SYNC_RELEASE.

Caveats:

- "Full memory barrier" on MIPS means a full barrier for any device
in the coherent domain. In MIPS Tech/Imagination Tech MIPS-based CPUs
that is "any device connected to the CM or IOCU, plus directly connected
memory".


- It does not apply to instruction fetch. However, I-cache flushes
and SYNCI are consistent with that. There are also hazard barrier
instructions that clear the CPU pipeline to some extent, to help with
this limitation.


I don't think that these caveats prevent correct Acquire/Release semantics.
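
The RCsc/RCpc distinction quoted above (whether RELEASE + ACQUIRE
implies a full memory barrier) can be illustrated with a toy
store-buffering test. This is a sketch under assumptions, not a model of
any real CPU: each of two CPUs does a store, then RELEASE + ACQUIRE,
then a load of the other CPU's variable, and the only effect modelled is
whether the load may pass the earlier store. All names are invented for
the example.

```python
from itertools import permutations

# P0: store x, RELEASE + ACQUIRE, load y
# P1: store y, RELEASE + ACQUIRE, load x
EVENTS = ("W0x", "R0y", "W1y", "R1x")


def sb_outcomes(release_acquire_is_full_barrier):
    """Return the reachable (P0:Ry, P1:Rx) results for store buffering."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if release_acquire_is_full_barrier and (
            pos["W0x"] > pos["R0y"] or pos["W1y"] > pos["R1x"]
        ):
            continue  # RCsc-like: the load may not pass the earlier store
        results.add((
            int(pos["W1y"] < pos["R0y"]),
            int(pos["W0x"] < pos["R1x"]),
        ))
    return results


if __name__ == "__main__":
    print("RCsc-like:", sorted(sb_outcomes(True)))
    print("RCpc-like:", sorted(sb_outcomes(False)))
```

In this model the result where both loads return 0 is reachable only in
the RCpc-like case, which is the observable difference between the two
kinds of locks.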

- Leonid.



Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-11 Thread Leonid Yegoshin

On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:

On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This statement doesn't fit the MIPS barrier variations. Moreover, there is
a reason to make them even more specific, at least for
smp_store_release and smp_load_acquire; see


http://patchwork.linux-mips.org/patch/10506/

- Leonid.
