Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/27/2016 03:26 AM, Maciej W. Rozycki wrote:
> On Fri, 15 Jan 2016, Leonid Yegoshin wrote:
>>> So you need to build a different kernel for some types of MIPS systems? Or do you do boot-time rewriting, like a number of other arches do?
>> I don't know. I would like to have responses. Ralf asked Maciej about old systems and that went nowhere. Even rewrite - don't know what to do with that: no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on some systems can be too heavy or even harmful, nobody tested that.
> I don't recall being asked; mind that I might not get to messages I have not been cc-ed in a timely manner and I may miss some altogether. With the amount of mailing list traffic that passes by me my scanner may fail to trigger. Sorry if this causes anybody trouble, but such is life.
>
> Coincidentally, I have just posted some notes on SYNC in a different thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>. There's a reference to an older message of mine there too. I hope this answers your questions.
>
> Maciej

In http://patchwork.linux-mips.org/patch/10505/ the very last message exchange is:

> Maciej, do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to test this?
> ...
> Ralf

Maciej W. Rozycki - June 5, 2015, 9:18 p.m.

> On Fri, 5 Jun 2015, Ralf Baechle wrote:
>> do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to test this?
> I should be able to check R4400 (that is virtually the same as R4000) next week or so. As to SiByte -- not before next month I'm afraid. I don't have access to any of the other processors you named. You may want to find a better person if you want to accept this change soon.
>
> Maciej

... and that stops forever...

- Leonid.

___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/15/2016 01:57 AM, Will Deacon wrote:
> Paul, I think you figured this out while I was sleeping, but just to confirm:
>
> 1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only to memory accesses appearing in *program-order* before the SYNC
>
> 2. We need WRC+sync+addr to work, which means that the SYNC in P1 must also capture the store in P0 as being "before" the barrier. Leonid reckons it works, but his explanation [2] focussed on the address dependency in P2 as to why this works. If that is the case (i.e. address dependency provides global transitivity), then WRC+addr+addr should also work (even though it's not required).

No, that is not correct. There is one old design in which a thread's loads can see the core's (thread0 + thread1) write buffers before the data is visible to other cores. That means WRC+sync+addr passes because of the SYNC in the writing thread and the register dependency inside the other thread, but WRC+addr+addr may fail because another core may get stale data.

> 3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious about WRC+sync+addr, because neither the architecture document nor Leonid's explanation tell me that it should be forbidden.
>
> Will
>
> [1] https://imgtec.com/?do-download=4302
> [2] http://lkml.kernel.org/r/569565da.2010...@imgtec.com (scroll to the end)
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 02:24 PM, Paul E. McKenney wrote:
> Actually, the Linux kernel doesn't have an acquire barrier, just an smp_load_acquire(). Or did someone sneak one in while I wasn't looking?

That was exactly the starting point for this discussion. This patch just pulls smp_load_acquire() and smp_store_release() out of the MIPS files. However, half a year ago I posted to LMO the patch http://patchwork.linux-mips.org/patch/10506/ which replaces the generic smp_mb with MIPS-specific smp_release/acquire in those functions. That patch also fixes the use of SYNC barriers in spin_locks/atomics/bitops for Imagination MIPS CPUs - it is simply absent now for any Imagination MIPS CPU! Michael later pointed out to me that it can be brought back with his series of patches, but the discussion was already here.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> OK, so it looks like Will was asking not about WRC+addr+addr, but instead about WRC+sync+addr.

(He actually asked twice about this and that too, but skip this.)

> I am guessing that the manual's "Older instructions which must be globally performed when the SYNC instruction completes" provides the equivalent of ARM/Power A-cumulativity, which can be thought of as transitivity backwards in time. This leads me to believe that your smp_mb() needs to use SYNC rather than SYNC_MB, as was the subject of earlier spirited discussion in this thread.

Don't be fooled here by the words "ordered" and "completed" - they are HW design terms and are actually poorly written. Just assume that SYNC_MB is absolutely the same as SYNC for any CPU and coherent device (besides performance). The difference can be in non-coherent devices, because SYNC actually tries to make a barrier for them too. In some SoCs it is just the same, because there is no need to barrier a non-coherent device (device register access is usually strictly ordered... if there is no bridge in between).

> Suppose you have something like this:
> ...
> Does your hardware guarantee that it is not possible for all of r0, r1, r2, and r3 to be equal to zero at the end of the test, assuming that a, b, c, and d are all initially zero, and the four functions above run concurrently?

It is assumed to be so from the Arch point of view. HW bugs are possible, of course.

> Another (more academic) case is this one, with x and y initially zero:
> ...
> Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0?

It is assumed to be so from the Arch point of view. HW bugs are possible, of course.

Note: I am not sure about ANY past MIPS R2 CPU, because that stuff was implemented some time ago but nobody put it to use in the Linux kernel (it was used by some vendor for a non-Linux system). For that reason my patch for lightweight SYNCs has an option - implement them or implement a generic SYNC. It is possible that some vendor did it in a different way, but nobody knows or has tested it. But as a minimum - SYNC must be implemented in spinlocks/atomics/bitops; on the recent P5600 it is proven that a read can pass a write in atomics. MIPS R6 is a different story: I verified lightweight SYNCs from the beginning, and it also should use SYNCs.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 08:16 AM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 12:04:45PM +, Will Deacon wrote:
>> On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:
>>> On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
>>>> On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
>>>>> I am asking the HW team about it, but I have a question - does it have any relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
>>>> Of course. If you cannot explain the semantics of the primitives you introduce, how can we judge the patch.
>>> You missed the point - this is a question about replacing SYNC with lightweight primitives. It is NOT a question about multithreaded system behavior without any SYNC. The answer to Will's latest question lies in a different area.
> What Will said! Yes, you can cut corners within MIPS architecture-specific code, but primitives that are used in the core kernel really do need to work as expected.
>
> Thanx, Paul

Absolutely! Please use SYNC - right now it is not used. And the only point - please use appropriate SYNC_* barriers instead of a heavy bold hammer. That stuff was designed explicitly to support the requirements of Documentation/memory-barriers.txt. It is easy - just use smp_acquire instead of plain smp_mb in smp_load_acquire, at least for MIPS.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 04:47 PM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
>> Don't be fooled here by the words "ordered" and "completed" - they are HW design terms and are actually poorly written. Just assume that SYNC_MB is absolutely the same as SYNC for any CPU and coherent device (besides performance). The difference can be in non-coherent devices, because SYNC actually tries to make a barrier for them too. In some SoCs it is just the same, because there is no need to barrier a non-coherent device (device register access is usually strictly ordered... if there is no bridge in between).
> So smp_mb() can be SYNC_MB. However, mb() needs to be SYNC for MMIO purposes, correct?

Absolutely. For MIPS R2 which is not Octeon.

>> Note: I am not sure about ANY past MIPS R2 CPU, because that stuff was implemented some time ago but nobody put it to use in the Linux kernel (it was used by some vendor for a non-Linux system). For that reason my patch for lightweight SYNCs has an option - implement them or implement a generic SYNC. It is possible that some vendor did it in a different way, but nobody knows or has tested it. But as a minimum - SYNC must be implemented in spinlocks/atomics/bitops; on the recent P5600 it is proven that a read can pass a write in atomics. MIPS R6 is a different story: I verified lightweight SYNCs from the beginning, and it also should use SYNCs.
> So you need to build a different kernel for some types of MIPS systems? Or do you do boot-time rewriting, like a number of other arches do?

I don't know. I would like to have responses. Ralf asked Maciej about old systems and that went nowhere. Even rewrite - don't know what to do with that: no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on some systems can be too heavy or even harmful, nobody tested that.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 04:04 AM, Will Deacon wrote:
> Consequently, it's important that the architecture back-ends implement these portable primitives (e.g. smp_mb()) in a way that satisfies the kernel memory model so that core code doesn't need to worry about the underlying architecture for synchronisation purposes.

It seems you don't listen to me. I have said multiple times - the MIPS implementation of the SYNC_RMB/SYNC_WMB/SYNC_MB/SYNC_ACQUIRE/SYNC_RELEASE instructions matches the description of smp_rmb/smp_wmb/smp_mb/sync_acquire/sync_release from the Documentation/memory-barriers.txt file. What else do you want from me - RTL or microarch design for that?

- Leonid.
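The correspondence claimed above can be written down as a small lookup table. This is only a sketch: the stype encodings are the values as I read them from the MIPS architecture manual's SYNC description and should be checked against that table, and the names `mips_sync_stype`, `proposed_map` and `stype_for` are made up for illustration. The `mb()` row reflects the exchange with Paul elsewhere in this thread, where plain `mb()` keeps the full SYNC for MMIO. Where the message says sync_acquire/sync_release, the table uses the kernel primitives smp_load_acquire()/smp_store_release().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * SYNC stype encodings. ASSUMPTION: values as read from the MIPS
 * architecture manual; verify them before relying on this table.
 */
enum mips_sync_stype {
	STYPE_SYNC         = 0x00, /* generic heavy "SYNC 0" */
	STYPE_SYNC_WMB     = 0x04,
	STYPE_SYNC_MB      = 0x10,
	STYPE_SYNC_ACQUIRE = 0x11,
	STYPE_SYNC_RELEASE = 0x12,
	STYPE_SYNC_RMB     = 0x13,
};

struct barrier_map {
	const char *linux_name;
	enum mips_sync_stype stype;
};

/* Hypothetical table of the mapping proposed in the thread. */
static const struct barrier_map proposed_map[] = {
	{ "smp_rmb",           STYPE_SYNC_RMB },
	{ "smp_wmb",           STYPE_SYNC_WMB },
	{ "smp_mb",            STYPE_SYNC_MB },
	{ "smp_load_acquire",  STYPE_SYNC_ACQUIRE },
	{ "smp_store_release", STYPE_SYNC_RELEASE },
	{ "mb",                STYPE_SYNC }, /* MMIO still needs the full SYNC */
};

/* Return the stype proposed for a Linux primitive, or -1 if unknown. */
static int stype_for(const char *linux_name)
{
	assert(linux_name != NULL);
	for (size_t i = 0; i < sizeof(proposed_map) / sizeof(proposed_map[0]); i++) {
		if (strcmp(proposed_map[i].linux_name, linux_name) == 0)
			return (int)proposed_map[i].stype;
	}
	return -1;
}
```

Whether each lightweight stype really provides the ordering the kernel expects is exactly what the rest of the thread disputes; the table only records the proposal.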
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
> So SYNC_RMB is intended to implement smp_rmb(), correct?

Yes.

> You could use SYNC_ACQUIRE() to implement read_barrier_depends() and smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.

If smp_read_barrier_depends() is used to separate not only two reads but also a pointer read and a WRITE based on that pointer (example below) - yes. I just don't see any example of this in the famous Documentation/memory-barriers.txt and had no chance to know that you use it in this way too.

> The reason for this is that smp_read_barrier_depends() must order the pointer load against any subsequent read or write through a dereference of that pointer.

I can't see that requirement anywhere in the Documentation directory. I mean - the words "write through a dereference of that pointer" or similar for smp_read_barrier_depends.

> For example:
>
>     p = READ_ONCE(gp);
>     smp_rmb();
>     r1 = p->a; /* ordered by smp_rmb(). */
>     p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
>     r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
>
> In contrast:
>
>     p = READ_ONCE(gp);
>     smp_read_barrier_depends();
>     r1 = p->a; /* ordered by smp_read_barrier_depends(). */
>     p->b = 42; /* ordered by smp_read_barrier_depends(). */
>     r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
>
> Again, if your hardware maintains local ordering for address and data dependencies, you can have read_barrier_depends() and smp_read_barrier_depends() be no-ops like they are for most architectures.

It is not so simple, I mean "local ordering for address and data dependencies". Local ordering is NOT enough. It happens that current MIPS R6 doesn't require smp_read_barrier_depends() in your example, but in discussion it came out that this may not stay so. Because without smp_read_barrier_depends() your example can be a part of Will's WRC+addr+addr, and we found some design which can easily bump into this test. And that design actually performs "local ordering for address and data dependencies" too.

- Leonid.
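Paul's second snippet, where both the dependent read and the dependent write must be ordered after the pointer load, can be approximated in user space with C11 atomics. This is a rough sketch under stated assumptions: an acquire load stands in for READ_ONCE(gp) followed by smp_read_barrier_depends() (acquire is stronger than a dependency barrier, so it over-approximates), and `struct item`, `slot`, `publish` and `read_and_update` are names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	int a;
	int b;
};

static struct item slot;
static _Atomic(struct item *) gp; /* static storage, so it starts out NULL */

/* Writer side: initialise the object, then publish the pointer. */
static void publish(void)
{
	slot.a = 1;
	/* Release store orders the initialisation before the pointer store. */
	atomic_store_explicit(&gp, &slot, memory_order_release);
}

/* Reader side: returns p->a, or -1 if nothing has been published yet. */
static int read_and_update(void)
{
	/* Stand-in for READ_ONCE(gp) + smp_read_barrier_depends(). */
	struct item *p = atomic_load_explicit(&gp, memory_order_acquire);

	if (p == NULL)
		return -1;

	int r1 = p->a;   /* dependent read, ordered after the load of gp */
	assert(r1 == 1); /* a published item is always seen initialised */
	p->b = 42;       /* dependent write, also ordered after the load of gp */
	return r1;
}
```

In a real run `publish()` and `read_and_update()` would execute on different threads; calling them one after the other in a single thread only exercises the logic, not the ordering.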
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
>> On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
>>> The WRC+addr+addr is OK because data dependencies are not required to be transitive, in other words, they are not required to flow from one CPU to another without the help of an explicit memory barrier.
>> I don't see any reliable way to fit WRC+addr+addr into the "DATA DEPENDENCY BARRIERS" section's recommendation to have a data dependency barrier between the read of a shared pointer/index and the read of the shared data based on that pointer. If you have these two reads, the rest of the scenario doesn't matter; you should put the dependency barrier in the code anyway. If you don't do it in the WRC+addr+addr scenario, then years later it can easily be changed into a different scenario which matches one of the scenarios in the "DATA DEPENDENCY BARRIERS" section and fails.
> The trick is that lockless_dereference() contains an smp_read_barrier_depends():
>
>     #define lockless_dereference(p) \
>     ({ \
>         typeof(p) _p1 = READ_ONCE(p); \
>         smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
>         (_p1); \
>     })
>
> Or am I missing your point?

WRC+addr+addr has no barrier at all. lockless_dereference() has a barrier. I don't see anything in common between the two in your answer, sorry.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
>> And the only point - please use appropriate SYNC_* barriers instead of a heavy bold hammer. That stuff was designed explicitly to support the requirements of Documentation/memory-barriers.txt
> That's madness. That document changes from version to version as to what we _think_ the actual hardware does. It is _NOT_ a specification. You cannot design hardware from that. It's incomplete and fails to specify a bunch of things. It's not a mathematically sound definition of a memory model.
>
> Please stop referring to that document for what a particular barrier _should_ do. Explain what MIPS does, so we can attempt to integrate this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve upon our understanding of hardware and improve the Linux memory model.

I am afraid I can't help you here. It is very complicated stuff and a model actually doesn't fit your assumptions about CPUs well without some simplifications which are based on what you want to have.

I say that SYNC_ACQUIRE/etc. follow what you expect for smp_acquire etc. (based on that document). And at least two CPU models were tested with my patches (see them in LMO) for that last year, and those instructions are implemented now in the engineering kernel.

If you have something else in mind, you can ask me. But I prefer not to deviate too much from Documentation/memory-barriers.txt, for example - if it asks to have a memory barrier somewhere, then I assume the code should have it, and please - don't ask me for a test which violates the current version of the document's recommendations.

For the moment I don't see any significant changes in this document for MIPS Arch for at least 1.5 years, and the only significant point is that MIPS CPU Arch doesn't yet have smp_read_barrier_depends() and smp_rmb() should be used instead.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
I need some time to understand your test examples. However,

On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> The WRC+addr+addr is OK because data dependencies are not required to be transitive, in other words, they are not required to flow from one CPU to another without the help of an explicit memory barrier.

I don't see any reliable way to fit WRC+addr+addr into the "DATA DEPENDENCY BARRIERS" section's recommendation to have a data dependency barrier between the read of a shared pointer/index and the read of the shared data based on that pointer. If you have these two reads, the rest of the scenario doesn't matter; you should put the dependency barrier in the code anyway. If you don't do it in the WRC+addr+addr scenario, then years later it can easily be changed into a different scenario which matches one of the scenarios in the "DATA DEPENDENCY BARRIERS" section and fails.

> Transitivity is

Peter Zijlstra recently wrote: "In particular we're very much all 'confused' about the various notions of transitivity". I am confused too, so - please use some simpler way to explain your words. Sorry, but we need a common ground first.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 04:14 AM, Will Deacon wrote:
> On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
>> Moreover, there are voices against guaranteeing that it will stay so in the future, and those voices point me to the Documentation/memory-barriers.txt section "DATA DEPENDENCY BARRIERS" examples, which require SYNC_RMB between loading an address/index and using it for loading shared data based on that address or index (look at the CPU 2 pseudo-code):
>>
>> To deal with this, a data dependency barrier or better must be inserted between the address load and the data load:
>>
>>     CPU 1                 CPU 2
>>     ===============       ===============
>>     { A == 1, B == 2, C = 3, P == &A, Q == &C }
>>     B = 4;
>>     WRITE_ONCE(P, &B);
>>                           Q = READ_ONCE(P);
>>                           <--- SYNC_RMB is here
>>                           D = *Q;
>> ...
>> Another example of where data dependency barriers might be required is where a number is read from memory and then used to calculate the index for an array access:
>>
>>     CPU 1                 CPU 2
>>     ===============       ===============
>>     { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
>>     M[1] = 4;
>>     WRITE_ONCE(P, 1);
>>                           Q = READ_ONCE(P);
>>                           <--- SYNC_RMB is here
>>                           D = M[Q];
>>
>> Those voices say that there is a legitimate reason to relax HW here for performance if SYNC_RMB is needed anyway to work with this sequence of shared data.
> Are you saying that MIPS needs to implement [smp_]read_barrier_depends?

It is not me, it is Documentation/memory-barriers.txt from the kernel sources.

The HW team can't work from voiced statements; it should work from written documents. If that is written (see above the lines which I marked with "SYNC_RMB") then anybody should use it, never mind how many CPUs/threads are in play. These examples explicitly require inserting a "data dependency barrier" between reading a shared pointer/index and using it to fetch shared data. So, your WRC+addr+addr test is a violation of that recommendation.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/14/2016 01:34 PM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 12:46:43PM -0800, Leonid Yegoshin wrote:
>> On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
>>> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
>>>> And the only point - please use appropriate SYNC_* barriers instead of a heavy bold hammer. That stuff was designed explicitly to support the requirements of Documentation/memory-barriers.txt
>>> That's madness. That document changes from version to version as to what we _think_ the actual hardware does. It is _NOT_ a specification. You cannot design hardware from that. It's incomplete and fails to specify a bunch of things. It's not a mathematically sound definition of a memory model.
>>>
>>> Please stop referring to that document for what a particular barrier _should_ do. Explain what MIPS does, so we can attempt to integrate this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve upon our understanding of hardware and improve the Linux memory model.
>> I am afraid I can't help you here. It is very complicated stuff and a model actually doesn't fit your assumptions about CPUs well without some simplifications which are based on what you want to have.
>>
>> I say that SYNC_ACQUIRE/etc. follow what you expect for smp_acquire etc. (based on that document). And at least two CPU models were tested with my patches (see them in LMO) for that last year, and those instructions are implemented now in the engineering kernel.
>>
>> If you have something else in mind, you can ask me. But I prefer not to deviate too much from Documentation/memory-barriers.txt, for example - if it asks to have a memory barrier somewhere, then I assume the code should have it, and please - don't ask me for a test which violates the current version of the document's recommendations.
>>
>> For the moment I don't see any significant changes in this document for MIPS Arch for at least 1.5 years, and the only significant point is that MIPS CPU Arch doesn't yet have smp_read_barrier_depends() and smp_rmb() should be used instead.
> Is SYNC_ACQUIRE a memory-barrier instruction that orders prior loads against later loads and stores?

Yes, it is in MD00087 (table 6.6 of document Ver 6.04) - https://imgtec.com/?do-download=4302

> If so, and if MIPS does not do ordering based on address and data dependencies, I suggest making read_barrier_depends() be a SYNC_ACQUIRE rather than SYNC_RMB.

I understood that after I saw the example of its use. Please consider adding that to Documentation/memory-barriers.txt (it is not easy to find out that this barrier is used for a shared WRITE based on a shared pointer); it would be helpful.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/13/2016 02:45 AM, Will Deacon wrote:
> On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> I don't think the address dependency is enough on its own. By that reasoning, the following variant (WRC+addr+addr) would work too:
>
>     P0: Wx = 1
>
>     P1: Rx == 1
>         Wy = 1
>
>     P2: Ry == 1
>         Rx = 0
>
> So are you saying that this is also forbidden? Imagine that P0 and P1 are two threads that share a store buffer. What then?

I am asking the HW team about it, but I have a question - does it have any relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)? You either use a barrier or you don't, and I am just voicing an intention to use a more efficient instruction instead of the bold hammer (SYNC instruction).

If you don't use any barrier here then it is a different issue.

Maybe it makes sense to return to the original issue?

- Leonid
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
>> I am asking the HW team about it, but I have a question - does it have any relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
> Of course. If you cannot explain the semantics of the primitives you introduce, how can we judge the patch.

You missed the point - this is a question about replacing SYNC with lightweight primitives. It is NOT a question about multithreaded system behavior without any SYNC. The answer to Will's latest question lies in a different area.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/13/2016 02:45 AM, Will Deacon wrote:
> I don't think the address dependency is enough on its own. By that reasoning, the following variant (WRC+addr+addr) would work too:
>
>     P0: Wx = 1
>
>     P1: Rx == 1
>         Wy = 1
>
>     P2: Ry == 1
>         Rx = 0
>
> So are you saying that this is also forbidden? Imagine that P0 and P1 are two threads that share a store buffer. What then?

OK, I collected answers and here they are:

In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read as 1. By design.

However, it is unclear what happens in MIPS R2 1004K.

Moreover, there are voices against guaranteeing that it will stay so in the future, and those voices point me to the Documentation/memory-barriers.txt section "DATA DEPENDENCY BARRIERS" examples, which require SYNC_RMB between loading an address/index and using it for loading shared data based on that address or index (look at the CPU 2 pseudo-code):

    To deal with this, a data dependency barrier or better must be inserted between the address load and the data load:

    CPU 1                 CPU 2
    ===============       ===============
    { A == 1, B == 2, C = 3, P == &A, Q == &C }
    B = 4;
    WRITE_ONCE(P, &B);
                          Q = READ_ONCE(P);
                          <--- SYNC_RMB is here
                          D = *Q;
    ...
    Another example of where data dependency barriers might be required is where a number is read from memory and then used to calculate the index for an array access:

    CPU 1                 CPU 2
    ===============       ===============
    { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
    M[1] = 4;
    WRITE_ONCE(P, 1);
                          Q = READ_ONCE(P);
                          <--- SYNC_RMB is here
                          D = M[Q];

Those voices say that there is a legitimate reason to relax HW here for performance if SYNC_RMB is needed anyway to work with this sequence of shared data.

And all that is off-topic here, in my mind. I just want to be sure that this patchset still provides for use of specific lightweight SYNCs on MIPS vs the bold and heavy generalized "SYNC 0" in any case.

- Leonid.
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
(I try to answer multiple mails in one)

First of all, it seems like some generic notes should be given here:

1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on some CPUs. On those CPUs it basically kills the pipelines in each CPU, can do a special memory/IO bus transaction (similar to a "fence") and holds the system until all reads/writes are completed. It is like the Big Kernel Lock but worse. So the move to SMP_* kinds of barriers is needed to improve performance, especially on the newest CPUs with long pipelines.

2. The MIPS Arch document may be misleading because the words "ordering" and "completion" mean something different from their Linux usage; the SYNC instruction description is written for HW engineers. I wrote about that in a separate patch of the same patchset - http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight SYNC instruction in smp_* memory barriers":

    These instructions were specifically designed to work for smp_*() sort of memory barriers in MIPS R2/R3/R5 and R6. Unfortunately, its description is very cryptic and is done in HW engineering style which prevents use of it by SW.

3. I bothered the MIPS Arch team for a long time until I completely understood that MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do exactly what is required in Documentation/memory-barriers.txt

In Peter Zijlstra's mail:

> 1) you do not make such things selectable; either the hardware needs them or it doesn't. If it does you _must_ use them, however unlikely.

It is selectable only for MIPS R2 but not MIPS R6. The reason is - most MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU resources, especially taking into account that "lightweight syncs" are converted to a heavy "SYNC 0" in many of those CPUs. However, the latest MIPS/Imagination CPUs have a pipeline long enough to hit a problem - absence of SYNC at LL/SC inside atomics, barriers etc.

> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12 are _NOT_ transitive and therefore cannot be used to implement the smp_mb__{before,after} stuff.
>
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not Completion Barriers.

Please see above, point 2.

> That is, currently all architectures -- with exception of PPC -- have RCsc locks, but using these non-transitive things will get you RCpc locks.
>
> So yes, MIPS can go RCpc for its locks and share the burden of pain with PPC, but that needs to be a very conscious decision.

I don't understand that - I tried hard but I can't find any word like "RCsc" or "RCpc" in the Documentation/ directory. Web search goes nowhere, of course.

In Will Deacon's mail:

> The issue I have with the SYNC description in the text above is that it describes the single CPU (program order) and the dual-CPU (confusingly named global order) cases, but then doesn't generalise any further. That means we can't sensibly reason about transitivity properties when a third agent is involved. For example, the WRC+sync+addr test:
>
>     P0: Wx = 1
>
>     P1: Rx == 1
>         SYNC
>         Wy = 1
>
>     P2: Ry == 1
>         Rx = 0
>
> I can't find anything to forbid that, given the text. The main problem is having the SYNC on P1 affect the write by P0.

As I understand that test, the visibility of P0: W[x] = 1 is identical for P1 and P2 here. If P1 got X before the SYNC and writes to Y after the SYNC, then instruction source register dependency tracking in P2 prevents a speculative load of X before P2 obtains Y from the same place as P0/P1 and calculates the address of X. If some load of X in P2 happens before the address dependency calculation, its result is discarded.

Yes, you can't find that in the MIPS SYNC instruction description; it is more likely in the CM (Coherence Manager) area. I just pointed it out to our arch team member responsible for documents and he will think about how to explain that.

- Leonid.
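For readers who want to experiment with the WRC+sync+addr shape from Will's mail, it can be mirrored in C11 atomics. This is a sketch with stand-ins, not a statement about MIPS hardware: a seq_cst fence plays the role of SYNC in P1, and an acquire load plays the role of P2's address dependency, since C11 has no portable way to express a bare address dependency. Under the C11 model, as far as I can tell, the outcome r1 == 1, r2 == 1, r3 == 0 is then forbidden by fence synchronization plus read-read coherence. All function names are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y; /* both start at 0 */

/* P0: Wx = 1 */
static void wrc_p0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);
}

/* P1: Rx; SYNC; Wy = 1. Returns the value read from x (r1). */
static int wrc_p1(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_relaxed);

	/* Full fence: stand-in for the MIPS SYNC between the read and the write. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return r1;
}

/* P2: Ry; (dependency); Rx. Stores the two values read in *r2 and *r3. */
static void wrc_p2(int *r2, int *r3)
{
	/* Acquire load: stand-in for the address dependency from Ry to Rx. */
	*r2 = atomic_load_explicit(&y, memory_order_acquire);
	*r3 = atomic_load_explicit(&x, memory_order_relaxed);
}

/* The outcome the litmus test asks about: P1 and P2 disagree about x. */
static bool wrc_forbidden(int r1, int r2, int r3)
{
	return r1 == 1 && r2 == 1 && r3 == 0;
}
```

To use it as a litmus test, run `wrc_p0`, `wrc_p1` and `wrc_p2` on three separate threads many times, resetting `x` and `y` between rounds, and count how often `wrc_forbidden` is true. Replacing the fence and the acquire load with relaxed operations gives a rough analogue of the unprotected variant, though the compiler is then free to reorder as well.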
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/12/2016 01:40 PM, Peter Zijlstra wrote:
>> It is selectable only for MIPS R2 but not MIPS R6. The reason is - most MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU resources, especially taking into account that "lightweight syncs" are converted to a heavy "SYNC 0" in many of those CPUs. However, the latest MIPS/Imagination CPUs have a pipeline long enough to hit a problem - absence of SYNC at LL/SC inside atomics, barriers etc.
> What ?! Are you saying that because R2 has short pipelines it's unlikely to hit the reordering issues and we can omit barriers?

It was my guess to explain why barriers were not included originally. You can check with Ralf; he knows more about the MIPS Linux code of that time. I have been bothered with this for more than 2 years and I am just trying to solve this issue - in recent CPUs the load after an LL/SC synchronization instruction loop can get ahead of the SC for sure; it was tested.

>>> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12 are _NOT_ transitive and therefore cannot be used to implement the smp_mb__{before,after} stuff.
>>>
>>> That is, in MIPS speak, those SYNC types are Ordering Barriers, not Completion Barriers.
>> Please see above, point 2.
> That did not in fact enlighten things. Are they transitive/multi-copy atomic or not?

Peter Zijlstra recently wrote: "In particular we're very much all 'confused' about the various notions of transitivity". I am actually confused too and need some examples here.

> (and here Will will go into great detail on the differences between the two and make our collective brains explode :-)
>>> That is, currently all architectures -- with exception of PPC -- have RCsc locks, but using these non-transitive things will get you RCpc locks.
>>>
>>> So yes, MIPS can go RCpc for its locks and share the burden of pain with PPC, but that needs to be a very conscious decision.
>> I don't understand that - I tried hard but I can't find any word like "RCsc" or "RCpc" in the Documentation/ directory. Web search goes nowhere, of course.
> From: lkml.kernel.org/r/20150828153921.gf19...@twins.programming.kicks-ass.net
>
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE + ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does not.

MIPS Arch starting from R2 requires that. If some CPU can't, it should execute a full "SYNC 0" instead, which is a full memory barrier.

> Currently PowerPC is the only arch that (can, and) does RCpc and gives a weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed to see the stores of the CPU which did the RELEASE in order.

Yes, it was a goal for SYNC_ACQUIRE and SYNC_RELEASE.

Caveats:

- "Full memory barrier" on MIPS means a full barrier for any device in the coherent domain. In MIPS Tech/Imagination Tech MIPS-based CPUs it is "for any device connected to CM or IOCU + directly connected memory".

- It is not applied to instruction fetch. However, I-Cache flushes and SYNCI are consistent with that. There are also hazard barrier instructions to clear the CPU pipeline to some extent - to help with this limitation.

I don't think that these caveats prevent a correct Acquire/Release semantic.

- Leonid.
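The RCsc/RCpc distinction Peter describes can be illustrated, loosely, with the store-buffering shape in C11 atomics. The mapping is an analogy I am assuming, not something stated in the thread: C11 release/acquire behaves like RCpc here (a RELEASE followed by an ACQUIRE on the same thread is not a full barrier, so both threads may read 0), while C11 seq_cst behaves like RCsc (that outcome is forbidden). The function names are hypothetical.

```c
#include <stdatomic.h>

static atomic_int x, y; /* both start at 0 */

/*
 * RCpc-like: store-release to *mine, then load-acquire from *other.
 * The store may still be reordered after the load, so two threads
 * running this with swapped arguments can both return 0.
 */
static int sb_thread_rcpc(atomic_int *mine, atomic_int *other)
{
	atomic_store_explicit(mine, 1, memory_order_release);
	return atomic_load_explicit(other, memory_order_acquire);
}

/*
 * RCsc-like: the same accesses with seq_cst ordering. RELEASE followed by
 * ACQUIRE now acts as a full barrier, so at most one of two threads running
 * this with swapped arguments can return 0.
 */
static int sb_thread_rcsc(atomic_int *mine, atomic_int *other)
{
	atomic_store_explicit(mine, 1, memory_order_seq_cst);
	return atomic_load_explicit(other, memory_order_seq_cst);
}
```

In a real experiment one thread calls `sb_thread_rcsc(&x, &y)` while another calls `sb_thread_rcsc(&y, &x)`; the question in the thread is which of these two behaviours MIPS SYNC_RELEASE followed by SYNC_ACQUIRE provides.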
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, smp_read_barrier_depends, smp_store_release and smp_load_acquire match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead.

This statement doesn't fit the MIPS barrier variations. Moreover, there is a reason to make them even more specific, at least for smp_store_release and smp_load_acquire; look into http://patchwork.linux-mips.org/patch/10506/

- Leonid.