Re: [PATCH] vhost: move is_le setup to the backend

2016-01-12 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote: > The vq->is_le field is used to fix endianness when accessing the vring via > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases: > > 1) host is big endian and device is modern virtio > > 2) host has cross-endian
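(For context: the helpers named above conditionally byte-swap vring fields based on vq->is_le. Below is a minimal sketch of that pattern; the names swap16 and sketch_cpu_to_vhost16 are hypothetical, and the real helpers in drivers/vhost/vhost.h go through the virtio16 wrappers rather than swapping by hand.)

#include <stdint.h>
#include <stdbool.h>

/* Sketch only: "swap iff device endianness differs from host endianness".
 * is_le == true means the vring data is little-endian (e.g. modern virtio). */
static inline uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static inline uint16_t sketch_cpu_to_vhost16(bool is_le, uint16_t val)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return is_le ? swap16(val) : val;   /* big-endian host, LE device: swap */
#else
	return is_le ? val : swap16(val);   /* little-endian host, BE device: swap */
#endif
}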

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Michael S. Tsirkin
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote: > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote: > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, > >smp_read_barrier_depends, smp_store_release and smp_load_acquire match > >the asm-generic variants exactly. Drop

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote: > This statement doesn't fit the MIPS barrier variations. Moreover, there is a > reason to make that even more specific, at least for smp_store_release and > smp_load_acquire, look into > >

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 10:43:36AM +0200, Michael S. Tsirkin wrote: > On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote: > > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote: > > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, > > >smp_read_barrier_depends,

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Will Deacon
On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote: > On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote: > > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote: > > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync > > > 0x12 semantics nor

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote: > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote: > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync > > 0x12 semantics nor does it provide a publicly accessible link to > > documentation that

Re: [PATCH] vhost: move is_le setup to the backend

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 11:31:00AM +0100, Greg Kurz wrote: > On Tue, 12 Jan 2016 12:01:32 +0200 > "Michael S. Tsirkin" wrote: > > > On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote: > > > The vq->is_le field is used to fix endianness when accessing the vring via > > >

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Michael S. Tsirkin
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > consistently cheaper (by at least half the latency) than MFENCE. While

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 9:45 AM, Michael S. Tsirkin wrote: > > By the way, the comment in barrier.h says: > > /* > * Some non-Intel clones support out of order store. wmb() ceases to be > * a nop for these. > */ > > and while the 1st sentence may well be true, if you have > an

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote: > > #ifdef xchgrz > > /* same as xchg but poking at gcc red zone */ > > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": > >

[PATCH v2 1/3] x86: drop mfence in favor of lock+addl

2016-01-12 Thread Michael S. Tsirkin
mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, same as we always did on old 32-bit. Signed-off-by: Michael S. Tsirkin --- arch/x86/include/asm/barrier.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git
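(The shape of the change, as a sketch rather than the patch's actual hunk; the exact addressing mode and the 32- vs 64-bit spelling here are assumptions. A locked read-modify-write of a dummy stack word is architecturally a full barrier on x86, and the benchmark in this thread found it cheaper than mfence; which stack slot to poke is debated later in the thread.)

/* Sketch: full barrier via a locked no-op add to the stack instead of mfence. */
#ifdef CONFIG_X86_32
#define mb()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define mb()	asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#endif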

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote: > (I'll try to answer multiple mails in one) > > First of all, it seems like some generic notes should be given here: > > 1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on some > CPUs. On those CPUs it basically
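(Illustrative sketch of the lightweight SYNC stypes this sub-thread argues about; the stype values 0x11/acquire and 0x12/release come from the MIPS documents referenced in the thread, while the macro names here are hypothetical. Whether a given core actually implements them more cheaply than SYNC 0 is exactly the open question.)

/* sync with a non-zero stype is a lighter-weight ordering barrier on cores
 * that implement it; on cores that don't, it behaves as a full SYNC 0. */
#define mips_sync(stype)	asm volatile("sync\t" #stype ::: "memory")

/* acquire: older loads ordered before younger loads and stores */
#define sketch_acquire_barrier()	mips_sync(0x11)
/* release: older loads and stores ordered before younger stores */
#define sketch_release_barrier()	mips_sync(0x12)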

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski wrote: > > Here's an article with numbers: > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ Well, that's with the busy loop and one set of code generation. It doesn't show the "oops, deeper stack isn't even

[PATCH v2 2/3] x86: drop a comment left over from X86_OOSTORE

2016-01-12 Thread Michael S. Tsirkin
The comment about wmb being non-nop is a leftover from before commit 09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE"). It makes no sense now: if you have an SMP system with out-of-order stores, making wmb not a nop will not help. Additionally, wmb is not a nop even for regular Intel CPUs because

[PATCH v2 3/3] x86: tweak the comment about use of wmb for IO

2016-01-12 Thread Michael S. Tsirkin
On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even that is generally questionable. Leave them around as historical unless somebody can point to a case where they care about the performance, but tweak the comment so people don't think they are strictly required in all cases.

[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

2016-01-12 Thread Michael S. Tsirkin
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs. So let's use the locked variant everywhere - helps keep the code simple as well. While I was at it, I found some inconsistencies in comments
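(The micro-benchmark itself isn't in the preview; the following is a stand-alone userspace sketch of that kind of measurement, using rdtsc. The loop count, the lack of serialization around rdtsc, and the choice of operand are assumptions here, and as the rest of the thread notes, the numbers shift depending on whether the locked op hits the stack or a global.)

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

#define ITERS 100000000ULL

int main(void)
{
	uint64_t t0, t1, t2, i;

	t0 = rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("mfence" ::: "memory");
	t1 = rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
	t2 = rdtsc();

	printf("mfence:    %.2f cycles/iter\n", (double)(t1 - t0) / ITERS);
	printf("lock addl: %.2f cycles/iter\n", (double)(t2 - t1) / ITERS);
	return 0;
}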

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 01:37:38PM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski wrote: > > > > Here's an article with numbers: > > > > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ > > Well, that's with the busy loop and

Re: [PATCH] vhost: move is_le setup to the backend

2016-01-12 Thread Greg Kurz
On Tue, 12 Jan 2016 12:01:32 +0200 "Michael S. Tsirkin" wrote: > On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote: > > The vq->is_le field is used to fix endianness when accessing the vring via > > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-12 Thread Peter Zijlstra
On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote: > 2) the changelog _completely_ fails to explain the sync 0x11 and sync > 0x12 semantics nor does it provide a publicly accessible link to > documentation that does. Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/

Re: [PATCH v3 00/41] arch: barrier cleanup + barriers for virt

2016-01-12 Thread Peter Zijlstra
On Sun, Jan 10, 2016 at 04:16:22PM +0200, Michael S. Tsirkin wrote: > I parked this in vhost tree for now, though the inclusion of patch 1 from tip > creates a merge conflict - but one that is trivial to resolve. > > So I intend to just merge it all through my tree, including the > duplicate

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Andy Lutomirski
On 01/12/2016 09:20 AM, Linus Torvalds wrote: On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote: #ifdef xchgrz /* same as xchg but poking at gcc red zone */ #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote: > > I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64) > was better because it avoided stomping on very-likely-to-be-hot write > buffers. I suspect it could go either way. You want a small

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Andy Lutomirski
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote: >> >> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64) >> was better because it avoided stomping on

Re: [PATCH v3 27/41] x86: define __smp_xxx

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote: > This defines __smp_xxx barriers for x86, > for use by virtualization. > > smp_xxx barriers are removed as they are > defined correctly by asm-generic/barrier.h > > Signed-off-by: Michael S. Tsirkin > Acked-by: Arnd Bergmann
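(The layering the series sets up, sketched; the names follow the cover letter, but this is a simplification and not the literal header contents: the arch supplies __smp_xxx once, asm-generic maps smp_xxx onto it only for SMP builds, and virt_xxx always uses the real instruction so a UP guest still orders accesses against a hypervisor.)

/* compiler-only barrier, for the UP case */
#define barrier()	asm volatile("" ::: "memory")

/* arch header: the real instruction, defined once
 * (or the lock; addl variant from the other patches in this thread) */
#define __smp_mb()	asm volatile("mfence" ::: "memory")

/* asm-generic/barrier.h style wrapping */
#ifdef CONFIG_SMP
#define smp_mb()	__smp_mb()
#else
#define smp_mb()	barrier()
#endif

/* virtualization: the "other CPU" is the hypervisor/device, so always
 * use the real instruction, even in a UP kernel */
#define virt_mb()	__smp_mb()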

Re: [PATCH v3 13/41] x86: reuse asm-generic/barrier.h

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote: > As on most architectures, on x86 read_barrier_depends and > smp_read_barrier_depends are empty. Drop the local definitions and pull > the generic ones from asm-generic/barrier.h instead: they are identical. > > This is in preparation to

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread H. Peter Anvin
On 01/12/16 14:21, Michael S. Tsirkin wrote: > > OK so I'll have to tweak the test to put something > on stack to measure the difference: my test tweaks a > global variable instead. > I'll try that by tomorrow. > > I couldn't measure any difference between mfence and lock+addl > except in a

Re: [PATCH v2 2/3] x86: drop a comment left over from X86_OOSTORE

2016-01-12 Thread One Thousand Gnomes
On Wed, 13 Jan 2016 00:10:19 +0200 "Michael S. Tsirkin" wrote: > The comment about wmb being non-nop is a leftover from before commit > 09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE"). > > It makes no sense now: if you have an SMP system with out of order > stores, making wmb

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 12:59:58PM -0800, Andy Lutomirski wrote: > On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds > wrote: > > On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote: > >> > >> I recall reading somewhere that lock addl $0,

Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

2016-01-12 Thread H. Peter Anvin
On 01/12/16 14:10, Michael S. Tsirkin wrote: > mb() typically uses mfence on modern x86, but a micro-benchmark shows that > it's > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs. > > So let's use the locked variant everywhere - helps keep the code simple as > well. >

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 2:55 PM, H. Peter Anvin wrote: > > Be careful with this: if it only shows up in a microbenchmark, we may > introduce a hard-to-debug regression for no real benefit. So I can pretty much guarantee that it shouldn't regress from a correctness angle, since we

Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()

2016-01-12 Thread Michael S. Tsirkin
On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote: > On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote: > > From: Davidlohr Bueso > > > > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()") > > it was made clear that the

Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()

2016-01-12 Thread Paul E. McKenney
On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote: > From: Davidlohr Bueso > > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()") > it was made clear that the context of this call (and thus set_mb) > is strictly for CPU ordering, as
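(The gist, sketched: inside store helpers like this, a heavyweight mb()-class barrier gets replaced by the smp-level one, which collapses to a compiler barrier on !SMP builds. The _old/_new suffixes and the particular helper shown are illustrative only, not the literal per-arch hunks from the patch.)

/* before (illustrative): full mb() even on UP builds */
#define smp_store_mb_old(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)

/* after (illustrative): smp-level barrier, compiler-only when !CONFIG_SMP */
#define smp_store_mb_new(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)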

Re: [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h

2016-01-12 Thread Paul E. McKenney
On Sun, Jan 10, 2016 at 04:17:09PM +0200, Michael S. Tsirkin wrote: > On powerpc read_barrier_depends, smp_read_barrier_depends, > smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the > asm-generic variants exactly. Drop the local definitions and pull in > asm-generic/barrier.h

Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote: > #ifdef xchgrz > /* same as xchg but poking at gcc red zone */ > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": > "=r"(ret) :: "memory", "cc"); } while (0) > #endif That's not safe in general.
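(A commented restatement of the quoted macro and of the objection, as a sketch; the name xchgrz_barrier is hypothetical, and SP in the original expands to esp or rsp. xchg with a memory operand carries an implicit lock prefix, so it is a full barrier, which is why it was being timed. But in ordinary x86-64 SysV code the 128 bytes below %rsp are the red zone, where a leaf function may keep live data, so a store to -4(%rsp) can corrupt state behind the compiler's back; that is the sense in which it is only a benchmarking hack, not a general-purpose barrier.)

/* Sketch: the red-zone-poking barrier under discussion, 64-bit spelling.
 * The implicit lock of xchg-with-memory makes this a full barrier, but
 * writing below %rsp assumes nothing live is stored there. */
#define xchgrz_barrier() do {						\
	int ret;							\
	asm volatile("xchgl %0, -4(%%rsp)"				\
		     : "=r"(ret) :: "memory", "cc");			\
} while (0)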