On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote:
> The vq->is_le field is used to fix endianness when accessing the vring via
> the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
>
> 1) host is big endian and device is modern virtio
>
> 2) host has cross-endian
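For readers unfamiliar with these helpers, here is a minimal userspace sketch
(types and names simplified; vq_sketch and vhost16_to_cpu_sketch are
illustrative, not the kernel's actual identifiers) of how a per-virtqueue
is_le flag drives the byteswap:

/* a hypothetical stand-in for struct vhost_virtqueue */
struct vq_sketch { int is_le; };

#include <stdint.h>

static inline uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* convert a 16-bit value from vring endianness to CPU endianness */
static inline uint16_t vhost16_to_cpu_sketch(const struct vq_sketch *vq, uint16_t v)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return vq->is_le ? swab16(v) : v;	/* BE host: swap when the ring is LE */
#else
	return vq->is_le ? v : swab16(v);	/* LE host: swap when the ring is BE */
#endif
}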
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> >smp_read_barrier_depends, smp_store_release and smp_load_acquire match
> >the asm-generic variants exactly. Drop the local definitions and pull in
> >asm-generic/barrier.h instead.
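The pattern these consolidation patches rely on, shown as a condensed sketch
(simplified from asm-generic/barrier.h): the arch header defines only the
barriers that differ, and the generic header supplies a fallback for anything
left undefined:

#ifndef smp_store_release
#define smp_store_release(p, v)			\
do {						\
	smp_mb();				\
	WRITE_ONCE(*(p), (v));			\
} while (0)
#endif

#ifndef smp_load_acquire
#define smp_load_acquire(p)			\
({						\
	typeof(*(p)) ___p1 = READ_ONCE(*(p));	\
	smp_mb();				\
	___p1;					\
})
#endif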
On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> This statement doesn't fit the MIPS barrier variations. Moreover, there is a
> reason to make that even more specific, at least for smp_store_release and
> smp_load_acquire; look into
On Tue, Jan 12, 2016 at 10:43:36AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> > >smp_read_barrier_depends,
On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > > 0x12 semantics nor does it provide a publicly accessible link to
> > > documentation that does.
On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > 0x12 semantics nor does it provide a publicly accessible link to
> > documentation that does.
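For reference (taken from the MIPS32/64 ISA manual, not from the changelog
under review): stype 0x11 is SYNC_ACQUIRE and stype 0x12 is SYNC_RELEASE. A
sketch of the mapping being proposed:

/* SYNC_ACQUIRE (0x11): older loads complete before younger loads/stores.
 * SYNC_RELEASE (0x12): older loads/stores complete before younger stores.
 * SYNC 0 remains the full, heavyweight barrier. */
#define __sync_acquire()	asm volatile("sync 0x11" ::: "memory")
#define __sync_release()	asm volatile("sync 0x12" ::: "memory")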
On Tue, Jan 12, 2016 at 11:31:00AM +0100, Greg Kurz wrote:
> On Tue, 12 Jan 2016 12:01:32 +0200
> "Michael S. Tsirkin" wrote:
>
> > On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote:
> > > The vq->is_le field is used to fix endianness when accessing the vring via
> > > the cpu_to_vhost16() and vhost16_to_cpu() helpers
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz), and XCHG is
> > consistently cheaper (by at least half the latency) than MFENCE. While
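A rough userspace sketch of this kind of micro-benchmark (x86-64, gcc;
illustrative only, as the numbers vary by microarchitecture and by what the
surrounding code does):

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>		/* __rdtsc() */

#define ITERS 100000000UL

int main(void)
{
	static int mem;
	int reg = 0;
	unsigned long i;
	uint64_t t0, t1, t2, t3;

	t0 = __rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("mfence" ::: "memory");
	t1 = __rdtsc();
	for (i = 0; i < ITERS; i++)	/* xchg with memory implies lock */
		asm volatile("xchgl %0, %1" : "+r"(reg), "+m"(mem) :: "memory");
	t2 = __rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("lock; addl $0,(%%rsp)" ::: "memory", "cc");
	t3 = __rdtsc();

	printf("mfence:   %.2f cycles/iter\n", (double)(t1 - t0) / ITERS);
	printf("xchg:     %.2f cycles/iter\n", (double)(t2 - t1) / ITERS);
	printf("lock add: %.2f cycles/iter\n", (double)(t3 - t2) / ITERS);
	return 0;
}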
On Tue, Jan 12, 2016 at 9:45 AM, Michael S. Tsirkin wrote:
>
> By the way, the comment in barrier.h says:
>
> /*
> * Some non-Intel clones support out of order store. wmb() ceases to be
> * a nop for these.
> */
>
> and while the 1st sentence may well be true, if you have
> an
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote:
> > #ifdef xchgrz
> > /* same as xchg but poking at gcc red zone */
> > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");":
> > "=r"(ret) :: "memory", "cc"); } while (0)
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, same as we always did on old 32-bit.
Signed-off-by: Michael S. Tsirkin
---
arch/x86/include/asm/barrier.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> (I'll try to answer multiple mails in one.)
>
> First of all, some general notes are in order:
>
> 1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on some
> CPUs. On those CPUs it basically
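The kind of arch choice being described, sketched below
(CONFIG_CPU_HAS_LIGHT_SYNC is a hypothetical symbol used for illustration,
not a real MIPS Kconfig option):

#ifdef CONFIG_CPU_HAS_LIGHT_SYNC
#define smp_mb()	asm volatile("sync 0x10" ::: "memory")	/* SYNC_MB */
#else
#define smp_mb()	asm volatile("sync 0"    ::: "memory")	/* full SYNC */
#endif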
On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski wrote:
>
> Here's an article with numbers:
>
> http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
Well, that's with the busy loop and one set of code generation. It
doesn't show the "oops, deeper stack isn't even
The comment about wmb being non-nop is a leftover from before commit
09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE").
It makes no sense now: if you have an SMP system with out of order
stores, making wmb not a nop will not help.
Additionally, wmb is not a nop even for regular intel CPUs because
On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even
that is generally questionable.
Leave them around as historical unless somebody can point to a case where
they care about the performance, but tweak the comment so people
don't think they are strictly required in all cases.
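The definitions in question, roughly as they stood on x86-64 at the time:

/* the fence forms are only needed for IO and non-temporal stores,
 * not for plain SMP ordering on x86 */
#define mb()	asm volatile("mfence" ::: "memory")
#define rmb()	asm volatile("lfence" ::: "memory")
#define wmb()	asm volatile("sfence" ::: "memory")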
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
So let's use the locked variant everywhere - helps keep the code simple as
well.
While I was at it, I found some inconsistencies in comments
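The approximate shape of the change (the literal diff is not reproduced in
this digest): one lock-prefixed no-op RMW on the stack for both 32-bit and
64-bit, instead of mfence on 64-bit.

#ifdef CONFIG_X86_32
#define mb()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define mb()	asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#endif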
On Tue, Jan 12, 2016 at 01:37:38PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski wrote:
> >
> > Here's an article with numbers:
> >
> > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
>
> Well, that's with the busy loop and
On Tue, 12 Jan 2016 12:01:32 +0200
"Michael S. Tsirkin" wrote:
> On Mon, Jan 11, 2016 at 03:39:38PM +0100, Greg Kurz wrote:
> > The vq->is_le field is used to fix endianness when accessing the vring via
> > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> 0x12 semantics nor does it provide a publicly accessible link to
> documentation that does.
Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
On Sun, Jan 10, 2016 at 04:16:22PM +0200, Michael S. Tsirkin wrote:
> I parked this in vhost tree for now, though the inclusion of patch 1 from tip
> creates a merge conflict - but one that is trivial to resolve.
>
> So I intend to just merge it all through my tree, including the
> duplicate
On 01/12/2016 09:20 AM, Linus Torvalds wrote:
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote:
#ifdef xchgrz
/* same as xchg but poking at gcc red zone */
#define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) ::
"memory", "cc"); } while
On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote:
>
> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
> was better because it avoided stomping on very-likely-to-be-hot write
> buffers.
I suspect it could go either way. You want a small
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:
> This defines __smp_xxx barriers for x86,
> for use by virtualization.
>
> smp_xxx barriers are removed as they are
> defined correctly by asm-generic/barriers.h
>
> Signed-off-by: Michael S. Tsirkin
> Acked-by: Arnd Bergmann
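A condensed sketch of the scheme in this series: the arch supplies
instruction-level __smp_xxx barriers, asm-generic/barrier.h maps smp_xxx onto
them per build configuration, and virtualization code gets always-real
virt_xxx forms, because the host is SMP even when the guest kernel is built UP.

/* arch supplies the real instruction-level barrier */
#define __smp_mb()	asm volatile("mfence" ::: "memory")

/* asm-generic/barrier.h picks per build config... */
#ifdef CONFIG_SMP
#define smp_mb()	__smp_mb()
#else
#define smp_mb()	barrier()
#endif

/* ...and virtio/vhost use the unconditional form */
#define virt_mb()	__smp_mb()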
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:
> As on most architectures, on x86 read_barrier_depends and
> smp_read_barrier_depends are empty. Drop the local definitions and pull
> the generic ones from asm-generic/barrier.h instead: they are identical.
>
> This is in preparation to
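The generic definitions being pulled in are empty; only Alpha needs a real
barrier between a load and a dependent load, so on x86 these compile away:

#define read_barrier_depends()		do { } while (0)
#define smp_read_barrier_depends()	do { } while (0)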
On 01/12/16 14:21, Michael S. Tsirkin wrote:
>
> OK so I'll have to tweak the test to put something
> on the stack to measure the difference: my test tweaks a
> global variable instead.
> I'll try that by tomorrow.
>
> I couldn't measure any difference between mfence and lock+addl
> except in a
On Wed, 13 Jan 2016 00:10:19 +0200
"Michael S. Tsirkin" wrote:
> The comment about wmb being non-nop is a left over from before commit
> 09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE").
>
> It makes no sense now: if you have an SMP system with out of order
> stores, making wmb
On Tue, Jan 12, 2016 at 12:59:58PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
> wrote:
> > On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski wrote:
> >>
> >> I recall reading somewhere that lock addl $0,
On 01/12/16 14:10, Michael S. Tsirkin wrote:
> mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
>
> So let's use the locked variant everywhere - helps keep the code simple as
> well.
>
On Tue, Jan 12, 2016 at 2:55 PM, H. Peter Anvin wrote:
>
> Be careful with this: if it only shows up in a microbenchmark, we may
> introduce a hard-to-debug regression for no real benefit.
So I can pretty much guarantee that it shouldn't regress from a
correctness angle, since we
On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> > From: Davidlohr Bueso
> >
> > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> > it was made clear that the
On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> From: Davidlohr Bueso
>
> With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> it was made clear that the context of this call (and thus set_mb)
> is strictly for CPU ordering, as
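For reference, the generic fallback for this primitive is simply a store
followed by a full barrier; the xchg-based x86 variant folds both into a
single locked instruction:

#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)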
On Sun, Jan 10, 2016 at 04:17:09PM +0200, Michael S. Tsirkin wrote:
> On powerpc read_barrier_depends, smp_read_barrier_depends
> smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin wrote:
> #ifdef xchgrz
> /* same as xchg but poking at gcc red zone */
> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");":
> "=r"(ret) :: "memory", "cc"); } while (0)
> #endif
That's not safe in general.
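A reasoning sketch of why (based on the x86-64 SysV ABI, not spelled out in
this excerpt): the 128 bytes below %rsp form the red zone, which leaf
functions may use for live data without adjusting %rsp. The kernel builds
with -mno-red-zone, but gcc makes no such promise for arbitrary code
surrounding an asm statement:

/* a leaf function like this may be compiled to keep tmp at -4(%rsp)
 * without adjusting %rsp at all */
int leaf(int a, int b)
{
	volatile int tmp = a;			/* may live in the red zone */
	asm volatile("xchgl %0, -4(%%rsp)"	/* the proposed barrier */
		     : "+r"(b) :: "memory", "cc");
	return tmp + b;				/* tmp may have been clobbered */
}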