Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-08 Thread Linus Torvalds

On Tue, Jan 8, 2019 at 1:10 AM David Laight wrote: > > > > It will never work for memcpy_fromio(). Any driver that thinks it will > > copy from io space to user space absolutely *has* to do it by hand. No > > questions, and no exceptions. Some loop like > > > >for (..) > > put_user(readl

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-08 Thread David Laight

From: Linus Torvalds > Sent: 07 January 2019 17:44 > On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote: > > > > I needed to open-code one part because it wants to do copy_to_user() > > from a PCIe address buffer (which has to work). > > It will never work for memcpy_fromio(). Any driver that thin

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-07 Thread Linus Torvalds

On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote: > > I needed to open-code one part because it wants to do copy_to_user() > from a PCIe address buffer (which has to work). It will never work for memcpy_fromio(). Any driver that thinks it will copy from io space to user space absolutely *has* to

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-07 Thread David Laight

From: Linus Torvalds > Sent: 05 January 2019 02:39 ... > Anyway, it would be lovely to hear whether memcpy_toio() now works > reasonably. I just picked our very old legacy function for this, so it > will do things in 32-bit chunks (even on x86-64), and I'm certainly > open to somebody doing somethi

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-04 Thread Linus Torvalds

Coming back to this old thread, because I've spent most of the day resurrecting some of my old core x86 patches, and one of them was for the issue David Laight complained about: horrible memcpy_toio() performance. Yes, I should have done this before the merge window instead of at the end of it, bu

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight

From: Linus Torvalds > Sent: 23 November 2018 16:36 ... > End result: we *used* to do this right. For the last eight years our > "memcpy_{to,from}io()" has been entirely broken, and apparently even > the people who noticed oddities like David, never reported it as > breakage but instead just worked

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight

From: Andy Lutomirski > Sent: 23 November 2018 19:11 > > On Nov 23, 2018, at 11:44 AM, Linus Torvalds > > wrote: > > > >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski > >> wrote: > >> > >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as > >> something like “copy this

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight

From: Linus Torvalds > Sent: 23 November 2018 16:36 > > On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote: > > > > I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel. > > Calling memcpy_fromio(kernel_buffer, PCIe_address, length) > > generates a lot of single byte TLP. >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Jens Axboe

On 11/21/18 11:16 AM, Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds > wrote: >> >> It would be interesting to know exactly which copy it is that matters >> so much... *inlining* the erms case might show that nicely in >> profiles. > > Side note: the fact that Jens' patch

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Andy Lutomirski

> On Nov 23, 2018, at 11:44 AM, Linus Torvalds > wrote: > >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote: >> >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as >> something like “copy this data to IO space using at most long-sized writes, >> all aligned,

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds

On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote: > > What is memcpy_to_io even supposed to do? I’m guessing it’s defined as > something like “copy this data to IO space using at most long-sized writes, > all aligned, and writing each byte exactly once, in order.” That sounds... > dubio
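The definition guessed at in the quoted text (at most long-sized writes, all aligned, each byte written exactly once, in order) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the routine proposed in the thread: `copy_to_io_sketch` is a hypothetical name, the destination is ordinary memory standing in for an I/O mapping, and real kernel code would go through an ioremap'd pointer and the proper I/O accessors instead of plain volatile stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace sketch, not a kernel API.
 * Copies n bytes to dst so that every store is naturally aligned,
 * no store is wider than unsigned long, each destination byte is
 * written exactly once, and stores are issued in ascending order.
 */
void copy_to_io_sketch(volatile void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t word = sizeof(unsigned long);

    /* Head: single-byte stores until the destination is word-aligned. */
    while (n > 0 && ((uintptr_t)d % word) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one aligned word-sized store per iteration. */
    while (n >= word) {
        unsigned long v;

        memcpy(&v, s, word);              /* the source may be unaligned */
        *(volatile unsigned long *)d = v; /* aligned store to the destination */
        d += word;
        s += word;
        n -= word;
    }

    /* Tail: remaining bytes, again one store per byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The head and tail loops are what keep a misaligned or odd-length copy from ever issuing an unaligned or overlapping store; whether byte-sized accesses are acceptable at all for a given device is a separate question that this sketch does not address.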

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Andy Lutomirski

> On Nov 23, 2018, at 10:42 AM, Linus Torvalds > wrote: > > On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds > wrote: >> >> Let me write a generic routine in lib/iomap_copy.c (which already does >> the "user specifies chunk size" cases), and hook it up for x86. > > Something like this? > >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds

On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds wrote: > > Let me write a generic routine in lib/iomap_copy.c (which already does > the "user specifies chunk size" cases), and hook it up for x86. Something like this? ENTIRELY UNTESTED! It might not compile. Seriously. And if it does compile, it m

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Josh Poimboeuf

On Thu, Nov 22, 2018 at 12:13:41PM +0100, Ingo Molnar wrote: > Note to self: watch out for patches that change altinstructions and don't > make premature vmlinux size impact assumptions. :-) I noticed a similar problem with ORC data. As it turns out, size's "text" calculation also includes read-

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds

On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote: > > I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel. > Calling memcpy_fromio(kernel_buffer, PCIe_address, length) > generates a lot of single byte TLP. I just tested it too - it turns out that the __inline_memcpy() code

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread David Laight

From: David Laight > Sent: 23 November 2018 09:35 > From: Linus Torvalds > > Sent: 22 November 2018 18:58 > ... > > Oh, and I just noticed that on x86 we expressly use our old "safe and > > sane" functions: see __inline_memcpy(), and its use in > > __memcpy_{from,to}io(). > > > > So the "falls back

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread David Laight

From: Linus Torvalds > Sent: 22 November 2018 18:58 ... > Oh, and I just noticed that on x86 we expressly use our old "safe and > sane" functions: see __inline_memcpy(), and its use in > __memcpy_{from,to}io(). > > So the "falls back to memcpy" was always a red herring. We don't > actually do that

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds

On Thu, Nov 22, 2018 at 10:07 AM Andy Lutomirski wrote: > > I'm not personally volunteering, but I suspect we can do much better > than we do now: > > - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC > and UC memory. > > - MOVNTDQA can, I think, do 64-byte loads, but only fro

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Andy Lutomirski

On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds wrote: > > On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote: > > > > The other problem with the ERMS copy is that it gets used > > for copy_to/from_io() - and the 'rep movsb' on uncached > > locations has to do byte copies. > > Ugh. I thought we ch

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds

On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote: > > The other problem with the ERMS copy is that it gets used > for copy_to/from_io() - and the 'rep movsb' on uncached > locations has to do byte copies. Ugh. I thought we changed that *long* ago, because even our non-ERMS copy is broken for PC

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread David Laight

From: Denys Vlasenko > Sent: 21 November 2018 13:44 ... > I also tested this while working for string ops code in musl. > > I think at least 128 bytes would be the minimum where "REP insn" > are more efficient. In my testing, it's more like 256 bytes... What happens for misaligned copies? I had a
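The size cut-off being debated here can be pictured as a simple dispatch on the copy length. The following is a userspace sketch only: `copy_with_threshold` and the `copy_path` names are hypothetical, the threshold is a caller-supplied parameter because the thread reports different break-even points (64, 128, 256 bytes), and `memcpy()` merely stands in for the "REP insn" path, which in the kernel is hand-written assembly.

```c
#include <stddef.h>
#include <string.h>

/* Which strategy the sketch picked; the names are illustrative only. */
enum copy_path {
    COPY_PATH_SMALL_LOOP,   /* short copy: simple open-coded loop */
    COPY_PATH_BULK          /* long copy: stand-in for the REP-based path */
};

/*
 * Hypothetical helper, not kernel code: copy len bytes and report
 * which path was taken. Copies shorter than threshold avoid the
 * bulk path, mirroring the "only for larger sizes" idea.
 */
enum copy_path copy_with_threshold(void *dst, const void *src,
                                   size_t len, size_t threshold)
{
    if (len < threshold) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        for (size_t i = 0; i < len; i++)
            d[i] = s[i];
        return COPY_PATH_SMALL_LOOP;
    }

    memcpy(dst, src, len);  /* assumption: stands in for "rep movsb" */
    return COPY_PATH_BULK;
}
```

Keeping the threshold as a parameter rather than a constant reflects that the thread does not settle on one number, and that microbenchmark results for it are disputed.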

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds

On Thu, Nov 22, 2018 at 9:26 AM Andy Lutomirski wrote: > > So I think your patch is viable. Also, with that patch applied, > put_user_ex() should become worse than worthless Yes. I hate those special-case _ex variants. I guess I should just properly forward-port my patch series where the differ

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Andy Lutomirski

On Thu, Nov 22, 2018 at 8:56 AM Linus Torvalds wrote: > > On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote: > > * Linus Torvalds wrote: > > > > > > Random patch (with my "asm goto" hack included) attached, in case > > > people want to play with it. > > > > Doesn't even look all that hacky to me

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds

On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote: > * Linus Torvalds wrote: > > > > Random patch (with my "asm goto" hack included) attached, in case > > people want to play with it. > > Doesn't even look all that hacky to me. Any hack in it that I didn't > notice? :-) The code to use asm goto

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar

* Ingo Molnar wrote: > So I dug into this some more: > > 1) > > Firstly I tracked down GCC bloating the might_fault() checks and the > related out-of-line code exception handling which bloats the full > generated function. Sorry, I mis-remembered that detail when I wrote the email: it was

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar

* Ingo Molnar wrote: > The kernel text size reduction with Jen's patch is small but real: > > text data bss dec hex filename > 19572694 11516934 19873888 50963516 309a43c vmlinux.before > 19572468 11516934

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar

* Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds > wrote: > > > > It might be interesting to just change raw_copy_to/from_user() to > > handle a lot more cases (in particular, handle cases where 'size' is > > 8-byte aligned). The special cases we *do* have may not be t

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Andy Lutomirski

On Wed, Nov 21, 2018 at 10:44 AM Linus Torvalds wrote: > > On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote: > > > > Can we maybe use this as an excuse to ask for some reasonable instructions > > to access user memory? > > I did that long ago. It's why we have CLAC/STAC today. I was told t

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds

On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds wrote: > > It might be interesting to just change raw_copy_to/from_user() to > handle a lot more cases (in particular, handle cases where 'size' is > 8-byte aligned). The special cases we *do* have may not be the right > ones (the 10-byte case in par

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds

On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote: > > Can we maybe use this as an excuse to ask for some reasonable instructions to > access user memory? I did that long ago. It's why we have CLAC/STAC today. I was told that what I actually asked for (get an instruction to access user spac

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Andy Lutomirski

> On Nov 21, 2018, at 11:04 AM, Jens Axboe wrote: > >> On 11/21/18 10:27 AM, Linus Torvalds wrote: >>> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: >>> >>> In my experiments 64 bytes was the break even point for all the CPUs I >>> had handy, but I guess that may change with other model

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds

On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds wrote: > > It would be interesting to know exactly which copy it is that matters > so much... *inlining* the erms case might show that nicely in > profiles. Side note: the fact that Jens' patch (which I don't like in that form) allegedly shrunk the

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Jens Axboe

On 11/21/18 10:27 AM, Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: >> >> In my experiments 64 bytes was the break even point for all the CPUs I >> had handy, but I guess that may change with other models. > > Note that experiments with memcpy speed are almost invaria

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds

On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: > > In my experiments 64 bytes was the break even point for all the CPUs I > had handy, but I guess that may change with other models. Note that experiments with memcpy speed are almost invariably broken. microbenchmarks don't show the impact of

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Paolo Abeni

On Wed, 2018-11-21 at 06:32 -0700, Jens Axboe wrote: > I did some more investigation yesterday, and found this: > > commit 236222d39347e0e486010f10c1493e83dbbdfba8 > Author: Paolo Abeni > Date: Thu Jun 29 15:55:58 2017 +0200 > > x86/uaccess: Optimize copy_user_enhanced_fast_string() for sh

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Denys Vlasenko

On 11/21/2018 02:32 PM, Jens Axboe wrote: On 11/20/18 11:36 PM, Ingo Molnar wrote: * Jens Axboe wrote: So this is a fun one... While I was doing the aio polled work, I noticed that the submitting process spent a substantial amount of time copying data to/from userspace. For aio, that's iocb an

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Jens Axboe

On 11/20/18 11:36 PM, Ingo Molnar wrote: > > [ Cc:-ed a few other gents and lkml. ] > > * Jens Axboe wrote: > >> Hi, >> >> So this is a fun one... While I was doing the aio polled work, I noticed >> that the submitting process spent a substantial amount of time copying >> data to/from userspace

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-20 Thread Ingo Molnar

[ Cc:-ed a few other gents and lkml. ] * Jens Axboe wrote: > Hi, > > So this is a fun one... While I was doing the aio polled work, I noticed > that the submitting process spent a substantial amount of time copying > data to/from userspace. For aio, that's iocb and io_event, which are 64 > an

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-20 Thread Jens Axboe

Forgot to CC the mailing list... On 11/20/18 1:18 PM, Jens Axboe wrote: > Hi, > > So this is a fun one... While I was doing the aio polled work, I noticed > that the submitting process spent a substantial amount of time copying > data to/from userspace. For aio, that's iocb and io_event, which a

39 matches