On Tue, Jan 8, 2019 at 1:10 AM David Laight wrote:
> >
> > It will never work for memcpy_fromio(). Any driver that thinks it will
> > copy from io space to user space absolutely *has* to do it by hand. No
> > questions, and no exceptions. Some loop like
> >
> >         for (..)
> >                 put_user(readl
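For illustration only, here is a userspace C sketch of the kind of hand-written loop described above: one explicit 32-bit read from the device region per word, each result stored to the destination with a separate call. `sim_readl()`, `sim_put_user()` and `copy_io_to_user_by_hand()` are hypothetical stand-ins (the real `readl()` and `put_user()` are kernel-only), and real fault handling, `__iomem` annotations and length checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for readl(): exactly one 32-bit load through a
 * volatile pointer, so the compiler cannot merge, split or drop it. */
static inline uint32_t sim_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Hypothetical stand-in for put_user(): the kernel version can fault and
 * return -EFAULT; this one simply stores the value and returns 0. */
static inline int sim_put_user(uint32_t val, uint32_t *uptr)
{
	memcpy(uptr, &val, sizeof(val));
	return 0;
}

/* Copy 'words' 32-bit words from (simulated) IO space to a (simulated)
 * user buffer, one explicit read per word. Returns 0, or the first
 * error reported by sim_put_user(). */
static int copy_io_to_user_by_hand(uint32_t *udst, const volatile void *io,
				   size_t words)
{
	const volatile uint8_t *src = io;

	assert(((uintptr_t)io & 3) == 0); /* 32-bit reads must be aligned */
	for (size_t i = 0; i < words; i++) {
		int err = sim_put_user(sim_readl(src + 4 * i), udst + i);
		if (err)
			return err;
	}
	return 0;
}
```

The point of the explicit loop is that the access width is chosen by the driver, not by whatever `memcpy()` variant the architecture happens to use.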
From: Linus Torvalds
> Sent: 07 January 2019 17:44
> On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote:
> >
> > I needed to open-code one part because it wants to do copy_to_user()
> > from a PCIe address buffer (which has to work).
>
> It will never work for memcpy_fromio(). Any driver that thin
On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote:
>
> I needed to open-code one part because it wants to do copy_to_user()
> from a PCIe address buffer (which has to work).
It will never work for memcpy_fromio(). Any driver that thinks it will
copy from io space to user space absolutely *has* to
From: Linus Torvalds
> Sent: 05 January 2019 02:39
...
> Anyway, it would be lovely to hear whether memcpy_toio() now works
> reasonably. I just picked our very old legacy function for this, so it
> will do things in 32-bit chunks (even on x86-64), and I'm certainly
> open to somebody doing somethi
Coming back to this old thread, because I've spent most of the day
resurrecting some of my old core x86 patches, and one of them was for
the issue David Laight complained about: horrible memcpy_toio()
performance.
Yes, I should have done this before the merge window instead of at the
end of it, bu
From: Linus Torvalds
> Sent: 23 November 2018 16:36
...
> End result: we *used* to do this right. For the last eight years our
> "memcpy_{to,from}io()" has been entirely broken, and apparently even
> the people who noticed oddities like David, never reported it as
> breakage but instead just worked
From: Andy Lutomirski
> Sent: 23 November 2018 19:11
> > On Nov 23, 2018, at 11:44 AM, Linus Torvalds
> > wrote:
> >
> >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski
> >> wrote:
> >>
> >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as
> >> something like “copy this
From: Linus Torvalds
> Sent: 23 November 2018 16:36
>
> On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote:
> >
> > I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel.
> > Calling memcpy_fromio(kernel_buffer, PCIe_address, length)
> > generates a lot of single byte TLP.
>
On 11/21/18 11:16 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
> wrote:
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much... *inlining* the erms case might show that nicely in
>> profiles.
>
> Side note: the fact that Jens' patch
> On Nov 23, 2018, at 11:44 AM, Linus Torvalds
> wrote:
>
>> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote:
>>
>> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as
>> something like “copy this data to IO space using at most long-sized writes,
>> all aligned,
On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote:
>
> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as
> something like “copy this data to IO space using at most long-sized writes,
> all aligned, and writing each byte exactly once, in order.” That sounds...
> dubio
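A minimal sketch of the contract guessed at above (aligned stores no wider than a word, each destination byte written exactly once, in ascending order), using the 32-bit chunk size mentioned for the legacy x86 routine. It is not the kernel's implementation: `sketch_memcpy_toio()`, `sim_writeb()` and `sim_writel()` are hypothetical, and the sketch runs on ordinary memory rather than an `ioremap()`ed mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for writeb()/writel(): single volatile stores. */
static inline void sim_writeb(uint8_t v, volatile void *addr)
{
	*(volatile uint8_t *)addr = v;
}

static inline void sim_writel(uint32_t v, volatile void *addr)
{
	*(volatile uint32_t *)addr = v;
}

/* Every destination byte is written exactly once, in order, with aligned
 * stores no wider than 32 bits. Byte stores are used only to reach 4-byte
 * alignment and for the 0-3 byte tail. */
static void sketch_memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	assert(src != NULL || n == 0);

	/* Head: single bytes until the destination is 4-byte aligned. */
	while (n && ((uintptr_t)d & 3)) {
		sim_writeb(*s++, d++);
		n--;
	}
	/* Body: aligned 32-bit stores; memcpy() reads a possibly
	 * unaligned source word without undefined behaviour. */
	while (n >= 4) {
		uint32_t w;
		memcpy(&w, s, 4);
		sim_writel(w, d);
		d += 4;
		s += 4;
		n -= 4;
	}
	/* Tail: remaining 0-3 bytes. */
	while (n--)
		sim_writeb(*s++, d++);
}
```

Called as `sketch_memcpy_toio((uint8_t *)dev + 1, buf, 13)` on a 4-byte-aligned `dev`, this issues three byte stores, two 32-bit stores and two trailing byte stores, instead of the single-byte accesses that `rep movsb` produces on uncached memory.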
> On Nov 23, 2018, at 10:42 AM, Linus Torvalds
> wrote:
>
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
> wrote:
>>
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
>
> Something like this?
>
>
On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
wrote:
>
> Let me write a generic routine in lib/iomap_copy.c (which already does
> the "user specifies chunk size" cases), and hook it up for x86.
Something like this?
ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
compile, it m
On Thu, Nov 22, 2018 at 12:13:41PM +0100, Ingo Molnar wrote:
> Note to self: watch out for patches that change altinstructions and don't
> make premature vmlinux size impact assumptions. :-)
I noticed a similar problem with ORC data. As it turns out, size's
"text" calculation also includes read-
On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote:
>
> I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel.
> Calling memcpy_fromio(kernel_buffer, PCIe_address, length)
> generates a lot of single byte TLP.
I just tested it too - it turns out that the __inline_memcpy() code
From: David Laight
> Sent: 23 November 2018 09:35
> From: Linus Torvalds
> > Sent: 22 November 2018 18:58
> ...
> > Oh, and I just noticed that on x86 we expressly use our old "safe and
> > sane" functions: see __inline_memcpy(), and its use in
> > __memcpy_{from,to}io().
> >
> > So the "falls back
From: Linus Torvalds
> Sent: 22 November 2018 18:58
...
> Oh, and I just noticed that on x86 we expressly use our old "safe and
> sane" functions: see __inline_memcpy(), and its use in
> __memcpy_{from,to}io().
>
> So the "falls back to memcpy" was always a red herring. We don't
> actually do that
On Thu, Nov 22, 2018 at 10:07 AM Andy Lutomirski wrote:
>
> I'm not personally volunteering, but I suspect we can do much better
> than we do now:
>
> - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC
> and UC memory.
>
> - MOVNTDQA can, I think, do 64-byte loads, but only fro
On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds
wrote:
>
> On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote:
> >
> > The other problem with the ERMS copy is that it gets used
> > for copy_to/from_io() - and the 'rep movsb' on uncached
> > locations has to do byte copies.
>
> Ugh. I thought we ch
On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote:
>
> The other problem with the ERMS copy is that it gets used
> for copy_to/from_io() - and the 'rep movsb' on uncached
> locations has to do byte copies.
Ugh. I thought we changed that *long* ago, because even our non-ERMS
copy is broken for PC
From: Denys Vlasenko
> Sent: 21 November 2018 13:44
...
> I also tested this while working for string ops code in musl.
>
> I think at least 128 bytes would be the minimum where "REP insn"
> are more efficient. In my testing, it's more like 256 bytes...
What happens for misaligned copies?
I had a
On Thu, Nov 22, 2018 at 9:26 AM Andy Lutomirski wrote:
>
> So I think your patch is viable. Also, with that patch applied,
> put_user_ex() should become worse than worthless
Yes. I hate those special-case _ex variants.
I guess I should just properly forward-port my patch series where the
differ
On Thu, Nov 22, 2018 at 8:56 AM Linus Torvalds
wrote:
>
> On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote:
> > * Linus Torvalds wrote:
> > >
> > > Random patch (with my "asm goto" hack included) attached, in case
> > > people want to play with it.
> >
> > Doesn't even look all that hacky to me
On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote:
> * Linus Torvalds wrote:
> >
> > Random patch (with my "asm goto" hack included) attached, in case
> > people want to play with it.
>
> Doesn't even look all that hacky to me. Any hack in it that I didn't
> notice? :-)
The code to use asm goto
* Ingo Molnar wrote:
> So I dug into this some more:
>
> 1)
>
> Firstly I tracked down GCC bloating the might_fault() checks and the
> related out-of-line code exception handling which bloats the full
> generated function.
Sorry, I mis-remembered that detail when I wrote the email: it was
* Ingo Molnar wrote:
> The kernel text size reduction with Jens' patch is small but real:
>
> text      data      bss       dec       hex      filename
> 19572694  11516934  19873888  50963516  309a43c  vmlinux.before
> 19572468  11516934
* Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds
> wrote:
> >
> > It might be interesting to just change raw_copy_to/from_user() to
> > handle a lot more cases (in particular, handle cases where 'size' is
> > 8-byte aligned). The special cases we *do* have may not be t
On Wed, Nov 21, 2018 at 10:44 AM Linus Torvalds
wrote:
>
> On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote:
> >
> > Can we maybe use this as an excuse to ask for some reasonable instructions
> > to access user memory?
>
> I did that long ago. It's why we have CLAC/STAC today. I was told t
On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds
wrote:
>
> It might be interesting to just change raw_copy_to/from_user() to
> handle a lot more cases (in particular, handle cases where 'size' is
> 8-byte aligned). The special cases we *do* have may not be the right
> ones (the 10-byte case in par
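One way to read the suggestion of handling 8-byte-aligned sizes inline, sketched in userspace C under stated assumptions: sizes that are a non-zero multiple of 8, up to a 64-byte cutoff (borrowed loosely from the break-even figure Paolo reports elsewhere in this thread), are copied as a short run of 8-byte moves, and everything else falls back to a generic routine. All names here are hypothetical, and user-access fault handling (exception tables, STAC/CLAC) is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic fallback, standing in for the out-of-line
 * copy_user_*() routines. Returns the number of bytes NOT copied,
 * following the kernel convention. */
static size_t sketch_copy_generic(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical raw_copy_from_user()-style dispatcher. When 'n' is a
 * compile-time constant, the size test folds away and only one path
 * remains in the generated code. */
static inline size_t sketch_raw_copy(void *dst, const void *src, size_t n)
{
	if (n != 0 && n <= 64 && (n & 7) == 0) {
		uint8_t *d = dst;
		const uint8_t *s = src;

		for (size_t i = 0; i < n; i += 8) {
			uint64_t w;
			memcpy(&w, s + i, 8);	/* typically one 8-byte load */
			memcpy(d + i, &w, 8);	/* and one 8-byte store */
		}
		return 0;
	}
	return sketch_copy_generic(dst, src, n);
}
```

The cutoff and the 8-byte granularity are tuning assumptions, not measured values; the thread itself cautions that memcpy microbenchmarks are easy to get wrong.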
On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote:
>
> Can we maybe use this as an excuse to ask for some reasonable instructions to
> access user memory?
I did that long ago. It's why we have CLAC/STAC today. I was told that
what I actually asked for (get an instruction to access user spac
> On Nov 21, 2018, at 11:04 AM, Jens Axboe wrote:
>
>> On 11/21/18 10:27 AM, Linus Torvalds wrote:
>>> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote:
>>>
>>> In my experiments 64 bytes was the break even point for all the CPUs I
>>> had handy, but I guess that may change with other model
On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
wrote:
>
> It would be interesting to know exactly which copy it is that matters
> so much... *inlining* the erms case might show that nicely in
> profiles.
Side note: the fact that Jens' patch (which I don't like in that form)
allegedly shrunk the
On 11/21/18 10:27 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote:
>>
>> In my experiments 64 bytes was the break even point for all the CPUs I
>> had handy, but I guess that may change with other models.
>
> Note that experiments with memcpy speed are almost invaria
On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote:
>
> In my experiments 64 bytes was the break even point for all the CPUs I
> had handy, but I guess that may change with other models.
Note that experiments with memcpy speed are almost invariably broken.
microbenchmarks don't show the impact of
On Wed, 2018-11-21 at 06:32 -0700, Jens Axboe wrote:
> I did some more investigation yesterday, and found this:
>
> commit 236222d39347e0e486010f10c1493e83dbbdfba8
> Author: Paolo Abeni
> Date: Thu Jun 29 15:55:58 2017 +0200
>
> x86/uaccess: Optimize copy_user_enhanced_fast_string() for sh
On 11/21/2018 02:32 PM, Jens Axboe wrote:
On 11/20/18 11:36 PM, Ingo Molnar wrote:
* Jens Axboe wrote:
So this is a fun one... While I was doing the aio polled work, I noticed
that the submitting process spent a substantial amount of time copying
data to/from userspace. For aio, that's iocb an
On 11/20/18 11:36 PM, Ingo Molnar wrote:
>
> [ Cc:-ed a few other gents and lkml. ]
>
> * Jens Axboe wrote:
>
>> Hi,
>>
>> So this is a fun one... While I was doing the aio polled work, I noticed
>> that the submitting process spent a substantial amount of time copying
>> data to/from userspace
[ Cc:-ed a few other gents and lkml. ]
* Jens Axboe wrote:
> Hi,
>
> So this is a fun one... While I was doing the aio polled work, I noticed
> that the submitting process spent a substantial amount of time copying
> data to/from userspace. For aio, that's iocb and io_event, which are 64
> an
Forgot to CC the mailing list...
On 11/20/18 1:18 PM, Jens Axboe wrote:
> Hi,
>
> So this is a fun one... While I was doing the aio polled work, I noticed
> that the submitting process spent a substantial amount of time copying
> data to/from userspace. For aio, that's iocb and io_event, which a