Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-04-03 Thread Ingo Molnar
* Pavel Machek wrote: > > > > Yeah, so generic memcpy() replacement is only feasible I think if the most optimistic implementation is actually correct: > > > > > > > > - if no preempt disable()/enable() is required > > > > > > > > - if direct access to the

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-04-03 Thread Pavel Machek
Hi! > > On Tue, 20 Mar 2018, Ingo Molnar wrote: > > > * Thomas Gleixner wrote: > > > > > > > > So I do think we could do more in this area to improve driver > > > > > performance, if the > > > > > code is correct and if there's actual benchmarks that are showing > > > > >

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread Andy Lutomirski
On Thu, Mar 22, 2018 at 5:40 PM, Alexei Starovoitov wrote: > On Thu, Mar 22, 2018 at 10:33:43AM +0100, Ingo Molnar wrote: >> >> - I think the BPF JIT, whose byte code machine language is used by an >> increasing number of kernel subsystems, could benefit from

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread Alexei Starovoitov
On Thu, Mar 22, 2018 at 10:33:43AM +0100, Ingo Molnar wrote: > > - I think the BPF JIT, whose byte code machine language is used by an > increasing number of kernel subsystems, could benefit from having vector ops. > It would possibly allow the handling of floating point types. this is

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread Linus Torvalds
On Thu, Mar 22, 2018 at 5:48 AM, David Laight wrote: > > So if we needed to do PIO reads using the AVX2 (or better AVX-512) > registers would make a significant difference. > Fortunately we can 'dma' most of the data we need to transfer. I think this is the really

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread David Laight
From: David Laight > Sent: 22 March 2018 10:36 ... > Any code would need to be in memcpy_fromio(), not in every driver that > might benefit. > Then fallback code can be used if the registers aren't available. > > > (b) we can't guarantee that %ymm register write will show up on any > > bus as a

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread David Laight
From: Sent: 21 March 2018 18:16 > To: Ingo Molnar ... > All this to do a 32-byte PIO access, with absolutely zero data right > now on what the win is? > > Yes, yes, I can find an Intel white-paper that talks about setting WC > and then using xmm and ymm instructions to write a single 64-byte >

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread Ingo Molnar
* Andy Lutomirski wrote: > On Wed, Mar 21, 2018 at 6:32 AM, Ingo Molnar wrote: > > > > * Linus Torvalds wrote: > > > >> And even if you ignore that "maintenance problems down the line" issue > >> ("we can fix them when they

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-22 Thread Ingo Molnar
* Linus Torvalds wrote: > And the real worry is things like AVX-512 etc, which is exactly when > things like "save and restore one ymm register" will quite likely > clear the upper bits of the zmm register. Yeah, I think the only valid save/restore pattern is to

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-21 Thread Linus Torvalds
On Wed, Mar 21, 2018 at 12:46 AM, Ingo Molnar wrote: > > So I added a bit of instrumentation and the current state of things is that on > 64-bit x86 every single task has an initialized FPU, every task has the exact > same, fully filled in xfeatures (XINUSE) value: Bah. Your

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-21 Thread Andy Lutomirski
On Wed, Mar 21, 2018 at 6:32 AM, Ingo Molnar wrote: > > * Linus Torvalds wrote: > >> And even if you ignore that "maintenance problems down the line" issue >> ("we can fix them when they happen") I don't want to see games like >> this, because I'm

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-21 Thread Ingo Molnar
So I poked around a bit and I'm having second thoughts: * Linus Torvalds wrote: > On Tue, Mar 20, 2018 at 1:26 AM, Ingo Molnar wrote: > > > > So assuming the target driver will only load on modern FPUs I *think* it should > > actually be

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-21 Thread Ingo Molnar
* Linus Torvalds wrote: > And even if you ignore that "maintenance problems down the line" issue > ("we can fix them when they happen") I don't want to see games like > this, because I'm pretty sure it breaks the optimized xsave by tagging > the state as being

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Andy Lutomirski
On Tue, Mar 20, 2018 at 3:10 PM, David Laight wrote: > From: Andy Lutomirski >> Sent: 20 March 2018 14:57 > ... >> I'd rather see us finally finish the work that Rik started to rework >> this differently. I'd like kernel_fpu_begin() to look like: >> >> if

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Linus Torvalds
On Tue, Mar 20, 2018 at 1:26 AM, Ingo Molnar wrote: > > So assuming the target driver will only load on modern FPUs I *think* it > should > actually be possible to do something like (pseudocode): > > vmovdqa %ymm0, 40(%rsp) > vmovdqa %ymm1, 80(%rsp) > >

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread David Laight
From: Andy Lutomirski > Sent: 20 March 2018 14:57 ... > I'd rather see us finally finish the work that Rik started to rework > this differently. I'd like kernel_fpu_begin() to look like: > > if (test_thread_flag(TIF_NEED_FPU_RESTORE)) { > return; // we're already okay. maybe we need to check

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Andy Lutomirski
On Tue, Mar 20, 2018 at 8:26 AM, Ingo Molnar wrote: > > * Thomas Gleixner wrote: > >> > Useful also for code that needs AVX-like registers to do things like CRCs. >> >> x86/crypto/ has a lot of AVX optimized code. > > Yeah, that's true, but the crypto code

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Rahul Lakkireddy
On Monday, March 03/19/18, 2018 at 20:57:22 +0530, Christoph Hellwig wrote: > On Mon, Mar 19, 2018 at 07:50:33PM +0530, Rahul Lakkireddy wrote: > > This series of patches add support for 256-bit IO read and write. > > The APIs are readqq and writeqq (quad quadword - 4 x 64), that read > > and

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread David Laight
From: Ingo Molnar > Sent: 20 March 2018 10:54 ... > Note that a generic version might still be worth trying out, if and only if it's > safe to access those vector registers directly: modern x86 CPUs will do their > non-constant memcpy()s via the common memcpy_erms() function - which could in >

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Ingo Molnar
* Thomas Gleixner wrote: > On Tue, 20 Mar 2018, Ingo Molnar wrote: > > * Thomas Gleixner wrote: > > > > > > So I do think we could do more in this area to improve driver > > > > performance, if the > > > > code is correct and if there's actual

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread David Laight
From: Thomas Gleixner > Sent: 20 March 2018 09:41 > On Tue, 20 Mar 2018, Ingo Molnar wrote: > > * Thomas Gleixner wrote: ... > > > And if we go down that road then we want a AVX based memcpy() > > > implementation which is runtime conditional on the feature bit(s) and > > >

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Thomas Gleixner
On Tue, 20 Mar 2018, Ingo Molnar wrote: > * Thomas Gleixner wrote: > > > > So I do think we could do more in this area to improve driver > > > performance, if the > > > code is correct and if there's actual benchmarks that are showing real > > > benefits. > > > > If it's

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Ingo Molnar
* Thomas Gleixner wrote: > > So I do think we could do more in this area to improve driver performance, > > if the > > code is correct and if there's actual benchmarks that are showing real > > benefits. > > If it's about hotpath performance I'm all for it, but the use

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Thomas Gleixner
On Tue, 20 Mar 2018, Ingo Molnar wrote: > * Thomas Gleixner wrote: > > > > Useful also for code that needs AVX-like registers to do things like CRCs. > > > > x86/crypto/ has a lot of AVX optimized code. > > Yeah, that's true, but the crypto code is processing fundamentally

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-20 Thread Ingo Molnar
* Thomas Gleixner wrote: > > Useful also for code that needs AVX-like registers to do things like CRCs. > > x86/crypto/ has a lot of AVX optimized code. Yeah, that's true, but the crypto code is processing fundamentally bigger blocks of data, which amortizes the cost of

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread Linus Torvalds
On Mon, Mar 19, 2018 at 8:53 AM, David Laight wrote: > > The x87 and SSE registers can't be changed - they can contain callee-saved > registers. > But (IIRC) the AVX and AVX2 registers are all caller-saved. No. The kernel entry is not the usual function call. On kernel

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread David Laight
From: Thomas Gleixner > Sent: 19 March 2018 15:37 ... > > If system call entry reset the AVX registers then any FP save/restore > > would be faster because the AVX registers wouldn't need to be saved > > (and the cpu won't save them). > > I believe the instruction to reset the AVX registers is

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread Thomas Gleixner
On Mon, 19 Mar 2018, David Laight wrote: > From: Thomas Gleixner > > Sent: 19 March 2018 15:05 > > > > On Mon, 19 Mar 2018, David Laight wrote: > > > From: Rahul Lakkireddy > > > In principle it ought to be possible to get access to one or two > > > (eg) AVX registers by saving them to stack and

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread Christoph Hellwig
On Mon, Mar 19, 2018 at 07:50:33PM +0530, Rahul Lakkireddy wrote: > This series of patches add support for 256-bit IO read and write. > The APIs are readqq and writeqq (quad quadword - 4 x 64), that read > and write 256-bits at a time from IO, respectively. What a horrible name. please encode

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread David Laight
From: Thomas Gleixner > Sent: 19 March 2018 15:05 > > On Mon, 19 Mar 2018, David Laight wrote: > > From: Rahul Lakkireddy > > In principle it ought to be possible to get access to one or two > > (eg) AVX registers by saving them to stack and telling the fpu > > save code where you've put them. >

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread Thomas Gleixner
On Mon, 19 Mar 2018, David Laight wrote: > From: Rahul Lakkireddy > In principle it ought to be possible to get access to one or two > (eg) AVX registers by saving them to stack and telling the fpu > save code where you've put them. No. We have functions for this and we are not adding new ad hoc

RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-19 Thread David Laight
From: Rahul Lakkireddy > Sent: 19 March 2018 14:21 > > This series of patches add support for 256-bit IO read and write. > The APIs are readqq and writeqq (quad quadword - 4 x 64), that read > and write 256-bits at a time from IO, respectively. Why not use the AVX2 registers to get 512bit