Re: Optimised memset64/memset32 for powerpc

2017-03-27 Thread Naveen N. Rao
On 2017/03/22 12:30PM, Matthew Wilcox wrote: > On Wed, Mar 22, 2017 at 06:18:05AM -0700, Matthew Wilcox wrote: > > There's one other potential user I've been wondering about, which are the > > various console drivers. They use 'memsetw' to blank the entire console > > or lines of the console when

Re: Optimised memset64/memset32 for powerpc

2017-03-22 Thread Matthew Wilcox
On Wed, Mar 22, 2017 at 06:18:05AM -0700, Matthew Wilcox wrote: > There's one other potential user I've been wondering about, which are the > various console drivers. They use 'memsetw' to blank the entire console > or lines of the console when scrolling, but the only architecture which > ever bot

Re: Optimised memset64/memset32 for powerpc

2017-03-22 Thread Matthew Wilcox
On Wed, Mar 22, 2017 at 08:26:12AM +1100, Benjamin Herrenschmidt wrote: > On Tue, 2017-03-21 at 06:29 -0700, Matthew Wilcox wrote: > > > > Well, those are the generic versions in the first patch: > > > > http://git.infradead.org/users/willy/linux-dax.git/commitdiff/538b977 > > 6ac925199969bd5af4e

Re: Optimised memset64/memset32 for powerpc

2017-03-21 Thread Benjamin Herrenschmidt
On Tue, 2017-03-21 at 06:29 -0700, Matthew Wilcox wrote: > > Well, those are the generic versions in the first patch: > > http://git.infradead.org/users/willy/linux-dax.git/commitdiff/538b977 > 6ac925199969bd5af4e994da776d461e7 > > so if those are good enough for you guys, there's no need for yo

Re: Optimised memset64/memset32 for powerpc

2017-03-21 Thread Segher Boessenkool
On Tue, Mar 21, 2017 at 06:29:10AM -0700, Matthew Wilcox wrote: > > Unrolling the loop could help a bit on old powerpc32s that don't have branch > > units, but on those processors the main driver is the time spent to do the > > effective write to memory, and the operations necessary to unroll the l

Re: Optimised memset64/memset32 for powerpc

2017-03-21 Thread Matthew Wilcox
On Tue, Mar 21, 2017 at 01:23:36PM +0100, Christophe LEROY wrote: > > It doesn't look free for you as you only store one register each time > > around the loop in the 32-bit memset implementation: > > > > 1: stwu r4,4(r6) > > bdnz 1b > > > > (wouldn't you get better performance

Re: Optimised memset64/memset32 for powerpc

2017-03-21 Thread Christophe LEROY
Hi Matthew, On 20/03/2017 at 22:14, Matthew Wilcox wrote: I recently introduced memset32() / memset64(). I've done implementations for x86 & ARM; akpm has agreed to take the patchset through his tree. Do you fancy doing a powerpc version? Minchan Kim got a 7% performance increase with zram fr

Re: Optimised memset64/memset32 for powerpc

2017-03-20 Thread Benjamin Herrenschmidt
On Mon, 2017-03-20 at 14:14 -0700, Matthew Wilcox wrote: > I recently introduced memset32() / memset64().  I've done implementations > for x86 & ARM; akpm has agreed to take the patchset through his tree. > Do you fancy doing a powerpc version?  Minchan Kim got a 7% performance > increase with zram

Optimised memset64/memset32 for powerpc

2017-03-20 Thread Matthew Wilcox
I recently introduced memset32() / memset64(). I've done implementations for x86 & ARM; akpm has agreed to take the patchset through his tree. Do you fancy doing a powerpc version? Minchan Kim got a 7% performance increase with zram from switching to the optimised version on x86. Here's the deve
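
For readers unfamiliar with the functions under discussion, here is a minimal sketch of what a plain C fallback for memset32() / memset64() might look like. It is not the code from the patch linked in the thread. The names sketch_memset32 and sketch_memset64 are hypothetical, and the sketch assumes the count argument is a number of elements rather than a number of bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical names, not the kernel's memset32()
 * or memset64(). Assumes `count` is a number of elements, not bytes.
 */
static void *sketch_memset32(uint32_t *dst, uint32_t value, size_t count)
{
	uint32_t *p = dst;

	/* One 32-bit store per iteration, with no unrolling. */
	while (count--)
		*p++ = value;

	return dst;
}

static void *sketch_memset64(uint64_t *dst, uint64_t value, size_t count)
{
	uint64_t *p = dst;

	/* One 64-bit store per iteration, with no unrolling. */
	while (count--)
		*p++ = value;

	return dst;
}
```

An architecture-specific version would replace these loops with wider or unrolled stores. Whether that pays off on powerpc is the question the replies above discuss.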