Re: [PATCH 1/2] open: add close_range()

2019-05-21 Thread Matthew Wilcox
On Tue, May 21, 2019 at 08:20:09PM +0100, Al Viro wrote:
> On Tue, May 21, 2019 at 05:30:27PM +0100, David Howells wrote:
> 
> > If we can live with close_from(int first) rather than close_range(), then 
> > this
> > can perhaps be done a lot more efficiently by:
> > 
> > new = alloc_fdtable(first);
> > spin_lock(&files->file_lock);
> > old = files_fdtable(files);
> > copy_fds(new, old, 0, first - 1);
> > rcu_assign_pointer(files->fdt, new);
> > spin_unlock(&files->file_lock);
> > clear_fds(old, 0, first - 1);
> > close_fdt_from(old, first);
> > kfree_rcu(old);
> 
> I really hate to think how that would interact with POSIX locks...

POSIX locks store current->files in fl_owner; David is just resizing the
underlying files->fdt, exactly as happens when growing from 64 to 256 fds.
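
For reference, here is a minimal userspace sketch of what a close_range()/
close_from() style call replaces: a per-fd close() loop, one lock round-trip
per descriptor, which is exactly the overhead David's single fdtable swap
avoids.  close_range_fallback() is a made-up name, not a proposed API.

#include <unistd.h>

static void close_range_fallback(int first, int last)
{
	for (int fd = first; fd <= last; fd++)
		close(fd);	/* EBADF on unused slots is harmless */
}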


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Matthew Wilcox
On Tue, Feb 19, 2019 at 02:20:26PM +0100, Jan Kara wrote:
> Thanks for information. Yeah, that makes somewhat more sense. Can you ever
> see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
> findings still seem to indicate that there' some problem with page
> migration and Alpha (added MM list to CC).

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-02 Thread Matthew Wilcox
On Wed, Jan 02, 2019 at 03:57:58PM -0500, Michael S. Tsirkin wrote:
> @@ -875,6 +893,8 @@ to the CPU containing it.  See the section on "Multicopy atomicity"
>  for more information.
>  
>  
> +
> +
>  In summary:
>  
>(*) Control dependencies can order prior loads against later stores.

Was this hunk intentional?


Re: [PATCH 00/32] docs/vm: convert to ReST format

2018-04-13 Thread Matthew Wilcox
On Fri, Apr 13, 2018 at 01:55:51PM -0600, Jonathan Corbet wrote:
> > I believe that keeping the mm docs together will give better visibility of
> > what (little) mm documentation we have and will make the updates easier.
> > The documents that fit well into a certain topic could be linked there. For
> > instance:
> 
> ...but this sounds like just the opposite...?  
> 
> I've had this conversation with folks in a number of subsystems.
> Everybody wants to keep their documentation together in one place - it's
> easier for the developers after all.  But for the readers I think it's
> objectively worse.  It perpetuates the mess that Documentation/ is, and
> forces readers to go digging through all kinds of inappropriate material
> in the hope of finding something that tells them what they need to know.
> 
> So I would *really* like to split the documentation by audience, as has
> been done for a number of other kernel subsystems (and eventually all, I
> hope).
> 
> I can go ahead and apply the RST conversion, that seems like a step in
> the right direction regardless.  But I sure hope we don't really have to
> keep it as an unorganized jumble of stuff...

I've started on Documentation/core-api/memory.rst which covers just
memory allocation.  So far it has the Overview and GFP flags sections
written and an outline for 'The slab allocator', 'The page allocator',
'The vmalloc allocator' and 'The page_frag allocator'.  And typing this
up, I realise we need a 'The percpu allocator'.  I'm thinking that this
is *not* the right document for the DMA memory allocators (although it
should link to that documentation).

I suspect the existing Documentation/vm/ should probably stay as an
unorganised jumble of stuff.  Developers mostly talking to other MM
developers.  Stuff that people outside the MM fraternity should know
about needs to be centrally documented.  By all means convert it to
ReST ... I don't much care, and it may make it easier to steal bits
or link to it from the organised documentation.


Re: [RFC PATCH v2 0/2] Randomization of address chosen by mmap.

2018-03-23 Thread Matthew Wilcox
On Fri, Mar 23, 2018 at 03:16:21PM -0400, Rich Felker wrote:
> > Huh, I thought libc was aware of this.  Also, I'd expect a libc-based
> > implementation to restrict itself to, e.g., only loading libraries in
> > the bottom 1GB to keep applications that want to map huge things from
> > running out of unfragmented address space.
> 
> That seems like a rather arbitrary expectation and I'm not sure why
> you'd expect it to result in less fragmentation rather than more. For
> example if it started from 1GB and worked down, you'd immediately
> reduce the contiguous free space from ~3GB to ~2GB, and if it started
> from the bottom and worked up, brk would immediately become
> unavailable, increasing mmap pressure elsewhere.

By *not* limiting yourself to the bottom 1GB, you'll almost immediately
fragment the address space even worse.  Just looking at 'ls' as a
hopefully-good example of a typical app, it maps:

linux-vdso.so.1 (0x7ffef5eef000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x7fb3657f5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7fb36543b000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x7fb3651c9000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7fb364fc5000)
/lib64/ld-linux-x86-64.so.2 (0x7fb365c3f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fb364da7000)

The VDSO wouldn't move, but look at the distribution of mapping 6 things
into a 3GB address space in random locations.  What are the odds you have
a contiguous 1GB chunk of address space?  If you restrict the random
placement to the bottom 1GB, falling back to sequential allocation once
that runs out of room, you'll prevent a lot of fragmentation.
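
To put a rough number on "what are the odds", here is a quick Monte Carlo
sketch: it drops six mappings, each an assumed 2MB (a made-up stand-in for
a typical shared library), uniformly at random into a 3GB space and reports
how often a contiguous 1GB hole survives.  Illustrative only; real layouts
also involve brk, the stack and alignment constraints.

#include <stdio.h>
#include <stdlib.h>

#define SPACE	(3UL << 30)	/* 3GB of usable address space */
#define WANT	(1UL << 30)	/* the contiguous chunk we hope to keep */
#define LIBSZ	(2UL << 20)	/* assumed size of each library mapping */
#define NLIBS	6

static int cmp(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

int main(void)
{
	int trials = 100000, ok = 0;

	srand(1);
	for (int t = 0; t < trials; t++) {
		unsigned long start[NLIBS], prev_end = 0;
		int hole = 0;

		/* crude but adequate randomness for an estimate */
		for (int i = 0; i < NLIBS; i++)
			start[i] = (((unsigned long)rand() << 30) ^ rand())
					% (SPACE - LIBSZ);
		qsort(start, NLIBS, sizeof(start[0]), cmp);

		for (int i = 0; i < NLIBS; i++) {
			if (start[i] > prev_end && start[i] - prev_end >= WANT)
				hole = 1;
			prev_end = start[i] + LIBSZ;
		}
		if (SPACE - prev_end >= WANT)
			hole = 1;
		ok += hole;
	}
	printf("a 1GB hole survived in %.1f%% of trials\n",
	       100.0 * ok / trials);
	return 0;
}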


Re: [RFC PATCH v2 0/2] Randomization of address chosen by mmap.

2018-03-23 Thread Matthew Wilcox
On Fri, Mar 23, 2018 at 02:00:24PM -0400, Rich Felker wrote:
> On Fri, Mar 23, 2018 at 05:48:06AM -0700, Matthew Wilcox wrote:
> > On Thu, Mar 22, 2018 at 07:36:36PM +0300, Ilya Smith wrote:
> > > Current implementation doesn't randomize address returned by mmap.
> > > All the entropy ends with choosing mmap_base_addr at the process
> > > creation. After that mmap build very predictable layout of address
> > > space. It allows to bypass ASLR in many cases. This patch make
> > > randomization of address on any mmap call.
> > 
> > Why should this be done in the kernel rather than libc?  libc is perfectly
> > capable of specifying random numbers in the first argument of mmap.
> 
> Generally libc does not have a view of the current vm maps, and thus
> in passing "random numbers", they would have to be uniform across the
> whole vm space and thus non-uniform once the kernel rounds up to avoid
> existing mappings.

I'm aware that you're the musl author, but glibc somehow manages to
provide etext, edata and end, demonstrating that it does know where at
least some of the memory map lies.  Virtually everything after that is
brought into the address space via mmap, which at least glibc intercepts,
so it's entirely possible for a security-conscious libc to know where
other things are in the memory map.  Not to mention that what we're
primarily talking about here are libraries which are dynamically linked
and are loaded by ld.so before calling main(); not dlopen() or even
regular user mmaps.
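
As a sketch of how a libc could track that, here is an LD_PRELOAD-style
mmap() wrapper.  This is illustrative only -- track_mapping() and the
fixed-size table are invented for the example; it is not how glibc or
musl actually do anything.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* toy bookkeeping so the wrapper is self-contained */
static struct { void *addr; size_t len; } maps[256];
static int nmaps;

static void track_mapping(void *addr, size_t len)
{
	if (nmaps < 256) {
		maps[nmaps].addr = addr;
		maps[nmaps].len = len;
		nmaps++;
	}
}

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off)
{
	static mmap_fn real_mmap;
	void *p;

	if (!real_mmap)
		real_mmap = (mmap_fn)dlsym(RTLD_NEXT, "mmap");

	p = real_mmap(addr, len, prot, flags, fd, off);
	if (p != MAP_FAILED)
		track_mapping(p, len);	/* libc now knows where it went */
	return p;
}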

> Also this would impose requirements that libc be
> aware of the kernel's use of the virtual address space and what's
> available to userspace -- for example, on 32-bit archs whether 2GB,
> 3GB, or full 4GB (for 32-bit-user-on-64-bit-kernel) is available, and
> on 64-bit archs where fewer than the full 64 bits are actually valid
> in addresses, what the actual usable pointer size is. There is
> currently no clean way of conveying this information to userspace.

Huh, I thought libc was aware of this.  Also, I'd expect a libc-based
implementation to restrict itself to, e.g., only loading libraries in
the bottom 1GB to keep applications that want to map huge things from
running out of unfragmented address space.
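
A sketch of what that restriction could look like in a dynamic loader,
with the caveat that load_library_image(), the 1GB cap and the fallback
behaviour are all assumptions for illustration, not existing libc code:

#include <stdlib.h>
#include <sys/mman.h>

#define LIB_ZONE	(1UL << 30)	/* assumed ceiling for library mappings */
#define PAGE_SZ		4096UL

/* Map a library image at a random page-aligned hint below 1GB; without
 * MAP_FIXED, an occupied hint simply means the kernel picks instead. */
static void *load_library_image(int fd, size_t len)
{
	unsigned long hint;
	void *p;

	hint = ((unsigned long)rand() % (LIB_ZONE - len)) & ~(PAGE_SZ - 1);
	if (!hint)
		hint = PAGE_SZ;	/* a NULL hint would mean "anywhere" */

	p = mmap((void *)hint, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}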


[PATCH v3 4/7] alpha: Add support for memset16

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Alpha already had an optimised memset-16-bit-quantity assembler routine
called memsetw().  It has a slightly different calling convention
from memset16() in that it takes a byte count, not a count of words.
That's the same convention used by ARM's __memset16(), so rename Alpha's
routine to match and add a memset16() wrapper around it.  Then convert
Alpha's scr_memsetw() to call memset16() instead of memsetw().

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 arch/alpha/include/asm/string.h | 15 ++++++++-------
 arch/alpha/include/asm/vga.h    |  2 +-
 arch/alpha/lib/memset.S         | 10 +++++-----
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/alpha/include/asm/string.h b/arch/alpha/include/asm/string.h
index c2911f591704..74c0a693b76b 100644
--- a/arch/alpha/include/asm/string.h
+++ b/arch/alpha/include/asm/string.h
@@ -65,13 +65,14 @@ extern void * memchr(const void *, int, size_t);
aligned values.  The DEST and COUNT parameters must be even for 
correct operation.  */
 
-#define __HAVE_ARCH_MEMSETW
-extern void * __memsetw(void *dest, unsigned short, size_t count);
-
-#define memsetw(s, c, n)\
-(__builtin_constant_p(c)\
- ? __constant_c_memset((s),0x0001000100010001UL*(unsigned short)(c),(n)) \
- : __memsetw((s),(c),(n)))
+#define __HAVE_ARCH_MEMSET16
+extern void * __memset16(void *dest, unsigned short, size_t count);
+static inline void *memset16(uint16_t *p, uint16_t v, size_t n)
+{
+   if (__builtin_constant_p(v))
+   return __constant_c_memset(p, 0x0001000100010001UL * v, n * 2);
+   return __memset16(p, v, n * 2);
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/alpha/include/asm/vga.h b/arch/alpha/include/asm/vga.h
index c00106bac521..3c1c2b6128e7 100644
--- a/arch/alpha/include/asm/vga.h
+++ b/arch/alpha/include/asm/vga.h
@@ -34,7 +34,7 @@ static inline void scr_memsetw(u16 *s, u16 c, unsigned int count)
if (__is_ioaddr(s))
memsetw_io((u16 __iomem *) s, c, count);
else
-   memsetw(s, c, count);
+   memset16(s, c, count / 2);
 }
 
 /* Do not trust that the usage will be correct; analyze the arguments.  */
diff --git a/arch/alpha/lib/memset.S b/arch/alpha/lib/memset.S
index 89a26f5e89de..f824969e9e77 100644
--- a/arch/alpha/lib/memset.S
+++ b/arch/alpha/lib/memset.S
@@ -20,7 +20,7 @@
.globl memset
.globl __memset
.globl ___memset
-   .globl __memsetw
+   .globl __memset16
.globl __constant_c_memset
 
.ent ___memset
@@ -110,8 +110,8 @@ EXPORT_SYMBOL(___memset)
 EXPORT_SYMBOL(__constant_c_memset)
 
.align 5
-   .ent __memsetw
-__memsetw:
+   .ent __memset16
+__memset16:
.prologue 0
 
inswl $17,0,$1  /* E0 */
@@ -123,8 +123,8 @@ __memsetw:
or $1,$4,$17/* E0 */
br __constant_c_memset  /* .. E1 */
 
-   .end __memsetw
-EXPORT_SYMBOL(__memsetw)
+   .end __memset16
+EXPORT_SYMBOL(__memset16)
 
 memset = ___memset
 __memset = ___memset
-- 
2.11.0



[PATCH v3 6/7] sym53c8xx_2: Convert to use memset32

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

memset32() can be used to initialise these three arrays.  Minor code
footprint reduction.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 drivers/scsi/sym53c8xx_2/sym_hipd.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 6b349e301869..b886b10e3499 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -4985,13 +4985,10 @@ struct sym_lcb *sym_alloc_lcb (struct sym_hcb *np, u_char tn, u_char ln)
 *  Compute the bus address of this table.
 */
if (ln && !tp->luntbl) {
-   int i;
-
tp->luntbl = sym_calloc_dma(256, "LUNTBL");
if (!tp->luntbl)
goto fail;
-   for (i = 0 ; i < 64 ; i++)
-   tp->luntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
+   memset32(tp->luntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
tp->head.luntbl_sa = cpu_to_scr(vtobus(tp->luntbl));
}
 
@@ -5077,8 +5074,7 @@ static void sym_alloc_lcb_tags (struct sym_hcb *np, u_char tn, u_char ln)
/*
 *  Initialize the task table with invalid entries.
 */
-   for (i = 0 ; i < SYM_CONF_MAX_TASK ; i++)
-   lp->itlq_tbl[i] = cpu_to_scr(np->notask_ba);
+   memset32(lp->itlq_tbl, cpu_to_scr(np->notask_ba), SYM_CONF_MAX_TASK);
 
/*
 *  Fill up the tag buffer with tag numbers.
@@ -5764,8 +5760,7 @@ int sym_hcb_attach(struct Scsi_Host *shost, struct sym_fw *fw, struct sym_nvram
goto attach_failed;
 
np->badlun_sa = cpu_to_scr(SCRIPTB_BA(np, resel_bad_lun));
-   for (i = 0 ; i < 64 ; i++)  /* 64 luns/target, no less */
-   np->badluntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
+   memset32(np->badluntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
 
/*
 *  Prepare the bus address array that contains the bus 
-- 
2.11.0



[PATCH v3 7/7] vga: Optimise console scrolling

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Where possible, call memset16(), memmove() or memcpy() instead of using
open-coded loops.  If an architecture doesn't define VT_BUF_HAVE_RW,
we can do that from the generic code.  For the architectures which do
have special RW routines, usually we can do the special thing (pointer
test or byteswap) once (and then use a mem* call) instead of each time
around a loop.  Alpha is the only architecture missing a scr_memmovew()
definition (because it's non-trivial to write).

I don't like the calling convention that uses a byte count instead of
a count of u16s, but it's a little late to change that.  Reduces code
size of fbcon.o by almost 400 bytes on my laptop build.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 arch/mips/include/asm/vga.h    |  6 ++++++
 arch/powerpc/include/asm/vga.h |  8 ++++++++
 arch/sparc/include/asm/vga.h   | 24 ++++++++++++++++++++++++
 include/linux/vt_buffer.h      | 12 ++++++++++++
 4 files changed, 50 insertions(+)

diff --git a/arch/mips/include/asm/vga.h b/arch/mips/include/asm/vga.h
index f82c83749a08..7510f406e1e1 100644
--- a/arch/mips/include/asm/vga.h
+++ b/arch/mips/include/asm/vga.h
@@ -40,9 +40,15 @@ static inline u16 scr_readw(volatile const u16 *addr)
return le16_to_cpu(*addr);
 }
 
+static inline void scr_memsetw(u16 *s, u16 v, unsigned int count)
+{
+   memset16(s, cpu_to_le16(v), count / 2);
+}
+
 #define scr_memcpyw(d, s, c) memcpy(d, s, c)
 #define scr_memmovew(d, s, c) memmove(d, s, c)
 #define VT_BUF_HAVE_MEMCPYW
 #define VT_BUF_HAVE_MEMMOVEW
+#define VT_BUF_HAVE_MEMSETW
 
 #endif /* _ASM_VGA_H */
diff --git a/arch/powerpc/include/asm/vga.h b/arch/powerpc/include/asm/vga.h
index ab3acd2f2786..7a7b541b7493 100644
--- a/arch/powerpc/include/asm/vga.h
+++ b/arch/powerpc/include/asm/vga.h
@@ -33,8 +33,16 @@ static inline u16 scr_readw(volatile const u16 *addr)
return le16_to_cpu(*addr);
 }
 
+#define VT_BUF_HAVE_MEMSETW
+static inline void scr_memsetw(u16 *s, u16 v, unsigned int n)
+{
+   memset16(s, cpu_to_le16(v), n / 2);
+}
+
 #define VT_BUF_HAVE_MEMCPYW
+#define VT_BUF_HAVE_MEMMOVEW
 #define scr_memcpyw	memcpy
+#define scr_memmovew	memmove
 
 #endif /* !CONFIG_VGA_CONSOLE && !CONFIG_MDA_CONSOLE */
 
diff --git a/arch/sparc/include/asm/vga.h b/arch/sparc/include/asm/vga.h
index ec0e9967d93d..1fab92b110d9 100644
--- a/arch/sparc/include/asm/vga.h
+++ b/arch/sparc/include/asm/vga.h
@@ -11,6 +11,9 @@
 #include 
 
 #define VT_BUF_HAVE_RW
+#define VT_BUF_HAVE_MEMSETW
+#define VT_BUF_HAVE_MEMCPYW
+#define VT_BUF_HAVE_MEMMOVEW
 
 #undef scr_writew
 #undef scr_readw
@@ -29,6 +32,27 @@ static inline u16 scr_readw(const u16 *addr)
return *addr;
 }
 
+static inline void scr_memsetw(u16 *p, u16 v, unsigned int n)
+{
+   BUG_ON((long) p >= 0);
+
+   memset16(p, cpu_to_le16(v), n / 2);
+}
+
+static inline void scr_memcpyw(u16 *d, u16 *s, unsigned int n)
+{
+   BUG_ON((long) d >= 0);
+
+   memcpy(d, s, n);
+}
+
+static inline void scr_memmovew(u16 *d, u16 *s, unsigned int n)
+{
+   BUG_ON((long) d >= 0);
+
+   memmove(d, s, n);
+}
+
 #define VGA_MAP_MEM(x,s) (x)
 
 #endif
diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
index f38c10ba3ff5..31b92fcd8f03 100644
--- a/include/linux/vt_buffer.h
+++ b/include/linux/vt_buffer.h
@@ -26,24 +26,33 @@
 #ifndef VT_BUF_HAVE_MEMSETW
 static inline void scr_memsetw(u16 *s, u16 c, unsigned int count)
 {
+#ifdef VT_BUF_HAVE_RW
count /= 2;
while (count--)
scr_writew(c, s++);
+#else
+   memset16(s, c, count / 2);
+#endif
 }
 #endif
 
 #ifndef VT_BUF_HAVE_MEMCPYW
 static inline void scr_memcpyw(u16 *d, const u16 *s, unsigned int count)
 {
+#ifdef VT_BUF_HAVE_RW
count /= 2;
while (count--)
scr_writew(scr_readw(s++), d++);
+#else
+   memcpy(d, s, count);
+#endif
 }
 #endif
 
 #ifndef VT_BUF_HAVE_MEMMOVEW
 static inline void scr_memmovew(u16 *d, const u16 *s, unsigned int count)
 {
+#ifdef VT_BUF_HAVE_RW
if (d < s)
scr_memcpyw(d, s, count);
else {
@@ -53,6 +62,9 @@ static inline void scr_memmovew(u16 *d, const u16 *s, unsigned int count)
while (count--)
scr_writew(scr_readw(--s), --d);
}
+#else
+   memmove(d, s, count);
+#endif
 }
 #endif
 
-- 
2.11.0



[PATCH v3 0/7] Add memsetN functions

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

zram was recently enhanced to support compressing pages with a repeating
pattern up to the size of an unsigned long.  As part of the discussion,
we noted it would be nice if architectures had optimised routines
to fill regions of memory with patterns larger than those contained
in a single byte.  Our suspicions were right; the x86 version offers
approximately a 7% performance improvement over the C implementation.

The generic memfill() function is part of Lars Wirzenius' publib,
but it doesn't offer the most convenient interface.  I chose to add
five more-specific functions as part of this patchset -- memset16(),
memset32(), memset64(), memset_l() (long) and memset_p() (pointer).

It would be nice to have some more architectures implement optimised
memsetN calls.  It would also be nice to find more places in the kernel
which could benefit from calling these functions.  Maybe a coccinelle
script could be written to find such places?  We're looking for loops
over an array where the value being stored into the array does not depend
on the iteration variable.
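
For the kind of loop meant, a hedged before/after sketch; 'tbl' and
'magic' are made-up names, not taken from any driver in this series:

#include <linux/string.h>
#include <linux/types.h>

/* Before: the stored value never depends on 'i', so this qualifies. */
static void init_table_old(u32 *tbl, u32 magic)
{
	int i;

	for (i = 0; i < 64; i++)
		tbl[i] = magic;
}

/* After: one call, same effect, less code. */
static void init_table_new(u32 *tbl, u32 magic)
{
	memset32(tbl, magic, 64);
}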

Since v1 of the patchset, I stumbled on Alpha's memsetw() which
caused me to add memset16() to complete the set.  I removed the
'__HAVE_ARCH_MEMSET_PLUS' preprocessor symbol in favour of separate
MEMSET16, MEMSET32 and MEMSET64 symbols.  I also reviewed the scr_mem*w()
usages across the different architectures and implemented some obvious
missing optimisations.  Alpha is still missing scr_memmovew() as it
would be non-trivial to write.

Russell's review on patch 2 only applies to the memset32/memset64
implementation.  The memset16 is unreviewed (and, indeed, untested)
to date.

Matthew Wilcox (7):
  Add multibyte memset functions
  ARM: Implement memset16, memset32 & memset64
  x86: Implement memset16, memset32 & memset64
  alpha: Add support for memset16
  zram: Convert to using memset_l
  sym53c8xx_2: Convert to use memset32
  vga: Optimise console scrolling

 arch/alpha/include/asm/string.h | 15 
 arch/alpha/include/asm/vga.h|  2 +-
 arch/alpha/lib/memset.S | 10 +++---
 arch/arm/include/asm/string.h   | 21 
 arch/arm/kernel/armksyms.c  |  3 ++
 arch/arm/lib/memset.S   | 44 +++-
 arch/mips/include/asm/vga.h |  6 
 arch/powerpc/include/asm/vga.h  |  8 +
 arch/sparc/include/asm/vga.h| 24 +
 arch/x86/include/asm/string_32.h| 24 +
 arch/x86/include/asm/string_64.h| 36 
 drivers/block/zram/zram_drv.c   | 15 ++--
 drivers/scsi/sym53c8xx_2/sym_hipd.c | 11 ++
 include/linux/string.h  | 30 
 include/linux/vt_buffer.h   | 12 +++
 lib/string.c| 68 +
 16 files changed, 287 insertions(+), 42 deletions(-)

-- 
2.11.0


[PATCH v3 5/7] zram: Convert to using memset_l

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

zram was the motivation for creating memset_l().  Minchan Kim sees a 7%
performance improvement on x86 with 100MB of non-zero deduplicatable
data:

perf stat -r 10 dd if=/dev/zram0 of=/dev/null

vanilla:0.232050465 seconds time elapsed ( +-  0.51% )
memset_l:   0.217219387 seconds time elapsed ( +-  0.07% )

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
Tested-by: Minchan Kim <minc...@kernel.org>
---
 drivers/block/zram/zram_drv.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e27d89a36c34..25dcad309695 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -157,20 +157,11 @@ static inline void update_used_max(struct zram *zram,
} while (old_max != cur_max);
 }
 
-static inline void zram_fill_page(char *ptr, unsigned long len,
+static inline void zram_fill_page(void *ptr, unsigned long len,
unsigned long value)
 {
-   int i;
-   unsigned long *page = (unsigned long *)ptr;
-
WARN_ON_ONCE(!IS_ALIGNED(len, sizeof(unsigned long)));
-
-   if (likely(value == 0)) {
-   memset(ptr, 0, len);
-   } else {
-   for (i = 0; i < len / sizeof(*page); i++)
-   page[i] = value;
-   }
+   memset_l(ptr, value, len / sizeof(unsigned long));
 }
 
 static bool page_same_filled(void *ptr, unsigned long *element)
@@ -193,7 +184,7 @@ static bool page_same_filled(void *ptr, unsigned long *element)
 static void handle_same_page(struct bio_vec *bvec, unsigned long element)
 {
struct page *page = bvec->bv_page;
-   void *user_mem;
+   char *user_mem;
 
user_mem = kmap_atomic(page);
zram_fill_page(user_mem + bvec->bv_offset, bvec->bv_len, element);
-- 
2.11.0



[PATCH v3 3/7] x86: Implement memset16, memset32 & memset64

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

These are single instructions on x86.  There's no 64-bit instruction
for x86-32, but we don't yet have any user for memset64() on 32-bit
architectures, so don't bother to implement it.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 arch/x86/include/asm/string_32.h | 24 ++++++++++++++++++++++++
 arch/x86/include/asm/string_64.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e8353ee5c..84da91fe13ac 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -331,6 +331,30 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 : __memset((s), (c), (count)))
 #endif
 
+#define __HAVE_ARCH_MEMSET16
+static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
+{
+   int d0, d1;
+   asm volatile("rep\n\t"
+"stosw"
+: "=&c" (d0), "=&D" (d1)
+: "a" (v), "1" (s), "0" (n)
+: "memory");
+   return s;
+}
+
+#define __HAVE_ARCH_MEMSET32
+static inline void *memset32(uint32_t *s, uint32_t v, size_t n)
+{
+   int d0, d1;
+   asm volatile("rep\n\t"
+"stosl"
+: "=&c" (d0), "=&D" (d1)
+: "a" (v), "1" (s), "0" (n)
+: "memory");
+   return s;
+}
+
 /*
  * find the first occurrence of byte 'c', or 1 past the area if none
  */
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index a164862d77e3..71c5e860c7da 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -56,6 +56,42 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+#define __HAVE_ARCH_MEMSET16
+static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
+{
+   long d0, d1;
+   asm volatile("rep\n\t"
+"stosw"
+: "=&c" (d0), "=&D" (d1)
+: "a" (v), "1" (s), "0" (n)
+: "memory");
+   return s;
+}
+
+#define __HAVE_ARCH_MEMSET32
+static inline void *memset32(uint32_t *s, uint32_t v, size_t n)
+{
+   long d0, d1;
+   asm volatile("rep\n\t"
+"stosl"
+: "=&c" (d0), "=&D" (d1)
+: "a" (v), "1" (s), "0" (n)
+: "memory");
+   return s;
+}
+
+#define __HAVE_ARCH_MEMSET64
+static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
+{
+   long d0, d1;
+   asm volatile("rep\n\t"
+"stosq"
+: "=&c" (d0), "=&D" (d1)
+: "a" (v), "1" (s), "0" (n)
+: "memory");
+   return s;
+}
+
 #define __HAVE_ARCH_MEMMOVE
 void *memmove(void *dest, const void *src, size_t count);
 void *__memmove(void *dest, const void *src, size_t count);
-- 
2.11.0



[PATCH v3 1/7] Add multibyte memset functions

2017-03-24 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

memset16(), memset32() and memset64() are like memset(), but allow the
caller to fill the destination with a multibyte pattern.  memset_l()
and memset_p() allow the caller to use unsigned long and pointer
values respectively.  memset64() is currently only available on 64-bit
architectures.
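
A hedged usage sketch (not part of the patch; 'table', 'ptrs' and
'sentinel' are illustrative names only):

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

static unsigned long table[128];
static void *ptrs[32];

static void fill_examples(void *sentinel)
{
	/* counts are in elements, not bytes */
	memset_l(table, 0xdeadbeefUL, ARRAY_SIZE(table));
	memset_p(ptrs, sentinel, ARRAY_SIZE(ptrs));
}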

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 include/linux/string.h | 30 ++
 lib/string.c   | 68 ++
 2 files changed, 98 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index 26b6f6a66f83..b376875b650c 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -99,6 +99,36 @@ extern __kernel_size_t strcspn(const char *,const char *);
 #ifndef __HAVE_ARCH_MEMSET
 extern void * memset(void *,int,__kernel_size_t);
 #endif
+
+#ifndef __HAVE_ARCH_MEMSET16
+extern void *memset16(uint16_t *, uint16_t, __kernel_size_t);
+#endif
+
+#ifndef __HAVE_ARCH_MEMSET32
+extern void *memset32(uint32_t *, uint32_t, __kernel_size_t);
+#endif
+
+#ifndef __HAVE_ARCH_MEMSET64
+extern void *memset64(uint64_t *, uint64_t, __kernel_size_t);
+#endif
+
+static inline void *memset_l(unsigned long *p, unsigned long v,
+   __kernel_size_t n)
+{
+   if (BITS_PER_LONG == 32)
+   return memset32((uint32_t *)p, v, n);
+   else
+   return memset64((uint64_t *)p, v, n);
+}
+
+static inline void *memset_p(void **p, void *v, __kernel_size_t n)
+{
+   if (BITS_PER_LONG == 32)
+   return memset32((uint32_t *)p, (uintptr_t)v, n);
+   else
+   return memset64((uint64_t *)p, (uintptr_t)v, n);
+}
+
 #ifndef __HAVE_ARCH_MEMCPY
 extern void * memcpy(void *,const void *,__kernel_size_t);
 #endif
diff --git a/lib/string.c b/lib/string.c
index ed83562a53ae..f18ba402e503 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -697,6 +697,74 @@ void memzero_explicit(void *s, size_t count)
 }
 EXPORT_SYMBOL(memzero_explicit);
 
+#ifndef __HAVE_ARCH_MEMSET16
+/**
+ * memset16() - Fill a memory area with a uint16_t
+ * @s: Pointer to the start of the area.
+ * @v: The value to fill the area with
+ * @count: The number of values to store
+ *
+ * Differs from memset() in that it fills with a uint16_t instead
+ * of a byte.  Remember that @count is the number of uint16_ts to
+ * store, not the number of bytes.
+ */
+void *memset16(uint16_t *s, uint16_t v, size_t count)
+{
+   uint16_t *xs = s;
+
+   while (count--)
+   *xs++ = v;
+   return s;
+}
+EXPORT_SYMBOL(memset16);
+#endif
+
+#ifndef __HAVE_ARCH_MEMSET32
+/**
+ * memset32() - Fill a memory area with a uint32_t
+ * @s: Pointer to the start of the area.
+ * @v: The value to fill the area with
+ * @count: The number of values to store
+ *
+ * Differs from memset() in that it fills with a uint32_t instead
+ * of a byte.  Remember that @count is the number of uint32_ts to
+ * store, not the number of bytes.
+ */
+void *memset32(uint32_t *s, uint32_t v, size_t count)
+{
+   uint32_t *xs = s;
+
+   while (count--)
+   *xs++ = v;
+   return s;
+}
+EXPORT_SYMBOL(memset32);
+#endif
+
+#ifndef __HAVE_ARCH_MEMSET64
+#if BITS_PER_LONG > 32
+/**
+ * memset64() - Fill a memory area with a uint64_t
+ * @s: Pointer to the start of the area.
+ * @v: The value to fill the area with
+ * @count: The number of values to store
+ *
+ * Differs from memset() in that it fills with a uint64_t instead
+ * of a byte.  Remember that @count is the number of uint64_ts to
+ * store, not the number of bytes.
+ */
+void *memset64(uint64_t *s, uint64_t v, size_t count)
+{
+   uint64_t *xs = s;
+
+   while (count--)
+   *xs++ = v;
+   return s;
+}
+EXPORT_SYMBOL(memset64);
+#endif
+#endif
+
 #ifndef __HAVE_ARCH_MEMCPY
 /**
  * memcpy - Copy one area of memory to another
-- 
2.11.0
