[PATCH v9 2/6] mm: mlock: Add new mlock system call

2015-09-08 Thread Eric B Munson
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.
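
As an illustrative sketch only (not part of this patch): a user-space caller
without a libc wrapper can exercise the new call through syscall(2).  The
example assumes the x86_64 syscall number 324 added below; my_mlock2() is just
a local helper, and flags == 0 behaves like classic mlock().

/*
 * Hedged example, not part of the patch: calling mlock2 via syscall(2).
 * __NR_mlock2 is assumed to be 324 (the x86_64 slot added below); other
 * architectures use different numbers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mlock2
#define __NR_mlock2 324
#endif

static long my_mlock2(void *start, size_t len, int flags)
{
	return syscall(__NR_mlock2, start, len, flags);
}

int main(void)
{
	size_t len = 4096;
	char *buf = malloc(len);

	if (!buf)
		return 1;
	memset(buf, 0, len);

	/* flags == 0 locks and populates the range, just like mlock() */
	if (my_mlock2(buf, len, 0) != 0) {
		perror("mlock2");
		return 1;
	}
	printf("locked %zu bytes\n", len);
	return 0;
}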

Signed-off-by: Eric B Munson <emun...@akamai.com>
Acked-by: Vlastimil Babka <vba...@suse.cz>
Acked-by: Michal Hocko <mho...@suse.com>
Cc: Michal Hocko <mho...@suse.cz>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Stephen Rothwell <s...@canb.auug.org.au>
Cc: Guenter Roeck <li...@roeck-us.net>
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
Changes from V8:
* Update x86[_64] syscall numbers to follow the new userfaultfd syscalls

 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 2 ++
 include/uapi/asm-generic/unistd.h      | 4 +++-
 kernel/sys_ni.c                        | 1 +
 mm/mlock.c                             | 8 ++++++++
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 477bfa6..41e72a5 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -381,3 +381,4 @@
 372	i386	recvmsg			sys_recvmsg			compat_sys_recvmsg
 373	i386	shutdown		sys_shutdown
 374	i386	userfaultfd		sys_userfaultfd
+375	i386	mlock2			sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 81c4906..2366900 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -330,6 +330,7 @@
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
 323	common	userfaultfd		sys_userfaultfd
+324	common	mlock2			sys_mlock2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0800131..890632c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -885,4 +885,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
 			const char __user *const __user *argv,
 			const char __user *const __user *envp, int flags);
 
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 03c3875..8de5b26 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -194,6 +194,7 @@ cond_syscall(sys_mlock);
 cond_syscall(sys_munlock);
 cond_syscall(sys_mlockall);
 cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
 cond_syscall(sys_mincore);
 cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index c32ad8f..a23a533 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -644,6 +644,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
 }
 
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+   if (flags)
+   return -EINVAL;
+
+   return do_mlock(start, len, VM_LOCKED);
+}
+
 SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
int ret;
-- 
1.9.1


[PATCH v9 0/6] Allow user to request memory to be locked on page fault

2015-09-08 Thread Eric B Munson
 that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise mlock(MLOCK_ONFAULT) is significantly faster.
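
To make the tradeoff concrete, the following is an illustrative sketch only:
it assumes the mlock2() syscall number (324 on x86_64 in this series) and the
MLOCK_ONFAULT value (0x01) added by later patches, and locks a large, sparsely
touched arena on fault so untouched pages are never made present.
RLIMIT_MEMLOCK must still cover the whole range.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mlock2
#define __NR_mlock2 324
#endif
#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01
#endif

int main(void)
{
	size_t arena = 64UL << 20;	/* large arena, mostly never touched */
	char *p = mmap(NULL, arena, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	size_t i;

	if (p == MAP_FAILED)
		return 1;

	/* No prefault here: each page is locked when it is first faulted in */
	if (syscall(__NR_mlock2, p, arena, MLOCK_ONFAULT) != 0)
		return 1;

	/* Only the pages touched below ever become resident and locked */
	for (i = 0; i < arena; i += 256 * 4096)
		p[i] = 1;

	return 0;
}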

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.2-rc1   4.2-rc1+lock-on-fault
Copy:    10,566.5  10,421
Scale:   10,685    10,503.5
Add:     12,044.1  11,814.2
Triad:   12,064.8  11,846.3

Kernbench optimal load
                 4.2-rc1   4.2-rc1+lock-on-fault
Elapsed Time     78.453    78.991
User Time        64.2395   65.2355
System Time      9.7335    9.7085
Context Switches 22211.5   22412.1
Sleeps           14965.3   14956.1

---
Changes from V8:
* Do not expose VM_LOCKONFAULT flag to rmap code
* Rebase on top of userfaultfd code

Changes from V7:
* Do not expose the VM_LOCKONFAULT flag to userspace via proc
* Fix mlock2 self tests

Changes from V6:
* Bump the x86 system call number to avoid collision with userfaultfd
* Fix FOLL_POPULATE and FOLL_MLOCK usage when mmap is called with
 MAP_POPULATE
* Add documentation for the proc smaps change
* checkpatch fixes

Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
 operations, mirroring the relationship between VM_LOCKED and
 VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
 mlockall behavior now matches the old behavior with respect to ordering

Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify bookkeeping
Adjust tests to match

Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.


Eric B Munson (6):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock system call
  mm: Introduce VM_LOCKONFAULT
  mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
  selftests: vm: Add tests for lock on fault
  mips: Add entry for new mlock2 syscall

 arch/alpha/include/uapi/asm/mman.h  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   6 +
 arch/mips/include/uapi/asm/unistd.h |  15 +-
 arch/mips/kernel/scall32-o32.S  |   1 +
 arch/mips/kernel/scall64-64.S   |   1 +
 arch/mips/kernel/scall64-n32.S  |   1 +
 arch/mips/kernel/scall64-o32.S  |   1 +
 arch/parisc/include/uapi/asm/mman.h |   3 +
 arch/powerpc/include/uapi/asm/mman.h|   1 +
 arch/sparc/include/uapi/asm/mman.h  |   1 +
 arch/tile/include/uapi/asm/mman.h   |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/xtensa/include/uapi/asm/mman.h |   6 +
 include/linux/mm.h  |   5 +
 include/linux/syscalls.h|   2 +
 include/uapi/asm-generic/mman-common.h  |   5 +
 include/uapi/asm-generic/mman.h |   1 +
 include/uapi/asm-generic/unistd.h   |   4 +-
 kernel/fork.c   |   3 +-
 kernel/sys_ni.c |   1 +
 mm/debug.c  |   1 +
 mm/gup.c|  10 +-
 mm/huge_memory.c|   2 +-
 mm/hugetlb.c|   4 +-
 mm/mlock.c  |  86 +++-
 mm/mmap.c   |   2 +-
 tools/testing/selftests/vm/Makefile |   2 +
 tools/testing/selftests/vm/mlock2-tests.c   | 737 
 tools/testing/selftests/vm/on-fault-limit.c |  47 ++
 tools/testing/selftests/vm/run_vmtests  |  22 +
 31 files changed, 938 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan <shua...@osg.samsung.com>
Cc: Michal Hocko <mho...@suse.cz>
Cc: Michael Kerrisk <mtk.manpa...@gmail.com>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: Ralf Baechle <r...@linux-mips.org>
Cc: Andrea Arcangeli <aarca...@redhat.com>
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.oz

[PATCH v9 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-09-08 Thread Eric B Munson
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created.  This patch adds the ability to set
this state via the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags.  When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.

Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE.  This behavior
is maintained after adding MCL_ONFAULT.  If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked.  This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.

munlock() will unconditionally clear both VMA flags.  munlockall()
unconditionally clears both VMA flags on all VMAs and in the
mm->def_flags field.
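
Purely as an illustration (not part of the patch), the combinations above map
to calls like the following.  MCL_ONFAULT is assumed to take the asm-generic
value 4 added by this series (alpha, sparc and powerpc encode it differently),
and mcl_onfault_examples() is just a hypothetical helper:

/*
 * Hedged illustration of the mlockall() combinations described above.
 * MCL_ONFAULT is assumed to be the asm-generic value 4 from this series.
 */
#include <sys/mman.h>

#ifndef MCL_ONFAULT
#define MCL_ONFAULT 4
#endif

static void mcl_onfault_examples(void)
{
	/* All current mappings become VM_LOCKED | VM_LOCKONFAULT */
	mlockall(MCL_CURRENT | MCL_ONFAULT);

	/* Only future mappings: mm->def_flags gets VM_LOCKED | VM_LOCKONFAULT */
	mlockall(MCL_FUTURE | MCL_ONFAULT);

	/* Both current mappings and mm->def_flags are marked */
	mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

	/* munlockall() clears VM_LOCKED and VM_LOCKONFAULT everywhere */
	munlockall();
}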

Signed-off-by: Eric B Munson <emun...@akamai.com>
Acked-by: Vlastimil Babka <vba...@suse.cz>
Acked-by: Michal Hocko <mho...@suse.com>
Cc: Michal Hocko <mho...@suse.cz>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: "Kirill A. Shutemov" <kir...@shutemov.name>
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h |  3 ++
 arch/mips/include/uapi/asm/mman.h  |  6 
 arch/parisc/include/uapi/asm/mman.h|  3 ++
 arch/powerpc/include/uapi/asm/mman.h   |  1 +
 arch/sparc/include/uapi/asm/mman.h |  1 +
 arch/tile/include/uapi/asm/mman.h  |  1 +
 arch/xtensa/include/uapi/asm/mman.h|  6 
 include/uapi/asm-generic/mman-common.h |  5 
 include/uapi/asm-generic/mman.h|  1 +
 mm/mlock.c | 52 +-
 10 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
 
 #define MCL_CURRENT	 8192	/* lock all currently mapped pages */
 #define MCL_FUTURE	16384	/* lock all additions to address space */
+#define MCL_ONFAULT	32768	/* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT	0x01	/* Lock pages in range after they are faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
  */
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */
+#define MCL_ONFAULT	4		/* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT	0x01		/* Lock pages in range after they are faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
 
 #define MCL_CURRENT	1	/* lock all current mappings */
 #define MCL_FUTURE	2	/* lock all future mappings */
+#define MCL_ONFAULT	4	/* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT	0x01	/* Lock pages in range after they are faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000

[PATCH v8 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-08-26 Thread Eric B Munson
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created.  This patch adds the ability to set
this state via the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags.  When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.

Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE.  This behavior
is maintained after adding MCL_ONFAULT.  If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked.  This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.

munlock() will unconditionally clear both VMA flags.  munlockall()
unconditionally clears both VMA flags on all VMAs and in the
mm->def_flags field.

Signed-off-by: Eric B Munson emun...@akamai.com
Acked-by: Vlastimil Babka vba...@suse.cz
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Kirill A. Shutemov kir...@shutemov.name
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h |  3 ++
 arch/mips/include/uapi/asm/mman.h  |  6 
 arch/parisc/include/uapi/asm/mman.h|  3 ++
 arch/powerpc/include/uapi/asm/mman.h   |  1 +
 arch/sparc/include/uapi/asm/mman.h |  1 +
 arch/tile/include/uapi/asm/mman.h  |  1 +
 arch/xtensa/include/uapi/asm/mman.h|  6 
 include/uapi/asm-generic/mman-common.h |  5 
 include/uapi/asm-generic/mman.h|  1 +
 mm/mlock.c | 52 +-
 10 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space

[PATCH v8 0/6] Allow user to request memory to be locked on page fault

2015-08-26 Thread Eric B Munson
 mlock(MLOCK_ONFAULT) is significantly faster.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.2-rc1   4.2-rc1+lock-on-fault
Copy:    10,566.5  10,421
Scale:   10,685    10,503.5
Add:     12,044.1  11,814.2
Triad:   12,064.8  11,846.3

Kernbench optimal load
                 4.2-rc1   4.2-rc1+lock-on-fault
Elapsed Time     78.453    78.991
User Time        64.2395   65.2355
System Time      9.7335    9.7085
Context Switches 22211.5   22412.1
Sleeps           14965.3   14956.1

---
Changes from V7:
* Do not expose the VM_LOCKONFAULT flag to userspace via proc
* Fix mlock2 self tests

Changes from V6:
* Bump the x86 system call number to avoid collision with userfaultfd
* Fix FOLL_POPULATE and FOLL_MLOCK usage when mmap is called with
 MAP_POPULATE
* Add documentation for the proc smaps change
* checkpatch fixes

Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
 operations, mirroring the relationship between VM_LOCKED and
 VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
 mlockall behavior now matches the old behavior with respect to ordering

Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify bookkeeping
Adjust tests to match

Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.

Eric B Munson (6):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock system call
  mm: Introduce VM_LOCKONFAULT
  mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
  selftests: vm: Add tests for lock on fault
  mips: Add entry for new mlock2 syscall

 arch/alpha/include/uapi/asm/mman.h  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   6 +
 arch/mips/include/uapi/asm/unistd.h |  15 +-
 arch/mips/kernel/scall32-o32.S  |   1 +
 arch/mips/kernel/scall64-64.S   |   1 +
 arch/mips/kernel/scall64-n32.S  |   1 +
 arch/mips/kernel/scall64-o32.S  |   1 +
 arch/parisc/include/uapi/asm/mman.h |   3 +
 arch/powerpc/include/uapi/asm/mman.h|   1 +
 arch/sparc/include/uapi/asm/mman.h  |   1 +
 arch/tile/include/uapi/asm/mman.h   |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/xtensa/include/uapi/asm/mman.h |   6 +
 include/linux/mm.h  |   5 +
 include/linux/syscalls.h|   2 +
 include/uapi/asm-generic/mman-common.h  |   5 +
 include/uapi/asm-generic/mman.h |   1 +
 include/uapi/asm-generic/unistd.h   |   4 +-
 kernel/fork.c   |   2 +-
 kernel/sys_ni.c |   1 +
 mm/debug.c  |   1 +
 mm/gup.c|  10 +-
 mm/huge_memory.c|   2 +-
 mm/hugetlb.c|   4 +-
 mm/mlock.c  |  86 +++-
 mm/mmap.c   |   2 +-
 mm/rmap.c   |   6 +-
 tools/testing/selftests/vm/Makefile |   2 +
 tools/testing/selftests/vm/mlock2-tests.c   | 737 
 tools/testing/selftests/vm/on-fault-limit.c |  47 ++
 tools/testing/selftests/vm/run_vmtests  |  22 +
 32 files changed, 941 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Ralf Baechle r...@linux-mips.org
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1


[PATCH v8 2/6] mm: mlock: Add new mlock system call

2015-08-26 Thread Eric B Munson
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.

Signed-off-by: Eric B Munson emun...@akamai.com
Acked-by: Vlastimil Babka vba...@suse.cz
Acked-by: Michal Hocko mho...@suse.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Stephen Rothwell s...@canb.auug.org.au
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h   | 2 ++
 include/uapi/asm-generic/unistd.h  | 4 +++-
 kernel/sys_ni.c| 1 +
 mm/mlock.c | 8 
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+360	i386	mlock2			sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load	sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+324	common	mlock2			sys_mlock2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user 
*filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
 
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
 cond_syscall(sys_munlock);
 cond_syscall(sys_mlockall);
 cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
 cond_syscall(sys_mincore);
 cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
 }
 
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+   if (flags)
+   return -EINVAL;
+
+   return do_mlock(start, len, VM_LOCKED);
+}
+
 SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
int ret;
-- 
1.9.1


[PATCH v7 0/6] Allow user to request memory to be locked on page fault

2015-08-08 Thread Eric B Munson
 mlock(MLOCK_ONFAULT) is significantly faster.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.2-rc1   4.2-rc1+lock-on-fault
Copy:    10,566.5  10,421
Scale:   10,685    10,503.5
Add:     12,044.1  11,814.2
Triad:   12,064.8  11,846.3

Kernbench optimal load
                 4.2-rc1   4.2-rc1+lock-on-fault
Elapsed Time     78.453    78.991
User Time        64.2395   65.2355
System Time      9.7335    9.7085
Context Switches 22211.5   22412.1
Sleeps           14965.3   14956.1

---
Changes from V6:
* Bump the x86 system call number to avoid collision with userfaultfd
* Fix FOLL_POPULATE and FOLL_MLOCK usage when mmap is called with
 MAP_POPULATE
* Add documentation for the proc smaps change
* checkpatch fixes

Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
 operations, mirroring the relationship between VM_LOCKED and
 VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
 mlockall behavior now matches the old behavior with respect to ordering

Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify bookkeeping
Adjust tests to match

Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.


Eric B Munson (6):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock system call
  mm: Introduce VM_LOCKONFAULT
  mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
  selftests: vm: Add tests for lock on fault
  mips: Add entry for new mlock2 syscall

 Documentation/filesystems/proc.txt  |   1 +
 arch/alpha/include/uapi/asm/mman.h  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   6 +
 arch/mips/include/uapi/asm/unistd.h |  15 +-
 arch/mips/kernel/scall32-o32.S  |   1 +
 arch/mips/kernel/scall64-64.S   |   1 +
 arch/mips/kernel/scall64-n32.S  |   1 +
 arch/mips/kernel/scall64-o32.S  |   1 +
 arch/parisc/include/uapi/asm/mman.h |   3 +
 arch/powerpc/include/uapi/asm/mman.h|   1 +
 arch/sparc/include/uapi/asm/mman.h  |   1 +
 arch/tile/include/uapi/asm/mman.h   |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/xtensa/include/uapi/asm/mman.h |   6 +
 drivers/gpu/drm/drm_vm.c|   8 +-
 fs/proc/task_mmu.c  |   1 +
 include/linux/mm.h  |   2 +
 include/linux/syscalls.h|   2 +
 include/uapi/asm-generic/mman-common.h  |   5 +
 include/uapi/asm-generic/mman.h |   1 +
 include/uapi/asm-generic/unistd.h   |   4 +-
 kernel/fork.c   |   2 +-
 kernel/sys_ni.c |   1 +
 mm/debug.c  |   1 +
 mm/gup.c|  10 +-
 mm/huge_memory.c|   2 +-
 mm/hugetlb.c|   4 +-
 mm/mlock.c  |  87 +++-
 mm/mmap.c   |   2 +-
 mm/rmap.c   |   6 +-
 tools/testing/selftests/vm/Makefile |   2 +
 tools/testing/selftests/vm/mlock2-tests.c   | 661 
 tools/testing/selftests/vm/on-fault-limit.c |  47 ++
 tools/testing/selftests/vm/run_vmtests  |  22 +
 35 files changed, 872 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Ralf Baechle r...@linux-mips.org
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1

[PATCH v7 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-08-08 Thread Eric B Munson
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created.  This patch adds the ability to set
this state via the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags.  When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.

Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE.  This behavior
is maintained after adding MCL_ONFAULT.  If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked.  This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.

munlock() will unconditionally clear both VMA flags.  munlockall()
unconditionally clears both VMA flags on all VMAs and in the
mm->def_flags field.

Signed-off-by: Eric B Munson emun...@akamai.com
Acked-by: Vlastimil Babka vba...@suse.cz
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Kirill A. Shutemov kir...@shutemov.name
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h |  3 ++
 arch/mips/include/uapi/asm/mman.h  |  6 
 arch/parisc/include/uapi/asm/mman.h|  3 ++
 arch/powerpc/include/uapi/asm/mman.h   |  1 +
 arch/sparc/include/uapi/asm/mman.h |  1 +
 arch/tile/include/uapi/asm/mman.h  |  1 +
 arch/xtensa/include/uapi/asm/mman.h|  6 
 include/uapi/asm-generic/mman-common.h |  5 
 include/uapi/asm-generic/mman.h|  1 +
 mm/mlock.c | 53 +-
 10 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space

[PATCH v7 2/6] mm: mlock: Add new mlock system call

2015-08-08 Thread Eric B Munson
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.

Signed-off-by: Eric B Munson emun...@akamai.com
Acked-by: Vlastimil Babka vba...@suse.cz
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Stephen Rothwell s...@canb.auug.org.au
Cc: Guenter Roeck li...@roeck-us.net
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h   | 2 ++
 include/uapi/asm-generic/unistd.h  | 4 +++-
 kernel/sys_ni.c| 1 +
 mm/mlock.c | 8 
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..8e06da6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+360	i386	mlock2			sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..67601e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load	sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+324	common	mlock2			sys_mlock2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user 
*filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
 
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
 cond_syscall(sys_munlock);
 cond_syscall(sys_mlockall);
 cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
 cond_syscall(sys_mincore);
 cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 5692ee5..3094f27 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -643,6 +643,14 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
 }
 
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+   if (flags)
+   return -EINVAL;
+
+   return do_mlock(start, len, VM_LOCKED);
+}
+
 SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
int ret;
-- 
1.9.1


[PATCH V6 2/6] mm: mlock: Add new mlock system call

2015-07-29 Thread Eric B Munson
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Stephen Rothwell s...@canb.auug.org.au
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h   | 2 ++
 include/uapi/asm-generic/unistd.h  | 4 +++-
 kernel/sys_ni.c| 1 +
 mm/mlock.c | 9 +
 6 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..839d5df 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+359	i386	mlock2			sys_mlock2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..ad36769 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load	sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+323	common	mlock2			sys_mlock2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..56a3d59 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user 
*filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
 
+asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index e016bd9..14a6013 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_mlock2 282
+__SYSCALL(__NR_mlock2, sys_mlock2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4818b71 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -193,6 +193,7 @@ cond_syscall(sys_mlock);
 cond_syscall(sys_munlock);
 cond_syscall(sys_mlockall);
 cond_syscall(sys_munlockall);
+cond_syscall(sys_mlock2);
 cond_syscall(sys_mincore);
 cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
diff --git a/mm/mlock.c b/mm/mlock.c
index 1585cca..807f986 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -642,6 +642,15 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
 }
 
+SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
+{
+   vm_flags_t vm_flags = VM_LOCKED;
+   if (flags)
+   return -EINVAL;
+
+   return do_mlock(start, len, vm_flags);
+}
+
 SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
int ret;
-- 
1.9.1


[PATCH V6 4/6] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-07-29 Thread Eric B Munson
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created.  This patch adds the ability to set
this state via the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall
flags.  When used with MCL_CURRENT, all current mappings will be marked
with VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used
with both MCL_CURRENT and MCL_FUTURE, all current mappings and
mm->def_flags will be marked with VM_LOCKED | VM_LOCKONFAULT.

Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE.  This behavior
is maintained after adding MCL_ONFAULT.  If a call to
mlockall(MCL_FUTURE) is followed by mlockall(MCL_CURRENT), the
mm->def_flags will be cleared and new VMAs will be unlocked.  This
remains true with or without MCL_ONFAULT in either mlockall()
invocation.

munlock() will unconditionally clear both VMA flags.  munlockall()
unconditionally clears both VMA flags on all VMAs and in the
mm->def_flags field.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Kirill A. Shutemov kir...@shutemov.name
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h |  3 ++
 arch/mips/include/uapi/asm/mman.h  |  6 
 arch/parisc/include/uapi/asm/mman.h|  3 ++
 arch/powerpc/include/uapi/asm/mman.h   |  1 +
 arch/sparc/include/uapi/asm/mman.h |  1 +
 arch/tile/include/uapi/asm/mman.h  |  1 +
 arch/xtensa/include/uapi/asm/mman.h|  6 
 include/uapi/asm-generic/mman-common.h |  5 
 include/uapi/asm-generic/mman.h|  1 +
 mm/mlock.c | 55 ++
 10 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..f2f9496 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,6 +37,9 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..97c03f4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,6 +61,12 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+/*
+ * Flags for mlock
+ */
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..ecc3ae1 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,6 +31,9 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
+
+#define MLOCK_ONFAULT  0x01/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..03c06ba 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8000

[PATCH V6 0/6] Allow user to request memory to be locked on page fault

2015-07-29 Thread Eric B Munson
 mlock(MLOCK_ONFAULT) is significantly faster.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.2-rc1   4.2-rc1+lock-on-fault
Copy:    10,566.5  10,421
Scale:   10,685    10,503.5
Add:     12,044.1  11,814.2
Triad:   12,064.8  11,846.3

Kernbench optimal load
                 4.2-rc1   4.2-rc1+lock-on-fault
Elapsed Time     78.453    78.991
User Time        64.2395   65.2355
System Time      9.7335    9.7085
Context Switches 22211.5   22412.1
Sleeps           14965.3   14956.1

---
Changes from V5:
Drop MLOCK_LOCKED flag
* MLOCK_ONFAULT and MCL_ONFAULT are treated as a modifier to other locking
 operations, mirroring the relationship between VM_LOCKED and
 VM_LOCKONFAULT
* Drop mmap flag and related tests
* Fix clearing of MCL_CURRENT when mlockall is called with MCL_FUTURE,
 mlockall behavior now matches the old behavior with respect to ordering

Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify bookkeeping
Adjust tests to match

Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.


Eric B Munson (6):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock system call
  mm: Introduce VM_LOCKONFAULT
  mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
  selftests: vm: Add tests for lock on fault
  mips: Add entry for new mlock2 syscall

 arch/alpha/include/uapi/asm/mman.h  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   6 +
 arch/mips/include/uapi/asm/unistd.h |  15 +-
 arch/mips/kernel/scall32-o32.S  |   1 +
 arch/mips/kernel/scall64-64.S   |   1 +
 arch/mips/kernel/scall64-n32.S  |   1 +
 arch/mips/kernel/scall64-o32.S  |   1 +
 arch/parisc/include/uapi/asm/mman.h |   3 +
 arch/powerpc/include/uapi/asm/mman.h|   1 +
 arch/sparc/include/uapi/asm/mman.h  |   1 +
 arch/tile/include/uapi/asm/mman.h   |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/xtensa/include/uapi/asm/mman.h |   6 +
 drivers/gpu/drm/drm_vm.c|   8 +-
 fs/proc/task_mmu.c  |   1 +
 include/linux/mm.h  |   2 +
 include/linux/syscalls.h|   2 +
 include/uapi/asm-generic/mman-common.h  |   5 +
 include/uapi/asm-generic/mman.h |   1 +
 include/uapi/asm-generic/unistd.h   |   4 +-
 kernel/fork.c   |   2 +-
 kernel/sys_ni.c |   1 +
 mm/debug.c  |   1 +
 mm/gup.c|  10 +-
 mm/huge_memory.c|   2 +-
 mm/hugetlb.c|   4 +-
 mm/mlock.c  |  91 +++-
 mm/mmap.c   |   2 +-
 mm/rmap.c   |   4 +-
 tools/testing/selftests/vm/Makefile |   2 +
 tools/testing/selftests/vm/mlock2-tests.c   | 661 
 tools/testing/selftests/vm/on-fault-limit.c |  47 ++
 tools/testing/selftests/vm/run_vmtests  |  22 +
 34 files changed, 873 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Ralf Baechle r...@linux-mips.org
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org


-- 
1.9.1


Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

2015-07-28 Thread Eric B Munson
On Tue, 28 Jul 2015, Michal Hocko wrote:

 [I am sorry but I didn't get to this sooner.]
 
 On Mon 27-07-15 10:54:09, Eric B Munson wrote:
  Now that VM_LOCKONFAULT is a modifier to VM_LOCKED and
   cannot be specified independently, it might make more sense to mirror
   that relationship to userspace.  Which would lead to something like the
  following:
 
 A modifier makes more sense.
  
  To lock and populate a region:
  mlock2(start, len, 0);
  
  To lock on fault a region:
  mlock2(start, len, MLOCK_ONFAULT);
  
  If LOCKONFAULT is seen as a modifier to mlock, then having the flags
  argument as 0 mean do mlock classic makes more sense to me.
  
  To mlock current on fault only:
  mlockall(MCL_CURRENT | MCL_ONFAULT);
  
  To mlock future on fault only:
  mlockall(MCL_FUTURE | MCL_ONFAULT);
  
  To lock everything on fault:
  mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
 
 Makes sense to me. The only remaining and still tricky part would be
 the munlock{all}(flags) behavior. What should munlock(MLOCK_ONFAULT)
  do? Keep locked and populate the range or simply ignore the flag and
 just unlock?
 
 I can see some sense to allow munlockall(MCL_FUTURE[|MLOCK_ONFAULT]),
 munlockall(MCL_CURRENT) resp. munlockall(MCL_CURRENT|MCL_FUTURE) but
 other combinations sound weird to me.
 
 Anyway munlock with flags opens new doors of trickiness.

In the current revision there are no new munlock[all] system calls
introduced.  munlockall() unconditionally cleared both MCL_CURRENT and
MCL_FUTURE before the set and now unconditionally clears all three.
munlock() does the same for VM_LOCKED and VM_LOCKONFAULT.  If the user
wants to adjust mlockall flags today, they need to call mlockall a
second time with the new flags, this remains true for mlockall after
this set and the same behavior is mirrored in mlock2.  The only
remaining question I have is should we have 2 new mlockall flags so that
the caller can explicitly set VM_LOCKONFAULT in the mm->def_flags vs
locking all current VMAs on fault.  I ask because if the user wants to
lock all current VMAs the old way, but all future VMAs on fault they
have to call mlockall() twice:

mlockall(MCL_CURRENT);
mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

This has the side effect of converting all the current VMAs to
VM_LOCKONFAULT, but because they were all made present and locked in the
first call, this should not matter in most cases.  The catch is that,
like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
fails.  This has been true of mlockall() from the beginning so I don't
know if it needs more than an entry in the man page to clarify (which I
will add when I add documentation for MCL_ONFAULT).  In a much less
likely corner case, it is not possible in the current setup to request
all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.
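
(As a hedged aside, not something this series adds: a caller that needs to
know whether the population step succeeded can check residency afterwards
with mincore(2), along the lines below.  range_resident() is a hypothetical
helper and addr must be page aligned.)

/*
 * Hedged sketch: verify a range is resident after mlockall(MCL_CURRENT),
 * since population failures are not reported by the syscall itself.
 */
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if every page of [addr, addr + len) is resident, 0 otherwise. */
static int range_resident(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pages = (len + page - 1) / page;
	unsigned char *vec = malloc(pages);
	size_t i;
	int ok = 1;

	if (!vec || mincore(addr, len, vec) != 0) {
		free(vec);
		return 0;
	}
	for (i = 0; i < pages; i++)
		if (!(vec[i] & 1))
			ok = 0;
	free(vec);
	return ok;
}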




Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

2015-07-28 Thread Eric B Munson
On Tue, 28 Jul 2015, Vlastimil Babka wrote:

 On 07/28/2015 03:49 PM, Eric B Munson wrote:
 On Tue, 28 Jul 2015, Michal Hocko wrote:
 
 
 [...]
 
 The only
 remaining question I have is should we have 2 new mlockall flags so that
 the caller can explicitly set VM_LOCKONFAULT in mm->def_flags vs
 locking all current VMAs on fault.  I ask because if the user wants to
 lock all current VMAs the old way, but all future VMAs on fault, they
 have to call mlockall() twice:
 
  mlockall(MCL_CURRENT);
  mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
 
 This has the side effect of converting all the current VMAs to
 VM_LOCKONFAULT, but because they were all made present and locked in the
 first call, this should not matter in most cases.
 
 Shouldn't the user be able to do this?
 
 mlockall(MCL_CURRENT)
 mlockall(MCL_FUTURE | MCL_ONFAULT);
 
 Note that the second call shouldn't change (i.e. munlock) existing
 vma's just because MCL_CURRENT is not present. The current
 implementation doesn't do that thanks to the following in
 do_mlockall():
 
 if (flags == MCL_FUTURE)
 goto out;
 
 before current vma's are processed and MCL_CURRENT is checked. This
 is probably so that do_mlockall() can also handle the munlockall()
 syscall.
 So we should be careful not to break this, but otherwise there are
 no limitations by not having two MCL_ONFAULT flags. Having to
 invoke two syscalls instead of one is not an issue, as this shouldn't be
 a frequent syscall.

Good catch, my current implementation did break this and is now fixed.

 
 The catch is that,
 like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
 fails.  This has been true of mlockall() from the beginning so I don't
 know if it needs more than an entry in the man page to clarify (which I
 will add when I add documentation for MCL_ONFAULT).
 
 Good point.
 
 In a much less
 likely corner case, it is not possible in the current setup to request
 all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.
 
 So again this should work:
 
 mlockall(MCL_CURRENT | MCL_ONFAULT)
 mlockall(MCL_FUTURE);
 
 But the order matters here, as the current implementation of
 do_mlockall() will clear VM_LOCKED from def_flags if MCL_FUTURE is
 not passed. So *it's different* from how it handles MCL_CURRENT (as
 explained above). And not documented in the manpage. Oh crap, this API
 is a closet full of skeletons. Maybe it was an unnoticed regression
 and we can restore some sanity?

I will add a note about the ordering problem to the manpage as well.
Unfortunately, the basic idea of clearing VM_LOCKED from mm->def_flags
if MCL_FUTURE is not specified but not doing the same for MCL_CURRENT
predates the move to git, so I am not sure if it was ever different.
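
To spell the ordering caveat out as a sketch (assuming the historical
do_mlockall() behaviour described above, with MCL_ONFAULT from this
series):

    /* Works: current VMAs become lock on fault, future mappings are
     * created plain VM_LOCKED. */
    mlockall(MCL_CURRENT | MCL_ONFAULT);
    mlockall(MCL_FUTURE);

    /* Reversed, the second call clears VM_LOCKED from mm->def_flags
     * because MCL_FUTURE is absent, so future mappings end up not
     * locked at all. */
    mlockall(MCL_FUTURE);
    mlockall(MCL_CURRENT | MCL_ONFAULT);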




Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

2015-07-27 Thread Eric B Munson
On Mon, 27 Jul 2015, Vlastimil Babka wrote:

 On 07/24/2015 11:28 PM, Eric B Munson wrote:
 
 ...
 
 Changes from V4:
 Drop all architectures for new sys call entries except x86[_64] and MIPS
 Drop munlock2 and munlockall2
 Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
 Adjust tests to match
 
 Hi, thanks for considering my suggestions. Well, I do hope they
 were correct, as APIs are hard and I'm no API expert. But since
 APIs are also impossible to change after merging, I'm sorry but
 I'll keep pestering for one last thing. Thanks again for persisting,
 I do believe it's for a good thing!
 
 The thing is that I still don't like that one has to call
 mlock2(MLOCK_LOCKED) to get the equivalent of the old mlock(). Why
 is that flag needed? We have two modes of locking now, and v5 no
 longer treats them separately in vma flags. But having two flags
 gives us four possible combinations, so two of them would serve
 nothing but to confuse the programmer IMHO. What will mlock2()
 without flags do? What will mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) do?
 (Note I haven't studied the code yet, as having agreed on the API
 should come first. But I did suggest documenting these things more
 thoroughly too...)
 OK I checked now and both cases above seem to return EINVAL.
 
 So about the only point I see in MLOCK_LOCKED flag is parity with
 MAP_LOCKED for mmap(). But as Kirill said (and me before as well)
 MAP_LOCKED is broken anyway, so we shouldn't twist the rest of the
 API just to keep the poor thing happier in its misery.
 
 Also note that AFAICS you don't have MCL_LOCKED for mlockall() so
 there's no full parity anyway. But please don't fix that by adding
 MCL_LOCKED :)
 
 Thanks!


I have an MLOCK_LOCKED flag because I prefer an interface to be
explicit.  The caller of mlock2() will be required to fill in the flags
argument regardless.  I can drop the MLOCK_LOCKED flag with 0 being the
value for LOCKED, but I thought it easier to make clear what was going
on at any call to mlock2().  If user space defines a MLOCK_LOCKED that
happens to be 0, I suppose that would be okay.
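
For illustration, until a glibc wrapper exists a caller would invoke it
through syscall(2) directly; the syscall number and flag value below are
placeholders rather than a stable ABI, and addr/len stand for whatever
range the caller already has:

    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_mlock2
    #define __NR_mlock2 -1          /* placeholder: fill in the per-arch number */
    #endif
    #define MLOCK_LOCKED 0x01       /* placeholder value */

    /* Explicit flag spelling out the classic behavior... */
    syscall(__NR_mlock2, addr, len, MLOCK_LOCKED);
    /* ...versus dropping the flag and letting 0 mean classic mlock. */
    syscall(__NR_mlock2, addr, len, 0);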

We do actually have an MCL_LOCKED, we just call it MCL_CURRENT.  Would
you prefer that I match the name in mlock2() (add MLOCK_CURRENT
instead)?

Finally, on the question of MAP_LOCKONFAULT, do you just dislike
MAP_LOCKED and do not want to see it extended, or is this a NAK on the
set if that patch is included?  I ask because I have to spin a V6 to get
the MLOCK flag declarations right, but I would prefer not to do a V7+.
If this is a NAK, I can drop that patch and rework the tests to
cover this without the mmap flag.  Otherwise I want to keep it; I have an
internal user that would like to see it added.




Re: [PATCH V5 5/7] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-27 Thread Eric B Munson
On Mon, 27 Jul 2015, Kirill A. Shutemov wrote:

 On Fri, Jul 24, 2015 at 05:28:43PM -0400, Eric B Munson wrote:
  The cost of faulting in all memory to be locked can be very high when
  working with large mappings.  If only portions of the mapping will be
  used this can incur a high penalty for locking.
  
  Now that we have the new VMA flag for the locked but not present state,
  expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.
 
 As I mentioned before, I don't think this interface is justified.
 
 MAP_LOCKED has known issues[1]. The MAP_LOCKED problem does not necessarily
 affect MAP_LOCKONFAULT, but still.
 
 Let's not add a new interface unless it's demonstrably useful.
 
 [1] http://lkml.kernel.org/g/20150114095019.gc4...@dhcp22.suse.cz

I understand and should have been more explicit.  This patch is still
included because I have an internal user that wants to see it added.
The problem discussed in the thread you point out does not affect
MAP_LOCKONFAULT because we do not attempt to populate the region with
MAP_LOCKONFAULT.

As I told Vlastimil, if this is a hard NAK with the patch I can work
with that.  Otherwise I prefer it stays.




Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

2015-07-27 Thread Eric B Munson
On Mon, 27 Jul 2015, Vlastimil Babka wrote:

 On 07/27/2015 03:35 PM, Eric B Munson wrote:
 On Mon, 27 Jul 2015, Vlastimil Babka wrote:
 
 On 07/24/2015 11:28 PM, Eric B Munson wrote:
 
 ...
 
 Changes from V4:
 Drop all architectures for new sys call entries except x86[_64] and MIPS
 Drop munlock2 and munlockall2
 Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
 Adjust tests to match
 
 Hi, thanks for considering my suggestions. Well, I do hope they
 were correct, as APIs are hard and I'm no API expert. But since
 APIs are also impossible to change after merging, I'm sorry but
 I'll keep pestering for one last thing. Thanks again for persisting,
 I do believe it's for a good thing!
 
 The thing is that I still don't like that one has to call
 mlock2(MLOCK_LOCKED) to get the equivalent of the old mlock(). Why
 is that flag needed? We have two modes of locking now, and v5 no
 longer treats them separately in vma flags. But having two flags
 gives us four possible combinations, so two of them would serve
 nothing but to confuse the programmer IMHO. What will mlock2()
 without flags do? What will mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) do?
 (Note I haven't studied the code yet, as having agreed on the API
 should come first. But I did suggest documenting these things more
 thoroughly too...)
 OK I checked now and both cases above seem to return EINVAL.
 
 So about the only point I see in MLOCK_LOCKED flag is parity with
 MAP_LOCKED for mmap(). But as Kirill said (and me before as well)
 MAP_LOCKED is broken anyway, so we shouldn't twist the rest of the
 API just to keep the poor thing happier in its misery.
 
 Also note that AFAICS you don't have MCL_LOCKED for mlockall() so
 there's no full parity anyway. But please don't fix that by adding
 MCL_LOCKED :)
 
 Thanks!
 
 
 I have an MLOCK_LOCKED flag because I prefer an interface to be
 explicit.
 
 I think it's already explicit enough that the user calls mlock2(),
 no? He obviously wants the range mlocked. An optional flag says that
 there should be no pre-fault.
 
 The caller of mlock2() will be required to fill in the flags
 argument regardless.
 
 I guess users not caring about MLOCK_ONFAULT will continue using
 plain mlock() without flags anyway.
 
 I can drop the MLOCK_LOCKED flag with 0 being the
 value for LOCKED, but I thought it easier to make clear what was going
 on at any call to mlock2().  If user space defines a MLOCK_LOCKED that
 happens to be 0, I suppose that would be okay.
 
 Yeah that would remove the weird 4-states-of-which-2-are-invalid
 problem I mentioned, but at the cost of glibc wrapper behaving
 differently than the kernel syscall itself. For little gain.
 
 We do actually have an MCL_LOCKED, we just call it MCL_CURRENT.  Would
 you prefer that I match the name in mlock2() (add MLOCK_CURRENT
 instead)?
 
 Hm it's similar but not exactly the same, because MCL_FUTURE is not
 the same as MLOCK_ONFAULT :) So MLOCK_CURRENT would be even more
 confusing. Especially if mlockall(MCL_CURRENT | MCL_FUTURE) is OK,
 but mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) is invalid.

MLOCK_ONFAULT isn't meant to be the same as MCL_FUTURE, rather it is
meant to be the same as MCL_ONFAULT.  MCL_FUTURE only controls if the
locking policy will be applied to any new mappings made by this process,
not the locking policy itself.  The better comparison is MCL_CURRENT to
MLOCK_LOCK and MCL_ONFAULT to MLOCK_ONFAULT.  MCL_CURRENT and
MLOCK_LOCK do the same thing, only one requires a specific range of
addresses while the other works process wide.  This is why I suggested
changing MLOCK_LOCK to MLOCK_CURRENT.  It is an error to call
mlock2(MLOCK_LOCK | MLOCK_ONFAULT) just like it is an error to call
mlockall(MCL_CURRENT | MCL_ONFAULT).  The combinations do not make sense.

This was all decided when VM_LOCKONFAULT was a separate state from
VM_LOCKED.  Now that VM_LOCKONFAULT is a modifier to VM_LOCKED and
cannot be specified independently, it might make more sense to mirror
that relationship to userspace, which would lead to something like the
following:

To lock and populate a region:
mlock2(start, len, 0);

To lock on fault a region:
mlock2(start, len, MLOCK_ONFAULT);

If LOCKONFAULT is seen as a modifier to mlock, then having the flags
argument of 0 mean classic mlock makes more sense to me.

To mlock current on fault only:
mlockall(MCL_CURRENT | MCL_ONFAULT);

To mlock future on fault only:
mlockall(MCL_FUTURE | MCL_ONFAULT);

To lock everything on fault:
mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

I think I have talked myself into rewriting the set again :/

 
 Finally, on the question of MAP_LOCKONFAULT, do you just dislike
 MAP_LOCKED and do not want to see it extended, or is this a NAK on the
 set if that patch is included?  I ask because I have to spin a V6 to get
 the MLOCK flag declarations right, but I would prefer not to do a V7+.
 If this is a NAK, I can drop that patch and rework the tests to
 cover without the mmap

Re: [PATCH V5 5/7] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-27 Thread Eric B Munson
On Mon, 27 Jul 2015, Kirill A. Shutemov wrote:

 On Mon, Jul 27, 2015 at 09:41:26AM -0400, Eric B Munson wrote:
  On Mon, 27 Jul 2015, Kirill A. Shutemov wrote:
  
   On Fri, Jul 24, 2015 at 05:28:43PM -0400, Eric B Munson wrote:
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

Now that we have the new VMA flag for the locked but not present state,
expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.
   
   As I mentioned before, I don't think this interface is justified.
   
   MAP_LOCKED has known issues[1]. The MAP_LOCKED problem does not necessarily
   affect MAP_LOCKONFAULT, but still.
   
   Let's not add a new interface unless it's demonstrably useful.
   
   [1] http://lkml.kernel.org/g/20150114095019.gc4...@dhcp22.suse.cz
  
  I understand and should have been more explicit.  This patch is still
  included because I have an internal user that wants to see it added.
  The problem discussed in the thread you point out does not affect
  MAP_LOCKONFAULT because we do not attempt to populate the region with
  MAP_LOCKONFAULT.
  
  As I told Vlastimil, if this is a hard NAK with the patch I can work
  with that.  Otherwise I prefer it stays.
 
 That's not how it works.

I am not sure what you mean here.  I have a user that will find this
useful and MAP_LOCKONFAULT does not suffer from the problem you point
out.  I do not understand your NAK, but thank you for being explicit about it.

 
 Once an ABI is added to the kernel it stays there practically forever.
 Therefore it must be useful enough to justify the maintenance cost. I don't see it
 demonstrated.

I understand this, and I get that you do not like MAP_LOCKED, but I do
not see how your dislike for MAP_LOCKED means that this would not be
useful.

 
 So, NAK.
 

V6 will not have the new mmap flag unless there is someone else that
speaks up in favor of keeping it.




Re: [PATCH V4 2/6] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-24 Thread Eric B Munson
On Fri, 24 Jul 2015, Guenter Roeck wrote:

 On 07/24/2015 07:39 AM, Eric B Munson wrote:
 On Thu, 23 Jul 2015, Ralf Baechle wrote:
 
 On Wed, Jul 22, 2015 at 10:15:01AM -0400, Eric B Munson wrote:
 
 
 You haven't wired it up properly on powerpc, but I haven't mentioned it
 because I'd rather we did it.
 
 cheers
 
 It looks like I will be spinning a V5, so I will drop all but the x86
 system calls additions in that version.
 
 The MIPS bits are looking good however, so
 
 Acked-by: Ralf Baechle r...@linux-mips.org
 
 With my ack, will you keep them or maybe carry them as a separate patch?
 
 I will keep the MIPS additions as a separate patch in the series, though
 I have dropped two of the new syscalls after some discussion.  So I will
 not include your ack on the new patch.
 
 Eric
 
 
 Hi Eric,
 
 next-20150724 still has some failures due to this patch set. Are those
 being looked at (I know parisc builds fail, but there may be others)?
 
 Thanks,
 Guenter

Guenter,

Yes, the next respin will drop all new arch syscall entries except
x86[_64] and MIPS.  I will leave it up to arch maintainers to add the
entries.

Eric



[PATCH V5 0/7] Allow user to request memory to be locked on page fault

2015-07-24 Thread Eric B Munson
 if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.
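
The lock-on-fault usage pattern being compared is roughly the following
sketch (MAP_LOCKONFAULT comes from this series and its value here is
only a placeholder; len is whatever size the caller needs):

    #include <sys/mman.h>

    #ifndef MAP_LOCKONFAULT
    #define MAP_LOCKONFAULT 0x80000 /* placeholder value */
    #endif

    /* Large mapping of which only part will ever be touched; pages are
     * locked as they fault in rather than being populated up front. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT, -1, 0);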

The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test              4.2-rc1    4.2-rc1+lock-on-fault
Copy:             10,566.5   10,421
Scale:            10,685     10,503.5
Add:              12,044.1   11,814.2
Triad:            12,064.8   11,846.3

Kernbench optimal load
                  4.2-rc1    4.2-rc1+lock-on-fault
Elapsed Time      78.453     78.991
User Time         64.2395    65.2355
System Time       9.7335     9.7085
Context Switches  22211.5    22412.1
Sleeps            14965.3    14956.1

---
Changes from V4:
Drop all architectures for new sys call entries except x86[_64] and MIPS
Drop munlock2 and munlockall2
Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
Adjust tests to match

Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.


Eric B Munson (7):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock system call
  mm: Introduce VM_LOCKONFAULT
  mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage
  mm: mmap: Add mmap flag to request VM_LOCKONFAULT
  selftests: vm: Add tests for lock on fault
  mips: Add entry for new mlock2 syscall

 arch/alpha/include/uapi/asm/mman.h  |   5 +
 arch/mips/include/uapi/asm/mman.h   |   8 +
 arch/mips/include/uapi/asm/unistd.h |  15 +-
 arch/mips/kernel/scall32-o32.S  |   1 +
 arch/mips/kernel/scall64-64.S   |   1 +
 arch/mips/kernel/scall64-n32.S  |   1 +
 arch/mips/kernel/scall64-o32.S  |   1 +
 arch/parisc/include/uapi/asm/mman.h |   5 +
 arch/powerpc/include/uapi/asm/mman.h|   5 +
 arch/sparc/include/uapi/asm/mman.h  |   5 +
 arch/tile/include/uapi/asm/mman.h   |   9 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/xtensa/include/uapi/asm/mman.h |   8 +
 drivers/gpu/drm/drm_vm.c|   8 +-
 fs/proc/task_mmu.c  |   1 +
 include/linux/mm.h  |   2 +
 include/linux/mman.h|   3 +-
 include/linux/syscalls.h|   2 +
 include/uapi/asm-generic/mman.h |   5 +
 include/uapi/asm-generic/unistd.h   |   4 +-
 kernel/events/core.c|   3 +-
 kernel/fork.c   |   2 +-
 kernel/sys_ni.c |   1 +
 mm/debug.c  |   1 +
 mm/gup.c|  10 +-
 mm/huge_memory.c|   2 +-
 mm/hugetlb.c|   4 +-
 mm/mlock.c  |  77 +++--
 mm/mmap.c   |  10 +-
 mm/rmap.c   |   4 +-
 tools/testing/selftests/vm/Makefile |   3 +
 tools/testing/selftests/vm/lock-on-fault.c  | 344 +++
 tools/testing/selftests/vm/mlock2-tests.c   | 507 
 tools/testing/selftests/vm/on-fault-limit.c |  47 +++
 tools/testing/selftests/vm/run_vmtests  |  33 ++
 36 files changed, 1093 insertions(+), 46 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/mlock2-tests.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Ralf Baechle r...@linux-mips.org
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org


-- 
1.9.1


[PATCH V5 5/7] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-24 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

Now that we have the new VMA flag for the locked but not present state,
expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Paul Gortmaker paul.gortma...@windriver.com
Cc: Chris Metcalf cmetc...@ezchip.com
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 kernel/events/core.c | 3 ++-
 mm/mmap.c| 8 ++--
 11 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 77ae8db..3f80ca4 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 71ed81d..905c1ea 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index c0871ce..c4695f6 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index f93f7eb..40a3fda 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -31,5 +31,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 8cd2ebc..f66efa6 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -26,6 +26,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index acdd013..800e5c3 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE  0x0800  /* ETXTBSY */
 #define

[PATCH V5 4/7] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-07-24 Thread Eric B Munson
The previous patch introduced a flag that specified pages in a VMA
should be placed on the unevictable LRU, but they should not be made
present when the area is created.  This patch adds the ability to set
this state via the new mlock system calls.

We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT flag as well as the VM_LOCKED
flag for the target region.  MCL_CURRENT and MCL_ONFAULT are used to
lock current mappings.  With MCL_CURRENT all pages are made present and
with MCL_ONFAULT they are locked when faulted in.  When specified with
MCL_FUTURE all new mappings will be marked with VM_LOCKONFAULT.

Currently, mlockall() clears all VMA lock flags and then sets the
requested flags.  For instance, if a process has MCL_FUTURE and
MCL_CURRENT set, but they want to clear MCL_FUTURE this would be
accomplished by calling mlockall(MCL_CURRENT).  This still holds with
the introduction of MCL_ONFAULT.  Each call to mlockall() resets all
VMA flags to the values specified in the current call.  The new mlock2
system call behaves in the same way.  If a region is locked with
MLOCK_ONFAULT and a user wants to force it to be populated now, a second
call to mlock2(MLOCK_LOCKED) will accomplish this.
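
As a usage sketch of that flow (flag names as proposed in this series;
mlock2() here stands for a thin syscall wrapper since no glibc wrapper
exists yet, and addr/len are whatever range the caller cares about):

    /* Lock the range, populating it only as pages fault in. */
    mlock2(addr, len, MLOCK_ONFAULT);
    /* ...later, force the whole range to be made present and locked. */
    mlock2(addr, len, MLOCK_LOCKED);

    /* mlockall() resets the flags on every call, so a process with
     * MCL_CURRENT | MCL_FUTURE set drops MCL_FUTURE simply with: */
    mlockall(MCL_CURRENT);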

munlock() will unconditionally clear both VMA flags.  munlockall()
unconditionally clears both VMA flags on all VMAs and in the
mm->def_flags field.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: Kirill A. Shutemov kir...@shutemov.name
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
Changes from V4:
* Split addition of VMA flag

Changes from V3:
* Do extensive search for VM_LOCKED and ensure that VM_LOCKONFAULT is also handled
  where appropriate
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/mips/include/uapi/asm/mman.h|  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  2 ++
 arch/sparc/include/uapi/asm/mman.h   |  2 ++
 arch/tile/include/uapi/asm/mman.h|  3 +++
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 include/uapi/asm-generic/mman.h  |  2 ++
 mm/mlock.c   | 41 
 9 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index ec72436..77ae8db 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,8 +37,10 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 67c1cdf..71ed81d 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,11 +61,13 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 /*
  * Flags for mlock
  */
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index daab994..c0871ce 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,8 +31,10 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random

Re: [PATCH V4 2/6] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-24 Thread Eric B Munson
On Thu, 23 Jul 2015, Ralf Baechle wrote:

 On Wed, Jul 22, 2015 at 10:15:01AM -0400, Eric B Munson wrote:
 
   
   You haven't wired it up properly on powerpc, but I haven't mentioned it
   because I'd rather we did it.
   
   cheers
  
  It looks like I will be spinning a V5, so I will drop all but the x86
  system calls additions in that version.
 
 The MIPS bits are looking good however, so
 
 Acked-by: Ralf Baechle r...@linux-mips.org
 
 With my ack, will you keep them or maybe carry them as a separate patch?

I will keep the MIPS additions as a separate patch in the series, though
I have dropped two of the new syscalls after some discussion.  So I will
not include your ack on the new patch.

Eric



[PATCH V5 2/7] mm: mlock: Add new mlock system call

2015-07-24 Thread Eric B Munson
With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Stephen Rothwell s...@canb.auug.org.au
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
Changes from V4:
* Drop all architectures except x86[_64] from this patch, MIPS is added
  later in the series.  All others will be left to their maintainers.

Changes from V3:
* Do a (hopefully) complete job of adding the new system calls
 arch/alpha/include/uapi/asm/mman.h | 2 ++
 arch/mips/include/uapi/asm/mman.h  | 5 +
 arch/parisc/include/uapi/asm/mman.h| 2 ++
 arch/powerpc/include/uapi/asm/mman.h   | 2 ++
 arch/sparc/include/uapi/asm/mman.h | 2 ++
 arch/tile/include/uapi/asm/mman.h  | 5 +
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 arch/xtensa/include/uapi/asm/mman.h| 5 +
 include/linux/syscalls.h   | 2 ++
 include/uapi/asm-generic/mman.h| 2 ++
 include/uapi/asm-generic/unistd.h  | 4 +++-
 kernel/sys_ni.c| 1 +
 mm/mlock.c | 9 +
 14 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..ec72436 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,8 @@
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
 
+#define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
 #define MADV_SEQUENTIAL2   /* expect sequential page 
references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..67c1cdf 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,11 @@
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
 
+/*
+ * Flags for mlock
+ */
+#define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
 #define MADV_SEQUENTIAL 2  /* expect sequential page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..daab994 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,8 @@
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
 
+#define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
 #define MADV_SEQUENTIAL 2   /* expect sequential page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..189e85f 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -23,6 +23,8 @@
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
 
+#define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc

Re: [PATCH V4 4/6] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-23 Thread Eric B Munson
On Thu, 23 Jul 2015, Vlastimil Babka wrote:

 On 07/22/2015 08:43 PM, Eric B Munson wrote:
  On Wed, 22 Jul 2015, Vlastimil Babka wrote:
  
  
  Hi,
  
  I think you should include a complete description of which
  transitions for vma states and mlock2/munlock2 flags applied on them
  are valid and what they do. It will also help with the manpages.
  You explained some to Jon in the last thread, but I think there
  should be a canonical description in changelog (if not also
  Documentation, if mlock is covered there).
  
  For example the scenario Jon asked, what happens after a
  mlock2(MLOCK_ONFAULT) followed by mlock2(MLOCK_LOCKED), and that the
  answer is nothing. Your promised code comment for
  apply_vma_flags() doesn't suffice IMHO (and I'm not sure it's there,
  anyway?).
  
  I missed adding that comment to the code, will be there in V5 along with
  the description in the changelog.
 
 Thanks!
 
  
  But the more I think about the scenario and your new VM_LOCKONFAULT
  vma flag, it seems awkward to me. Why should munlocking at all care
  if the vma was mlocked with MLOCK_LOCKED or MLOCK_ONFAULT? In either
  case the result is that all pages currently populated are munlocked.
  So the flags for munlock2 should be unnecessary.
  
  Say a user has a large area of interleaved MLOCK_LOCK and MLOCK_ONFAULT
  mappings and they want to unlock only the ones with MLOCK_LOCK.  With
  the current implementation, this is possible in a single system call
  that spans the entire region.  With your suggestion, the user would have
   to know what regions were locked with MLOCK_LOCK and call munlock() on
  each of them.  IMO, the way munlock2() works better mirrors the way
  munlock() currently works when called on a large area of interleaved
  locked and unlocked areas.
 
 Um OK, that scenario is possible in theory. But I have a hard time imagining
  that somebody would really want to do that. I think many more people would
 benefit from a simpler API.

It wasn't about imagining a scenario, more about keeping parity with
something that currently works (unlocking a large area of interleaved
locked and unlocked regions).  However, there is no reason we can't add
the new munlock2 later if it is desired.

 
  
  
  I also think VM_LOCKONFAULT is unnecessary. VM_LOCKED should be
  enough - see how you had to handle the new flag in all places that
  had to handle the old flag? I think the information whether mlock
  was supposed to fault the whole vma is obsolete at the moment mlock
  returns. VM_LOCKED should be enough for both modes, and the flag to
  mlock2 could just control whether the pre-faulting is done.
  
  So what should be IMHO enough:
  - munlock can stay without flags
  - mlock2 has only one new flag MLOCK_ONFAULT. If specified,
  pre-faulting is not done, just set VM_LOCKED and mlock pages already
  present.
  - same with mmap(MAP_LOCKONFAULT) (need to define what happens when
  both MAP_LOCKED and MAP_LOCKONFAULT are specified).
  
  Now mlockall(MCL_FUTURE) muddles the situation in that it stores the
  information for future VMAs in current->mm->def_flags, and this
  def_flags would need to distinguish VM_LOCKED with population and
  without. But that could be still solvable without introducing a new
  vma flag everywhere.
  
  With you right up until that last paragraph.  I have been staring at
  this a while and I cannot come up with a way to handle the
  mlockall(MCL_ONFAULT) without introducing a new vm flag.  It doesn't
  have to be VM_LOCKONFAULT, we could use the model that Michal Hocko
  suggested with something like VM_FAULTPOPULATE.  However, we can't
  really use this flag anywhere except the mlock code because we have to
  be able to distinguish a caller that wants to use MLOCK_LOCK with
  whatever control VM_FAULTPOPULATE might grant outside of mlock and a
  caller that wants MLOCK_ONFAULT.  That was a long way of saying we need
  an extra vma flag regardless.  However, if that flag only controls if
  mlock pre-populates it would work and it would do away with most of the
  places I had to touch to handle VM_LOCKONFAULT properly.
 
 Yes, it would be a good way. Adding a new vma flag is probably cleanest after
 all, but the flag would be set *in addition* to VM_LOCKED, *just* to prevent
 pre-faulting. The places that check VM_LOCKED for the actual page mlocking 
 (i.e.
 try_to_unmap_one) would just keep checking VM_LOCKED. The places where 
 VM_LOCKED
 is checked to trigger prepopulation, would skip that if VM_LOCKONFAULT is also
 set. Having VM_LOCKONFAULT set without VM_LOCKED itself would be an invalid
 state.
 
 This should work fine with the simplified API as I proposed so let me 
 reiterate
 and try fill in the blanks:
 
 - mlock2 has only one new flag MLOCK_ONFAULT. If specified, VM_LOCKONFAULT is
 set in addition to VM_LOCKED and no prefaulting is done
   - old mlock syscall naturally behaves as mlock2 without MLOCK_ONFAULT
   - calling mlock/mlock2 on an already-mlocked area

Re: [PATCH V4 4/6] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-22 Thread Eric B Munson
On Wed, 22 Jul 2015, Vlastimil Babka wrote:

 On 07/21/2015 09:59 PM, Eric B Munson wrote:
 The cost of faulting in all memory to be locked can be very high when
 working with large mappings.  If only portions of the mapping will be
 used this can incur a high penalty for locking.
 
 For the example of a large file, this is the usage pattern for a large
 statistical language model (probably applies to other statistical or graphical
 models as well).  For the security example, any application transacting
 in data that cannot be swapped out (credit card data, medical records,
 etc).
 
 This patch introduces the ability to request that pages are not
 pre-faulted, but are placed on the unevictable LRU when they are finally
 faulted in.  This can be done an area at a time via the
 mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls.  These
 calls can be undone via munlock2(MLOCK_ONFAULT) or
 munlockall2(MCL_ONFAULT).
 
 Applying the VM_LOCKONFAULT flag to a mapping with pages that are
 already present required the addition of a function in gup.c to pin all
 pages which are present in an address range.  It borrows heavily from
 __mm_populate().
 
 To keep accounting checks out of the page fault path, users are billed
 for the entire mapping lock as if MLOCK_LOCKED was used.
 
 Hi,
 
 I think you should include a complete description of which
 transitions for vma states and mlock2/munlock2 flags applied on them
 are valid and what they do. It will also help with the manpages.
 You explained some to Jon in the last thread, but I think there
 should be a canonical description in changelog (if not also
 Documentation, if mlock is covered there).
 
 For example the scenario Jon asked, what happens after a
 mlock2(MLOCK_ONFAULT) followed by mlock2(MLOCK_LOCKED), and that the
 answer is nothing. Your promised code comment for
 apply_vma_flags() doesn't suffice IMHO (and I'm not sure it's there,
 anyway?).

I missed adding that comment to the code, will be there in V5 along with
the description in the changelog.

 
 But the more I think about the scenario and your new VM_LOCKONFAULT
 vma flag, it seems awkward to me. Why should munlocking at all care
 if the vma was mlocked with MLOCK_LOCKED or MLOCK_ONFAULT? In either
 case the result is that all pages currently populated are munlocked.
 So the flags for munlock2 should be unnecessary.

Say a user has a large area of interleaved MLOCK_LOCK and MLOCK_ONFAULT
mappings and they want to unlock only the ones with MLOCK_LOCK.  With
the current implementation, this is possible in a single system call
that spans the entire region.  With your suggestion, the user would have
to know what regions were locked with MLOCK_LOCK and call munlock() on
each of them.  IMO, the way munlock2() works better mirrors the way
munlock() currently works when called on a large area of interleaved
locked and unlocked areas.
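
As a sketch of that interleaved case with the munlock2() call from this
revision of the series (region_start and region_len are placeholders for
the whole span):

    /* Region layout: [MLOCK_LOCK][MLOCK_ONFAULT][MLOCK_LOCK]... */
    /* One call over the whole span unlocks only the MLOCK_LOCK parts
     * and leaves the MLOCK_ONFAULT parts locked: */
    munlock2(region_start, region_len, MLOCK_LOCK);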

 
 I also think VM_LOCKONFAULT is unnecessary. VM_LOCKED should be
 enough - see how you had to handle the new flag in all places that
 had to handle the old flag? I think the information whether mlock
 was supposed to fault the whole vma is obsolete at the moment mlock
 returns. VM_LOCKED should be enough for both modes, and the flag to
 mlock2 could just control whether the pre-faulting is done.
 
 So what should be IMHO enough:
 - munlock can stay without flags
 - mlock2 has only one new flag MLOCK_ONFAULT. If specified,
 pre-faulting is not done, just set VM_LOCKED and mlock pages already
 present.
 - same with mmap(MAP_LOCKONFAULT) (need to define what happens when
 both MAP_LOCKED and MAP_LOCKONFAULT are specified).
 
 Now mlockall(MCL_FUTURE) muddles the situation in that it stores the
 information for future VMAs in current->mm->def_flags, and this
 def_flags would need to distinguish VM_LOCKED with population and
 without. But that could be still solvable without introducing a new
 vma flag everywhere.

With you right up until that last paragraph.  I have been staring at
this a while and I cannot come up with a way to handle the
mlockall(MCL_ONFAULT) without introducing a new vm flag.  It doesn't
have to be VM_LOCKONFAULT, we could use the model that Michal Hocko
suggested with something like VM_FAULTPOPULATE.  However, we can't
really use this flag anywhere except the mlock code because we have to
be able to distinguish a caller that wants to use MLOCK_LOCK with
whatever control VM_FAULTPOPULATE might grant outside of mlock and a
caller that wants MLOCK_ONFAULT.  That was a long way of saying we need
an extra vma flag regardless.  However, if that flag only controls if
mlock pre-populates it would work and it would do away with most of the
places I had to touch to handle VM_LOCKONFAULT properly.

I picked VM_LOCKONFAULT because it is explicit about what it is for and
there is little risk of someone coming along in 5 years and saying "why
not overload this flag to do this other thing completely unrelated to
mlock?".  A flag for controlling speculative population is more

Re: [PATCH V4 2/6] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-22 Thread Eric B Munson
On Wed, 22 Jul 2015, Michael Ellerman wrote:

 On Tue, 2015-07-21 at 13:44 -0700, Andrew Morton wrote:
  On Tue, 21 Jul 2015 15:59:37 -0400 Eric B Munson emun...@akamai.com wrote:
  
   With the refactored mlock code, introduce new system calls for mlock,
   munlock, and munlockall.  The new calls will allow the user to specify
   what lock states are being added or cleared.  mlock2 and munlock2 are
   trivial at the moment, but a follow on patch will add a new mlock state
   making them useful.
   
   munlock2 addresses a limitation of the current implementation.  If a
   user calls mlockall(MCL_CURRENT | MCL_FUTURE) and then later decides
   that MCL_FUTURE should be removed, they would have to call munlockall()
   followed by mlockall(MCL_CURRENT) which could potentially be very
   expensive.  The new munlockall2 system call allows a user to simply
   clear the MCL_FUTURE flag.
  
  This is hard.  Maybe we shouldn't have wired up anything other than
  x86.  That's what we usually do with new syscalls.
 
 Yeah I think so.
 
 You haven't wired it up properly on powerpc, but I haven't mentioned it
 because I'd rather we did it.
 
 cheers

It looks like I will be spinning a V5, so I will drop all but the x86
system calls additions in that version.



Re: [PATCH V4 5/6] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-22 Thread Eric B Munson
On Wed, 22 Jul 2015, Kirill A. Shutemov wrote:

 On Tue, Jul 21, 2015 at 03:59:40PM -0400, Eric B Munson wrote:
  The cost of faulting in all memory to be locked can be very high when
  working with large mappings.  If only portions of the mapping will be
  used this can incur a high penalty for locking.
  
  Now that we have the new VMA flag for the locked but not present state,
  expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.
 
 What is the advantage over mmap() + mlock(MLOCK_ONFAULT)?

There isn't one; it was added to maintain parity with the
mlock(MLOCK_LOCK) - mmap(MAP_LOCKED) set.  I think not having it will lead
to confusion because we have MAP_LOCKED, so why don't we support
LOCKONFAULT from mmap as well?




Re: [PATCH V4 2/6] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-22 Thread Eric B Munson
On Wed, 22 Jul 2015, Vlastimil Babka wrote:

 On 07/21/2015 09:59 PM, Eric B Munson wrote:
 With the refactored mlock code, introduce new system calls for mlock,
 munlock, and munlockall.  The new calls will allow the user to specify
 what lock states are being added or cleared.  mlock2 and munlock2 are
 trivial at the moment, but a follow on patch will add a new mlock state
 making them useful.
 
 munlock2 addresses a limitation of the current implementation.  If a
 
   ^ munlockall2?

Fixed, thanks.




[PATCH V4 4/6] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-21 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or graphical
models as well).  For the security example, any application transacting
in data that cannot be swapped out (credit card data, medical records,
etc).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.  This can be done an area at a time via the
mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls.  These
calls can be undone via munlock2(MLOCK_ONFAULT) or
munlockall2(MCL_ONFAULT).

Applying the VM_LOCKONFAULT flag to a mapping with pages that are
already present required the addition of a function in gup.c to pin all
pages which are present in an address range.  It borrows heavily from
__mm_populate().

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MLOCK_LOCKED was used.
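
A short usage sketch of the interface described above (there are no
glibc wrappers yet, so mlock2()/munlock2()/munlockall2() stand for thin
syscall wrappers; addr and len are placeholders for the caller's range):

    /* Area at a time: lock pages as they fault in, then undo it. */
    mlock2(addr, len, MLOCK_ONFAULT);
    munlock2(addr, len, MLOCK_ONFAULT);

    /* The process-wide equivalents. */
    mlockall(MCL_ONFAULT);
    munlockall2(MCL_ONFAULT);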

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Jonathan Corbet cor...@lwn.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
Changes from V3:
Do extensive search for VM_LOCKED and ensure that VM_LOCKONFAULT is also handled
 where appropriate

 arch/alpha/include/uapi/asm/mman.h   |  2 +
 arch/mips/include/uapi/asm/mman.h|  2 +
 arch/parisc/include/uapi/asm/mman.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h |  2 +
 arch/sparc/include/uapi/asm/mman.h   |  2 +
 arch/tile/include/uapi/asm/mman.h|  3 ++
 arch/xtensa/include/uapi/asm/mman.h  |  2 +
 drivers/gpu/drm/drm_vm.c |  8 ++-
 fs/proc/task_mmu.c   |  3 +-
 include/linux/mm.h   |  2 +
 include/uapi/asm-generic/mman.h  |  2 +
 kernel/events/uprobes.c  |  2 +-
 kernel/fork.c|  2 +-
 mm/debug.c   |  1 +
 mm/gup.c |  3 +-
 mm/huge_memory.c |  3 +-
 mm/hugetlb.c |  4 +-
 mm/internal.h|  5 +-
 mm/ksm.c |  2 +-
 mm/madvise.c |  4 +-
 mm/memory.c  |  5 +-
 mm/mlock.c   | 98 +---
 mm/mmap.c| 28 +++
 mm/mremap.c  |  6 +--
 mm/msync.c   |  2 +-
 mm/rmap.c| 12 ++---
 mm/shmem.c   |  2 +-
 mm/swap.c|  3 +-
 mm/vmscan.c  |  2 +-
 29 files changed, 145 insertions(+), 69 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index ec72436..77ae8db 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,8 +37,10 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 67c1cdf..71ed81d 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,11 +61,13 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 /*
  * Flags for mlock
  */
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index daab994..c0871ce 100644
--- a/arch/parisc

[PATCH V4 0/6] Allow user to request memory to be locked on page fault

2015-07-21 Thread Eric B Munson
not find it.  It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.

The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test              4.2-rc1    4.2-rc1+lock-on-fault
Copy:             10,566.5   10,421
Scale:            10,685     10,503.5
Add:              12,044.1   11,814.2
Triad:            12,064.8   11,846.3

Kernbench optimal load
                  4.2-rc1    4.2-rc1+lock-on-fault
Elapsed Time      78.453     78.991
User Time         64.2395    65.2355
System Time       9.7335     9.7085
Context Switches  22211.5    22412.1
Sleeps            14965.3    14956.1

---
Changes from V3:
Ensure that pages present when mlock2(MLOCK_ONFAULT) is called are locked
Ensure that VM_LOCKONFAULT is handled in cases that used to only check VM_LOCKED
Add tests for new system calls
Add missing syscall entries, fix NR_syscalls on multiple arch's
Add missing MAP_LOCKONFAULT for tile

Changes from V2:
Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.

Eric B Munson (6):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock, munlock, and munlockall system calls
  mm: gup: Add mm_lock_present()
  mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it
  mm: mmap: Add mmap flag to request VM_LOCKONFAULT
  selftests: vm: Add tests for lock on fault

 arch/alpha/include/asm/unistd.h |   2 +-
 arch/alpha/include/uapi/asm/mman.h  |   5 +
 arch/alpha/include/uapi/asm/unistd.h|   3 +
 arch/alpha/kernel/systbls.S |   3 +
 arch/arm/include/asm/unistd.h   |   2 +-
 arch/arm/include/uapi/asm/unistd.h  |   3 +
 arch/arm/kernel/calls.S |   3 +
 arch/arm64/include/asm/unistd32.h   |   6 +
 arch/avr32/include/uapi/asm/unistd.h|   3 +
 arch/avr32/kernel/syscall_table.S   |   3 +
 arch/blackfin/include/uapi/asm/unistd.h |   3 +
 arch/blackfin/mach-common/entry.S   |   3 +
 arch/cris/arch-v10/kernel/entry.S   |   3 +
 arch/cris/arch-v32/kernel/entry.S   |   3 +
 arch/frv/kernel/entry.S |   3 +
 arch/ia64/include/asm/unistd.h  |   2 +-
 arch/ia64/include/uapi/asm/unistd.h |   3 +
 arch/ia64/kernel/entry.S|   3 +
 arch/m32r/kernel/entry.S|   3 +
 arch/m32r/kernel/syscall_table.S|   3 +
 arch/m68k/include/asm/unistd.h  |   2 +-
 arch/m68k/include/uapi/asm/unistd.h |   3 +
 arch/m68k/kernel/syscalltable.S |   3 +
 arch/microblaze/include/uapi/asm/unistd.h   |   3 +
 arch/microblaze/kernel/syscall_table.S  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   8 +
 arch/mips/include/uapi/asm/unistd.h |  21 +-
 arch/mips/kernel/scall32-o32.S  |   3 +
 arch/mips/kernel/scall64-64.S   |   3 +
 arch/mips/kernel/scall64-n32.S  |   3 +
 arch/mips/kernel/scall64-o32.S  |   3 +
 arch/mn10300/kernel/entry.S |   3 +
 arch/parisc/include/uapi/asm/mman.h |   5 +
 arch/parisc/include/uapi/asm/unistd.h   |   5 +-
 arch/powerpc/include/uapi/asm/mman.h|   5 +
 arch/powerpc/include/uapi/asm/unistd.h  |   3 +
 arch/s390/include/uapi/asm/unistd.h |   5 +-
 arch/s390/kernel/compat_wrapper.c   |   3 +
 arch/s390/kernel/syscalls.S |   3 +
 arch/sh/kernel/syscalls_32.S|   3 +
 arch/sparc/include/uapi/asm/mman.h  |   5 +
 arch/sparc/include/uapi/asm/unistd.h|   5 +-
 arch/sparc/kernel/systbls_32.S  |   2 +-
 arch/sparc/kernel/systbls_64.S  |   4 +-
 arch/tile/include/uapi/asm/mman.h   |   9 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   3 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   3 +
 arch/xtensa/include/uapi/asm/mman.h |   8 +
 arch/xtensa/include/uapi/asm/unistd.h   |  10 +-
 drivers/gpu/drm/drm_vm.c|   8 +-
 fs/proc/task_mmu.c  |   3 +-
 include/linux/mm.h  |   2 +
 include/linux/mman.h|   3 +-
 include/linux/syscalls.h|   4 +
 include/uapi/asm-generic/mman.h |   5 +
 include/uapi/asm-generic/unistd.h   |   8 +-
 kernel/events/core.c|   2 +
 kernel/events/uprobes.c |   2 +-
 kernel/fork.c

[PATCH V4 5/6] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-21 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

Now that we have the new VMA flag for the locked but not present state,
expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.
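
For illustration, a minimal sketch of the intended usage, assuming the
patched uapi headers (which define MAP_LOCKONFAULT) are installed; the
flag's value differs per architecture, so it must come from the headers
rather than be hard coded:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 1UL << 30; /* large mapping, mostly untouched */

        /* Lock pages as they fault in instead of prefaulting the range.
         * May require CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK. */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT,
                         -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        buf[0] = 1;     /* only the touched page is faulted in and locked */
        munmap(buf, len);
        return 0;
}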

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Paul Gortmaker paul.gortma...@windriver.com
Cc: Chris Metcalf cmetc...@ezchip.com
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
Changes from V3:
 Add missing MAP_LOCKONFAULT to tile

 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 kernel/events/core.c | 2 ++
 mm/mmap.c| 6 --
 11 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 77ae8db..3f80ca4 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 71ed81d..905c1ea 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index c0871ce..c4695f6 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index f93f7eb..40a3fda 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -31,5 +31,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 8cd2ebc..3d74ab7 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -26,6 +26,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8000  /* Lock pages after they are 
faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index acdd013..800e5c3 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define

[PATCH V4 2/6] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-21 Thread Eric B Munson
With the refactored mlock code, introduce new system calls for mlock,
munlock, and munlockall.  The new calls will allow the user to specify
what lock states are being added or cleared.  mlock2 and munlock2 are
trivial at the moment, but a follow on patch will add a new mlock state
making them useful.

munlock2 addresses a limitation of the current implementation.  If a
user calls mlockall(MCL_CURRENT | MCL_FUTURE) and then later decides
that MCL_FUTURE should be removed, they would have to call munlockall()
followed by mlockall(MCL_CURRENT) which could potentially be very
expensive.  The new munlockall2 system call allows a user to simply
clear the MCL_FUTURE flag.
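
For illustration only, a sketch of that usage with the raw syscall;
__NR_munlockall2 is assumed to come from the patched headers and there
is no libc wrapper for it yet:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        /* Lock everything now and everything mapped in the future. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                return 1;

        /* Later: stop locking future mappings while keeping what is
         * already locked, without the munlockall()/mlockall(MCL_CURRENT)
         * round trip described above. */
        if (syscall(__NR_munlockall2, MCL_FUTURE) != 0)
                return 1;
        return 0;
}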

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Stephen Rothwell s...@canb.auug.org.au
Cc: Guenter Roeck li...@roeck-us.net
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@linux-mips.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
Changes from V3:
* Do a (hopefully) complete job of adding the new system calls

 arch/alpha/include/asm/unistd.h   |  2 +-
 arch/alpha/include/uapi/asm/mman.h|  2 ++
 arch/alpha/include/uapi/asm/unistd.h  |  3 +++
 arch/alpha/kernel/systbls.S   |  3 +++
 arch/arm/include/asm/unistd.h |  2 +-
 arch/arm/include/uapi/asm/unistd.h|  3 +++
 arch/arm/kernel/calls.S   |  3 +++
 arch/arm64/include/asm/unistd32.h |  6 ++
 arch/avr32/include/uapi/asm/unistd.h  |  3 +++
 arch/avr32/kernel/syscall_table.S |  3 +++
 arch/blackfin/include/uapi/asm/unistd.h   |  3 +++
 arch/blackfin/mach-common/entry.S |  3 +++
 arch/cris/arch-v10/kernel/entry.S |  3 +++
 arch/cris/arch-v32/kernel/entry.S |  3 +++
 arch/frv/kernel/entry.S   |  3 +++
 arch/ia64/include/asm/unistd.h|  2 +-
 arch/ia64/include/uapi/asm/unistd.h   |  3 +++
 arch/ia64/kernel/entry.S  |  3 +++
 arch/m32r/kernel/entry.S  |  3 +++
 arch/m32r/kernel/syscall_table.S  |  3 +++
 arch/m68k/include/asm/unistd.h|  2 +-
 arch/m68k/include/uapi/asm/unistd.h   |  3 +++
 arch/m68k/kernel/syscalltable.S   |  3 +++
 arch/microblaze/include/uapi/asm/unistd.h |  3 +++
 arch/microblaze/kernel/syscall_table.S|  3 +++
 arch/mips/include/uapi/asm/mman.h |  5 +
 arch/mips/include/uapi/asm/unistd.h   | 21 +++--
 arch/mips/kernel/scall32-o32.S|  3 +++
 arch/mips/kernel/scall64-64.S |  3 +++
 arch/mips/kernel/scall64-n32.S|  3 +++
 arch/mips/kernel/scall64-o32.S|  3 +++
 arch/mn10300/kernel/entry.S   |  3 +++
 arch/parisc/include/uapi/asm/mman.h   |  2 ++
 arch/parisc/include/uapi/asm/unistd.h |  5 -
 arch/powerpc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/unistd.h|  3 +++
 arch/s390/include/uapi/asm/unistd.h   |  5 -
 arch/s390/kernel/compat_wrapper.c |  3 +++
 arch/s390/kernel/syscalls.S   |  3 +++
 arch/sh/kernel/syscalls_32.S  |  3 +++
 arch/sparc/include/uapi/asm/mman.h|  2 ++
 arch/sparc/include/uapi/asm/unistd.h  |  5 -
 arch/sparc/kernel/systbls_32.S|  2 +-
 arch/sparc/kernel/systbls_64.S|  4 ++--
 arch/tile/include/uapi/asm/mman.h |  5 +
 arch/x86/entry/syscalls/syscall_32.tbl|  3 +++
 arch/x86/entry/syscalls/syscall_64.tbl|  3 +++
 arch/xtensa/include/uapi/asm/mman.h   |  5 +
 arch/xtensa/include/uapi/asm/unistd.h | 10 --
 include/linux/syscalls.h  |  4 
 include/uapi/asm-generic/mman.h   |  2 ++
 include/uapi/asm-generic/unistd.h |  8 +++-
 kernel/sys_ni.c   |  3 +++
 mm/mlock.c| 28 
 54 files changed, 205 insertions(+), 19 deletions(-)

diff --git a/arch/alpha/include/asm/unistd.h b/arch/alpha/include/asm/unistd.h
index a56e608..1d09392 100644
--- a/arch/alpha/include/asm/unistd.h
+++ b/arch/alpha/include/asm/unistd.h
@@ -3,7 +3,7 @@
 
 #include <uapi/asm/unistd.h>
 
-#define NR_SYSCALLS514
+#define NR_SYSCALLS517
 
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_STAT64
diff --git

Re: [PATCH V3 3/5] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-10 Thread Eric B Munson
On Fri, 10 Jul 2015, Jonathan Corbet wrote:

 On Thu, 9 Jul 2015 14:46:35 -0400
 Eric B Munson emun...@akamai.com wrote:
 
   One other question...if I call mlock2(MLOCK_ONFAULT) on a range that
   already has resident pages, I believe that those pages will not be locked
   until they are reclaimed and faulted back in again, right?  I suspect that
   could be surprising to users.  
  
  That is the case.  I am looking into what it would take to find only the
  present pages in a range and lock them, if that is the behavior that is
  preferred I can include it in the updated series.
 
 For whatever my $0.02 is worth, I think that should be done.  Otherwise
 the mlock2() interface is essentially nondeterministic; you'll never
 really know if a specific page is locked or not.
 
 Thanks,
 
 jon

Okay, I likely won't have the new set out today then.  This change is
more invasive.  IIUC, I need an equivalent to __get_user_page() that skips
pages which are not present instead of faulting them in, plus the call
chain to get to it.  Unless there is an easier way that I am missing.

Eric



Re: [PATCH V3 3/5] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-09 Thread Eric B Munson
On Wed, 08 Jul 2015, Jonathan Corbet wrote:

 On Wed, 8 Jul 2015 16:34:56 -0400
 Eric B Munson emun...@akamai.com wrote:
 
   Quick, possibly dumb question: I've been beating my head against these for
   a little bit, and I can't figure out what's supposed to happen in this
   case:
   
 mlock2(addr, len, MLOCK_ONFAULT);
 munlock2(addr, len, MLOCK_LOCKED);
   
   It looks to me like it will clear VM_LOCKED without actually unlocking any
   pages.  Is that the intended result?  
  
  This is not quite right, what happens when you call munlock2(addr, len,
  MLOCK_LOCKED); is we call apply_vma_flags(addr, len, VM_LOCKED, false).
 
 From your explanation, it looks like what I said *was* right...what I was
 missing was the fact that VM_LOCKED isn't set in the first place.  So that
 call would be a no-op, clearing a flag that's already cleared.

Sorry, I misread the original.  You are correct with the addition that
the call to munlock2(MLOCK_LOCKED) is a noop in this case.

 
 One other question...if I call mlock2(MLOCK_ONFAULT) on a range that
 already has resident pages, I believe that those pages will not be locked
 until they are reclaimed and faulted back in again, right?  I suspect that
 could be surprising to users.

That is the case.  I am looking into what it would take to find only the
present pages in a range and lock them, if that is the behavior that is
preferred I can include it in the updated series.

 
 Thanks,
 
 jon



Re: [PATCH V3 0/5] Allow user to request memory to be locked on page fault

2015-07-08 Thread Eric B Munson
On Tue, 07 Jul 2015, Andrew Morton wrote:

 On Tue,  7 Jul 2015 13:03:38 -0400 Eric B Munson emun...@akamai.com wrote:
 
  mlock() allows a user to control page out of program memory, but this
  comes at the cost of faulting in the entire mapping when it is
  allocated.  For large mappings where the entire area is not necessary
  this is not ideal.  Instead of forcing all locked pages to be present
  when they are allocated, this set creates a middle ground.  Pages are
  marked to be placed on the unevictable LRU (locked) when they are first
  used, but they are not faulted in by the mlock call.
  
  This series introduces a new mlock() system call that takes a flags
  argument along with the start address and size.  This flags argument
  gives the caller the ability to request memory be locked in the
  traditional way, or to be locked after the page is faulted in.  New
calls are added for munlock() and munlockall() which give the caller a
  way to specify which flags are supposed to be cleared.  A new MCL flag
  is added to mirror the lock on fault behavior from mlock() in
  mlockall().  Finally, a flag for mmap() is added that allows a user to
specify that the covered area should not be paged out, but only after the
  memory has been used the first time.
 
 Thanks for sticking with this.  Adding new syscalls is a bit of a
 hassle but I do think we end up with a better interface - the existing
 mlock/munlock/mlockall interfaces just aren't appropriate for these
 things.
 
 I don't know whether these syscalls should be documented via new
 manpages, or if we should instead add them to the existing
 mlock/munlock/mlockall manpages.  Michael, could you please advise?
 

Thanks for adding the series.  I owe you several updates (getting the
new syscall right for all architectures and a set of tests for the new
syscalls).  Would you prefer a new pair of patches or I update this set?

Eric



Re: [PATCH V3 3/5] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-08 Thread Eric B Munson
On Wed, 08 Jul 2015, Jonathan Corbet wrote:

 On Tue,  7 Jul 2015 13:03:41 -0400
 Eric B Munson emun...@akamai.com wrote:
 
  This patch introduces the ability to request that pages are not
  pre-faulted, but are placed on the unevictable LRU when they are finally
  faulted in.  This can be done area at a time via the
  mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls.  These
  calls can be undone via munlock2(MLOCK_ONFAULT) or
  munlockall2(MCL_ONFAULT).
 
 Quick, possibly dumb question: I've been beating my head against these for
 a little bit, and I can't figure out what's supposed to happen in this
 case:
 
   mlock2(addr, len, MLOCK_ONFAULT);
   munlock2(addr, len, MLOCK_LOCKED);
 
 It looks to me like it will clear VM_LOCKED without actually unlocking any
 pages.  Is that the intended result?

This is not quite right; what happens when you call munlock2(addr, len,
MLOCK_LOCKED) is that we call apply_vma_flags(addr, len, VM_LOCKED, false).
The false argument means that we intend to clear the specified flags.
Here is the relevant snippet:
...
newflags = vma->vm_flags;
if (add_flags) {
        newflags &= ~(VM_LOCKED | VM_LOCKONFAULT);
        newflags |= flags;
} else {
        newflags &= ~flags;
}
...

Note that when we are adding flags, we first clear both VM_LOCKED and
VM_LOCKONFAULT.  This was done to match the behavior found in
mlockall().  When we are removing flags, we simply clear the specified
flag(s).

So in your example the state of the VMAs covered by addr and len would
remain unchanged.
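
To make the flag arithmetic concrete, here is a small userspace model of
the snippet above (the bit values are illustrative, not the kernel's):

#include <stdio.h>

#define VM_LOCKED      0x1
#define VM_LOCKONFAULT 0x2

static unsigned long apply_flags(unsigned long vm_flags, unsigned long flags,
                                 int add_flags)
{
        unsigned long newflags = vm_flags;

        if (add_flags) {
                newflags &= ~(VM_LOCKED | VM_LOCKONFAULT);
                newflags |= flags;
        } else {
                newflags &= ~flags;
        }
        return newflags;
}

int main(void)
{
        unsigned long f = 0;

        f = apply_flags(f, VM_LOCKONFAULT, 1);  /* mlock2(MLOCK_ONFAULT)  */
        f = apply_flags(f, VM_LOCKED, 0);       /* munlock2(MLOCK_LOCKED) */
        printf("VM_LOCKONFAULT still set: %s\n",
               (f & VM_LOCKONFAULT) ? "yes" : "no");    /* prints "yes" */
        return 0;
}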

It sounds like apply_vma_flags() needs a comment covering this topic, I
will include that in the set I am working on now.

Eric



[PATCH V3 3/5] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

2015-07-07 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or graphical
models as well).  For the security example, any application transacting
in data that cannot be swapped out (credit card data, medical records,
etc).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.  This can be done one area at a time via the
mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls.  These
calls can be undone via munlock2(MLOCK_ONFAULT) or
munlockall2(MCL_ONFAULT).

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MLOCK_LOCKED was used.
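
A sketch of the four operations named above; the MLOCK_* and MCL_ONFAULT
values and the __NR_* numbers are assumed to come from the patched
headers, and only mlockall() has an existing libc wrapper:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

void lock_on_fault_demo(void *addr, size_t len)
{
        /* Lock one area on fault ... */
        syscall(__NR_mlock2, addr, len, MLOCK_ONFAULT);
        /* ... or request lock-on-fault for the whole address space. */
        mlockall(MCL_ONFAULT);

        /* And undo each of the above. */
        syscall(__NR_munlock2, addr, len, MLOCK_ONFAULT);
        syscall(__NR_munlockall2, MCL_ONFAULT);
}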

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   |  2 +
 arch/mips/include/uapi/asm/mman.h|  2 +
 arch/parisc/include/uapi/asm/mman.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h |  2 +
 arch/sparc/include/uapi/asm/mman.h   |  2 +
 arch/tile/include/uapi/asm/mman.h|  3 ++
 arch/xtensa/include/uapi/asm/mman.h  |  2 +
 fs/proc/task_mmu.c   |  1 +
 include/linux/mm.h   |  1 +
 include/uapi/asm-generic/mman.h  |  2 +
 mm/mlock.c   | 72 ++--
 mm/mmap.c|  4 +-
 mm/swap.c|  3 +-
 13 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index ec72436..77ae8db 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -37,8 +37,10 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 67c1cdf..71ed81d 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -61,11 +61,13 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 /*
  * Flags for mlock
  */
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index daab994..c0871ce 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -31,8 +31,10 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+#define MLOCK_ONFAULT  0x02/* Lock pages in range after they are 
faulted in, do not prefault */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 189e85f..f93f7eb 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,8 +22,10 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8000  /* lock all pages that are faulted in */
 
 #define MLOCK_LOCKED   0x01

[PATCH V3 4/5] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-07 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.

Now that we have the new VMA flag for the locked but not present state,
expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 mm/mmap.c| 2 +-
 9 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 77ae8db..3f80ca4 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 71ed81d..905c1ea 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index c0871ce..c4695f6 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index f93f7eb..40a3fda 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -31,5 +31,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 8cd2ebc..3d74ab7 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -26,6 +26,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8000  /* Lock pages after they are 
faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/xtensa/include/uapi/asm/mman.h 
b/arch/xtensa/include/uapi/asm/mman.h
index 5725a15..689e1f2 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT

[PATCH V3 2/5] mm: mlock: Add new mlock, munlock, and munlockall system calls

2015-07-07 Thread Eric B Munson
With the refactored mlock code, introduce new system calls for mlock,
munlock, and munlockall.  The new calls will allow the user to specify
what lock states are being added or cleared.  mlock2 and munlock2 are
trivial at the moment, but a follow on patch will add a new mlock state
making them useful.

munlock2 addresses a limitation of the current implementation.  If a
user calls mlockall(MCL_CURRENT | MCL_FUTURE) and then later decides
that MCL_FUTURE should be removed, they would have to call munlockall()
followed by mlockall(MCL_CURRENT) which could potentially be very
expensive.  The new munlockall2 system call allows a user to simply
clear the MCL_FUTURE flag.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Vlastimil Babka vba...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: adi-buildroot-de...@lists.sourceforge.net
Cc: linux-cris-ker...@axis.com
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@linux-mips.org
Cc: linux-am33-l...@redhat.com
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/asm/unistd.h|  2 +-
 arch/alpha/include/uapi/asm/mman.h |  2 ++
 arch/alpha/kernel/systbls.S|  3 +++
 arch/arm/kernel/calls.S|  3 +++
 arch/arm64/include/asm/unistd32.h  |  6 ++
 arch/avr32/kernel/syscall_table.S  |  3 +++
 arch/blackfin/mach-common/entry.S  |  3 +++
 arch/cris/arch-v10/kernel/entry.S  |  3 +++
 arch/cris/arch-v32/kernel/entry.S  |  3 +++
 arch/frv/kernel/entry.S|  3 +++
 arch/ia64/kernel/entry.S   |  3 +++
 arch/m32r/kernel/entry.S   |  3 +++
 arch/m32r/kernel/syscall_table.S   |  3 +++
 arch/m68k/kernel/syscalltable.S|  3 +++
 arch/microblaze/kernel/syscall_table.S |  3 +++
 arch/mips/include/uapi/asm/mman.h  |  5 +
 arch/mips/kernel/scall32-o32.S |  3 +++
 arch/mips/kernel/scall64-64.S  |  3 +++
 arch/mips/kernel/scall64-n32.S |  3 +++
 arch/mips/kernel/scall64-o32.S |  3 +++
 arch/mn10300/kernel/entry.S|  3 +++
 arch/parisc/include/uapi/asm/mman.h|  2 ++
 arch/powerpc/include/uapi/asm/mman.h   |  2 ++
 arch/s390/kernel/syscalls.S|  3 +++
 arch/sh/kernel/syscalls_32.S   |  3 +++
 arch/sparc/include/uapi/asm/mman.h |  2 ++
 arch/sparc/kernel/systbls_32.S |  2 +-
 arch/sparc/kernel/systbls_64.S |  4 ++--
 arch/tile/include/uapi/asm/mman.h  |  5 +
 arch/x86/entry/syscalls/syscall_32.tbl |  3 +++
 arch/x86/entry/syscalls/syscall_64.tbl |  3 +++
 arch/xtensa/include/uapi/asm/mman.h|  5 +
 arch/xtensa/include/uapi/asm/unistd.h  | 10 --
 include/linux/syscalls.h   |  4 
 include/uapi/asm-generic/mman.h|  2 ++
 include/uapi/asm-generic/unistd.h  |  8 +++-
 kernel/sys_ni.c|  3 +++
 mm/mlock.c | 28 
 38 files changed, 148 insertions(+), 7 deletions(-)

diff --git a/arch/alpha/include/asm/unistd.h b/arch/alpha/include/asm/unistd.h
index a56e608..1d09392 100644
--- a/arch/alpha/include/asm/unistd.h
+++ b/arch/alpha/include/asm/unistd.h
@@ -3,7 +3,7 @@
 
 #include <uapi/asm/unistd.h>
 
-#define NR_SYSCALLS514
+#define NR_SYSCALLS517
 
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_STAT64
diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..ec72436 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,8 @@
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
 
+#define MLOCK_LOCKED   0x01/* Lock and populate the specified 
range */
+
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
 #define MADV_SEQUENTIAL2   /* expect sequential page 
references */
diff --git a/arch/alpha/kernel/systbls.S b/arch/alpha/kernel/systbls.S
index 9b62e3f..04d1cce 100644
--- a/arch/alpha/kernel/systbls.S
+++ b/arch/alpha/kernel/systbls.S
@@ -532,6 +532,9 @@ sys_call_table:
.quad sys_getrandom
.quad sys_memfd_create
.quad sys_execveat
+   .quad sys_mlock2
+   .quad sys_munlock2  /* 515 */
+   .quad sys_munlockall2
 
.size sys_call_table, . - sys_call_table
.type sys_call_table, @object
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 05745eb

[PATCH V3 0/5] Allow user to request memory to be locked on page fault

2015-07-07 Thread Eric B Munson
not find it.  It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of stream and 10 runs of kernbench after a warmup
run whose results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.2-rc1    4.2-rc1+lock-on-fault
Copy:    10,566.5   10,421
Scale:   10,685     10,503.5
Add:     12,044.1   11,814.2
Triad:   12,064.8   11,846.3

Kernbench optimal load
                 4.2-rc1   4.2-rc1+lock-on-fault
Elapsed Time     78.453    78.991
User Time        64.2395   65.2355
System Time      9.7335    9.7085
Context Switches 22211.5   22412.1
Sleeps           14965.3   14956.1

---

Changes from V2:

Added new system calls for mlock, munlock, and munlockall with added
flags arguments for controlling how memory is locked or unlocked.

Eric B Munson (5):
  mm: mlock: Refactor mlock, munlock, and munlockall code
  mm: mlock: Add new mlock, munlock, and munlockall system calls
  mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it
  mm: mmap: Add mmap flag to request VM_LOCKONFAULT
  selftests: vm: Add tests for lock on fault

 arch/alpha/include/asm/unistd.h |   2 +-
 arch/alpha/include/uapi/asm/mman.h  |   5 +
 arch/alpha/kernel/systbls.S |   3 +
 arch/arm/kernel/calls.S |   3 +
 arch/arm64/include/asm/unistd32.h   |   6 +
 arch/avr32/kernel/syscall_table.S   |   3 +
 arch/blackfin/mach-common/entry.S   |   3 +
 arch/cris/arch-v10/kernel/entry.S   |   3 +
 arch/cris/arch-v32/kernel/entry.S   |   3 +
 arch/frv/kernel/entry.S |   3 +
 arch/ia64/kernel/entry.S|   3 +
 arch/m32r/kernel/entry.S|   3 +
 arch/m32r/kernel/syscall_table.S|   3 +
 arch/m68k/kernel/syscalltable.S |   3 +
 arch/microblaze/kernel/syscall_table.S  |   3 +
 arch/mips/include/uapi/asm/mman.h   |   8 +
 arch/mips/kernel/scall32-o32.S  |   3 +
 arch/mips/kernel/scall64-64.S   |   3 +
 arch/mips/kernel/scall64-n32.S  |   3 +
 arch/mips/kernel/scall64-o32.S  |   3 +
 arch/mn10300/kernel/entry.S |   3 +
 arch/parisc/include/uapi/asm/mman.h |   5 +
 arch/powerpc/include/uapi/asm/mman.h|   5 +
 arch/s390/kernel/syscalls.S |   3 +
 arch/sh/kernel/syscalls_32.S|   3 +
 arch/sparc/include/uapi/asm/mman.h  |   5 +
 arch/sparc/kernel/systbls_32.S  |   2 +-
 arch/sparc/kernel/systbls_64.S  |   4 +-
 arch/tile/include/uapi/asm/mman.h   |   8 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   3 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   3 +
 arch/xtensa/include/uapi/asm/mman.h |   8 +
 arch/xtensa/include/uapi/asm/unistd.h   |  10 +-
 fs/proc/task_mmu.c  |   1 +
 include/linux/mm.h  |   1 +
 include/linux/mman.h|   3 +-
 include/linux/syscalls.h|   4 +
 include/uapi/asm-generic/mman.h |   5 +
 include/uapi/asm-generic/unistd.h   |   8 +-
 kernel/sys_ni.c |   3 +
 mm/mlock.c  | 135 +--
 mm/mmap.c   |   6 +-
 mm/swap.c   |   3 +-
 tools/testing/selftests/vm/Makefile |   2 +
 tools/testing/selftests/vm/lock-on-fault.c  | 342 
 tools/testing/selftests/vm/on-fault-limit.c |  47 
 tools/testing/selftests/vm/run_vmtests  |  22 ++
 47 files changed, 681 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: Vlastimil Babka vba...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1


Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-25 Thread Eric B Munson
On Tue, 23 Jun 2015, Vlastimil Babka wrote:

 On 06/15/2015 04:43 PM, Eric B Munson wrote:
 Note that the semantic of MAP_LOCKED can be subtly surprising:
 
 mlock(2) fails if the memory range cannot get populated to guarantee
 that no future major faults will happen on the range.
 mmap(MAP_LOCKED) on the other hand silently succeeds even if the
 range was populated only
 partially.
 
 ( from http://marc.info/?l=linux-mmm=143152790412727w=2 )
 
 So MAP_LOCKED can silently behave like MAP_LOCKONFAULT. While
 MAP_LOCKONFAULT doesn't suffer from such problem, I wonder if that's
 sufficient reason not to extend mmap by new mlock() flags that can
 be instead applied to the VMA after mmapping, using the proposed
 mlock2() with flags. So I think instead we could deprecate
 MAP_LOCKED more prominently. I doubt the overhead of calling the
 extra syscall matters here?
 
 We could talk about retiring the MAP_LOCKED flag but I suspect that
 would get significantly more pushback than adding a new mmap flag.
 
 Oh no we can't retire as in remove the flag, ever. Just not
 continue the way of mmap() flags related to mlock().
 
 Likely that the overhead does not matter in most cases, but presumably
 there are cases where it does (as we have a MAP_LOCKED flag today).
 Even with the proposed new system calls I think we should have the
 MAP_LOCKONFAULT for parity with MAP_LOCKED.
 
 I'm not convinced, but it's not a major issue.
 
 
 - mlock() takes a `flags' argument.  Presently that's
MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
  - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKEDONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 If the new LOCKONFAULT functionality is indeed desired (I haven't
 still decided myself) then I agree that would be the cleanest way.
 
 Do you disagree with the use cases I have listed or do you think there
 is a better way of addressing those cases?
 
 I'm somewhat sceptical about the security one. Are security
 sensitive buffers that large to matter? The performance one is more
 convincing and I don't see a better way, so OK.

They can be; the two that come to mind are medical images and high
resolution sensor data.

 
 
 
 What do others think?
 



Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-25 Thread Eric B Munson
On Wed, 24 Jun 2015, Michal Hocko wrote:

 On Mon 22-06-15 10:18:06, Eric B Munson wrote:
  On Mon, 22 Jun 2015, Michal Hocko wrote:
  
   On Fri 19-06-15 12:43:33, Eric B Munson wrote:
 [...]
Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)? 
   
   I thought the MAP_FAULTPOPULATE (or any other better name) would
   directly translate into VM_FAULTPOPULATE and wouldn't be tight to the
   locked semantic. We already have VM_LOCKED for that. The direct effect
   of the flag would be to prevent from population other than the direct
   page fault - including any speculative actions like fault around or
   read-ahead.
  
  I like the ability to control other speculative population, but I am not
  sure about overloading it with the VM_LOCKONFAULT case.  Here is my
  concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
  LOCKONFAULT, how can we tell the difference between someone that wants
  to avoid read-ahead and wants to use mlock()?
 
 Not sure I understand. Something like?
 addr = mmap(VM_FAULTPOPULATE) # To prevent speculative mappings into the vma
 [...]
 mlock(addr, len) # Now I want the full mlock semantic

So this leaves us without the LOCKONFAULT semantics?  That is not at all
what I am looking for.  What I want is a way to express 3 possible
states of a VMA WRT locking, locked (populated and all pages on the
unevictable LRU), lock on fault (populated by page fault, pages that are
present are on the unevictable LRU, newly faulted pages are added to
same), and not locked.

 
 and the later to have the full mlock semantic and populate the given
 area regardless of VM_FAULTPOPULATE being set on the vma? This would
 be an interesting question because mlock man page clearly states the
 semantic and that is to _always_ populate or fail. So I originally
 thought that it would obey VM_FAULTPOPULATE but this needs a more
 thinking.
 
  This might lead to some
  interesting states with mlock() and munlock() that take flags.  For
  instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
  munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
  VM_LOCKONFAULT set. 
 
 This is really confusing. Let me try to rephrase that. So you have
 mlock(addr, len, MLOCK_ONFAULT)
 munlock(addr, len, MLOCK_LOCKED)
 
 IIUC you would expect the vma still being MLOCK_ONFAULT, right? Isn't
 that behavior strange and unexpected? First of all, munlock has
 traditionally dropped the lock on the address range (e.g. what should
 happen if you did plain old munlock(addr, len)). But even without
 that. You are trying to unlock something that hasn't been locked the
 same way. So I would expect -EINVAL at least, if the two modes should be
 really represented by different flags.

I would expect it to remain MLOCK_LOCKONFAULT because the user requested
munlock(addr, len, MLOCK_LOCKED).  It is not currently an error to
unlock memory that is not locked.  We do this because we do not require
the user track what areas are locked.  It is acceptable to have a mostly
locked area with holes unlocked with a single call to munlock that spans
the entire area.  The same semantics should hold for munlock with flags.
If I have an area with MLOCK_LOCKED and MLOCK_ONFAULT interleaved, it
should be acceptable to clear the MLOCK_ONFAULT flag from those areas
with a single munlock call that spans the area.

On top of continuing with munlock semantics, the implementation would
need the ability to roll back an munlock call if it failed after altering
VMAs.  If we have the same interleaved area as before and we go to
return -EINVAL the first time we hit an area that was MLOCK_LOCKED, how
do we restore the state of the VMAs we have already processed, and
possibly merged/split?
 
 Or did you mean the both types of lock like:
 mlock(addr, len, MLOCK_ONFAULT) | mmap(MAP_LOCKONFAULT)
 mlock(addr, len, MLOCK_LOCKED)
 munlock(addr, len, MLOCK_LOCKED)
 
 and that should keep MLOCK_ONFAULT?
 This sounds even more weird to me because that means that the vma in
 question would be locked by two different mechanisms. MLOCK_LOCKED with
 the always populate semantic would rule out MLOCK_ONFAULT so what
 would be the meaning of the other flag then? Also what should regular
 munlock(addr, len) without flags unlock? Both?

This is indeed confusing and not what I was trying to illustrate, but
since you bring it up.  mlockall() currently clears all flags and then
sets the new flags with each subsequent call.  mlock2 would use that
same behavior, if LOCKED was specified for a ONFAULT region, that region
would become LOCKED and vice versa.

I have the new system call set ready, I am waiting to post for rc1 so I
can run the benchmarks again on a base more stable than the middle of a
merge window.  We should wait to hash out implementations until the code
is up rather than talk past each other here.

 
  If we use VM_FAULTPOPULATE, the same pair of calls
  would clear VM_LOCKED, but leave

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-22 Thread Eric B Munson
On Mon, 22 Jun 2015, Michal Hocko wrote:

 On Fri 19-06-15 12:43:33, Eric B Munson wrote:
  On Fri, 19 Jun 2015, Michal Hocko wrote:
  
   On Thu 18-06-15 16:30:48, Eric B Munson wrote:
On Thu, 18 Jun 2015, Michal Hocko wrote:
   [...]
 Wouldn't it be much more reasonable and straightforward to have
 MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
 explicitly disallow any form of pre-faulting? It would be usable for
 other usecases than with MAP_LOCKED combination.

I don't see a clear case for it being more reasonable, it is one
possible way to solve the problem.
   
   MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
   around is all or nothing feature. Either all mappings (which support
   this) fault around or none. There is no way to tell the kernel that
   this particular mapping shouldn't fault around. I haven't seen such a
   request yet but we have seen requests to have a way to opt out from
   a global policy in the past (e.g. per-process opt out from THP). So
   I can imagine somebody will come with a request to opt out from any
   speculative operations on the mapped area in the future.
   
But I think it leaves us in an even
more akward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
not present.  Having VM_LOCKONFAULT states that this was intentional, if
we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
populate failure state harder.
   
   I am not sure I understand your point here. Could you be more specific
   how would you check for that and what for?
  
  My thought on detecting was that someone might want to know if they had
  a VMA that was VM_LOCKED but had not been made present because of a
  failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
  is at least explicit about what is happening which would make detecting
  the VM_LOCKED but not present state easier. 
 
 One could use /proc/pid/pagemap to query the residency.
 
  This assumes that
  MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
  it would have to.
 
 Yes, it would have to have a VM flag for the vma.
 
   From my understanding MAP_LOCKONFAULT is essentially
   MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
   single MAP_LOCKED unfortunately). I would love to also have
   MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
   skeptical considering how my previous attempt to make MAP_POPULATE
   reasonable went.
  
  Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
  new MAP_LOCKONFAULT flag (or both)? 
 
 I thought the MAP_FAULTPOPULATE (or any other better name) would
 directly translate into VM_FAULTPOPULATE and wouldn't be tight to the
 locked semantic. We already have VM_LOCKED for that. The direct effect
 of the flag would be to prevent from population other than the direct
 page fault - including any speculative actions like fault around or
 read-ahead.

I like the ability to control other speculative population, but I am not
sure about overloading it with the VM_LOCKONFAULT case.  Here is my
concern.  If we are using VM_FAULTPOPULATE | VM_LOCKED to denote
LOCKONFAULT, how can we tell the difference between someone who wants to
avoid read-ahead and someone who wants to use mlock()?  This might lead
to some
interesting states with mlock() and munlock() that take flags.  For
instance, using VM_LOCKONFAULT mlock(MLOCK_ONFAULT) followed by
munlock(MLOCK_LOCKED) leaves the VMAs in the same state with
VM_LOCKONFAULT set.  If we use VM_FAULTPOPULATE, the same pair of calls
would clear VM_LOCKED, but leave VM_FAULTPOPULATE.  It may not matter in
the end, but I am concerned about the subtleties here.

 
  If you prefer that MAP_LOCKED |
  MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
  instead of introducing MAP_LOCKONFAULT.  I went with the new flag
  because to date, we have a one to one mapping of MAP_* to VM_* flags.
  
   
If this is the preferred path for mmap(), I am fine with that. 
   
However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well. 
   
   mlock with flags sounds like a good step but I am not sure it will make
   sense in the future. POSIX has screwed that and I am not sure how many
   applications would use it. This ship has sailed long time ago.
  
  I don't know either, but the code is the question, right?  I know that
  we have at least one team that wants it here.
  
   
That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.
   
   mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even

Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-19 Thread Eric B Munson
On Fri, 19 Jun 2015, Michal Hocko wrote:

 On Thu 18-06-15 16:30:48, Eric B Munson wrote:
  On Thu, 18 Jun 2015, Michal Hocko wrote:
 [...]
   Wouldn't it be much more reasonable and straightforward to have
   MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
   explicitly disallow any form of pre-faulting? It would be usable for
   other usecases than with MAP_LOCKED combination.
  
  I don't see a clear case for it being more reasonable, it is one
  possible way to solve the problem.
 
 MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
 around is all or nothing feature. Either all mappings (which support
 this) fault around or none. There is no way to tell the kernel that
 this particular mapping shouldn't fault around. I haven't seen such a
 request yet but we have seen requests to have a way to opt out from
 a global policy in the past (e.g. per-process opt out from THP). So
 I can imagine somebody will come with a request to opt out from any
 speculative operations on the mapped area in the future.
 
  But I think it leaves us in an even
  more awkward state WRT VMA flags.  As you noted in your fix for the
  mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
  not present.  Having VM_LOCKONFAULT states that this was intentional, if
  we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
  longer set VM_LOCKONFAULT (unless we want to start mapping it to the
  presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
  populate failure state harder.
 
 I am not sure I understand your point here. Could you be more specific
 how would you check for that and what for?

My thought on detecting was that someone might want to know if they had
a VMA that was VM_LOCKED but had not been made present because of a
failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
is at least explicit about what is happening which would make detecting
the VM_LOCKED but not present state easier.  This assumes that
MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
it would have to.

 
 From my understanding MAP_LOCKONFAULT is essentially
 MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
 single MAP_LOCKED unfortunately). I would love to also have
 MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
 skeptical considering how my previous attempt to make MAP_POPULATE
 reasonable went.

Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
new MAP_LOCKONFAULT flag (or both)?  If you prefer that MAP_LOCKED |
MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
instead of introducing MAP_LOCKONFAULT.  I went with the new flag
because to date, we have a one to one mapping of MAP_* to VM_* flags.

 
  If this is the preferred path for mmap(), I am fine with that. 
 
  However,
  I would like to see the new system calls that Andrew mentioned (and that
  I am testing patches for) go in as well. 
 
 mlock with flags sounds like a good step but I am not sure it will make
 sense in the future. POSIX has screwed that and I am not sure how many
 applications would use it. This ship has sailed long time ago.

I don't know either, but the code is the question, right?  I know that
we have at least one team that wants it here.

 
  That way we give users the
  ability to request VM_LOCKONFAULT for memory allocated using something
  other than mmap.
 
 mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
 without changing mlock syscall.

That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s).  It
doesn't cover the actual case I was asking about, which is how do I get
lock on fault on malloc'd memory?
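
A sketch of that case with the proposed call, again assuming the patched
headers for __NR_mlock2 and MLOCK_ONFAULT:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        size_t len = 64UL << 20;
        char *buf = malloc(len);        /* not a mapping we made with mmap() */

        if (!buf)
                return 1;
        /* No MAP_ flag can help here; mlock2() applies to any range. */
        if (syscall(__NR_mlock2, buf, len, MLOCK_ONFAULT) != 0) {
                free(buf);
                return 1;
        }
        buf[0] = 1;     /* this page is now locked; untouched pages are not */
        free(buf);
        return 0;
}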

  
This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 [...]
 -- 
 Michal Hocko
 SUSE Labs



Re: [RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-18 Thread Eric B Munson
On Thu, 18 Jun 2015, Michal Hocko wrote:

 [Sorry for the late reply - I meant to answer in the previous threads
  but something always preempted me from that]
 
 On Wed 10-06-15 09:26:48, Eric B Munson wrote:
  The cost of faulting in all memory to be locked can be very high when
  working with large mappings.  If only portions of the mapping will be
  used this can incur a high penalty for locking.
  
  For the example of a large file, this is the usage pattern for a large
  statistical language model (probably applies to other statistical or graphical
  models as well).  For the security example, any application transacting
  in data that cannot be swapped out (credit card data, medical records,
  etc).
 
 Such a use case makes some sense to me but I am not sure the way you
 implement it is the right one. This is another mlock related flag for
 mmap with a different semantic. You do not want to prefault but e.g. is
 the readahead or fault around acceptable? I do not see anything in your
 patch to handle those...

We haven't bumped into readahead or fault around causing performance
problems for us.  If they cause problems for users when LOCKONFAULT is
in use then we can address them.

 
 Wouldn't it be much more reasonable and straightforward to have
 MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
 explicitly disallow any form of pre-faulting? It would be usable for
 other usecases than with MAP_LOCKED combination.

I don't see a clear case for it being more reasonable, it is one
possible way to solve the problem.  But I think it leaves us in an even
more awkward state WRT VMA flags.  As you noted in your fix for the
mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
not present.  Having VM_LOCKONFAULT states that this was intentional, if
we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
longer set VM_LOCKONFAULT (unless we want to start mapping it to the
presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
populate failure state harder.

If this is the preferred path for mmap(), I am fine with that.  However,
I would like to see the new system calls that Andrew mentioned (and that
I am testing patches for) go in as well.  That way we give users the
ability to request VM_LOCKONFAULT for memory allocated using something
other than mmap.

 
  This patch introduces the ability to request that pages are not
  pre-faulted, but are placed on the unevictable LRU when they are finally
  faulted in.
  
  To keep accounting checks out of the page fault path, users are billed
  for the entire mapping lock as if MAP_LOCKED was used.
  
  Signed-off-by: Eric B Munson emun...@akamai.com
  Cc: Michal Hocko mho...@suse.cz
  Cc: linux-al...@vger.kernel.org
  Cc: linux-ker...@vger.kernel.org
  Cc: linux-m...@linux-mips.org
  Cc: linux-par...@vger.kernel.org
  Cc: linuxppc-dev@lists.ozlabs.org
  Cc: sparcli...@vger.kernel.org
  Cc: linux-xte...@linux-xtensa.org
  Cc: linux...@kvack.org
  Cc: linux-a...@vger.kernel.org
  Cc: linux-...@vger.kernel.org
  ---
   arch/alpha/include/uapi/asm/mman.h   | 1 +
   arch/mips/include/uapi/asm/mman.h| 1 +
   arch/parisc/include/uapi/asm/mman.h  | 1 +
   arch/powerpc/include/uapi/asm/mman.h | 1 +
   arch/sparc/include/uapi/asm/mman.h   | 1 +
   arch/tile/include/uapi/asm/mman.h| 1 +
   arch/xtensa/include/uapi/asm/mman.h  | 1 +
   include/linux/mm.h   | 1 +
   include/linux/mman.h | 3 ++-
   include/uapi/asm-generic/mman.h  | 1 +
   mm/mmap.c| 4 ++--
   mm/swap.c| 3 ++-
   12 files changed, 15 insertions(+), 4 deletions(-)
  
  diff --git a/arch/alpha/include/uapi/asm/mman.h 
  b/arch/alpha/include/uapi/asm/mman.h
  index 0086b47..15e96e1 100644
  --- a/arch/alpha/include/uapi/asm/mman.h
  +++ b/arch/alpha/include/uapi/asm/mman.h
  @@ -30,6 +30,7 @@
   #define MAP_NONBLOCK   0x4 /* do not block on IO */
   #define MAP_STACK  0x8 /* give out an address that is best 
  suited for process/thread stacks */
   #define MAP_HUGETLB0x10/* create a huge page mapping */
  +#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
  faulted in, do not prefault */
   
   #define MS_ASYNC   1   /* sync memory asynchronously */
   #define MS_SYNC2   /* synchronous memory sync */
  diff --git a/arch/mips/include/uapi/asm/mman.h 
  b/arch/mips/include/uapi/asm/mman.h
  index cfcb876..47846a5 100644
  --- a/arch/mips/include/uapi/asm/mman.h
  +++ b/arch/mips/include/uapi/asm/mman.h
  @@ -48,6 +48,7 @@
   #define MAP_NONBLOCK   0x2 /* do not block on IO */
   #define MAP_STACK  0x4 /* give out an address that is best 
  suited for process/thread stacks */
   #define MAP_HUGETLB0x8 /* create a huge page mapping */
  +#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
  faulted

Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-15 Thread Eric B Munson
On Thu, 11 Jun 2015, Andrew Morton wrote:

 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:
 
   Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
   that even makes sense but the behaviour should be understood and
   tested.
 
  I have extended the kselftest for lock-on-fault to try both of these
  scenarios and they work as expected.  The VMA is split and the VM
  flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented
 that.  More tests for you to add ;)
 
 How are you testing the vma merging and splitting, btw?  Parsing
 the procfs files?

The lock-on-fault test now covers VMA splitting and merging by parsing
/proc/self/maps.  VMA splitting and merging works as it should with both
MAP_LOCKONFAULT and MCL_ONFAULT.
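
For reference, the check is not much more elaborate than reading
/proc/self/maps and counting the VMAs that cover the test region before and
after the split or merge.  A stripped-down sketch (not the actual kselftest
code):

#include <stdio.h>

/* Count the VMAs in /proc/self/maps that fall inside [start, end). */
static int count_vmas(unsigned long start, unsigned long end)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	int count = 0;

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		unsigned long vm_start, vm_end;

		if (sscanf(line, "%lx-%lx", &vm_start, &vm_end) != 2)
			continue;
		if (vm_start >= start && vm_end <= end)
			count++;
	}
	fclose(f);
	return count;
}

A successful split should raise the count for the region, and the matching
merge after munlock() should bring it back down.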

 
   What's missing here is a syscall to set VM_LOCKONFAULT on an
   arbitrary range of memory - mlock() for lock-on-fault.  It's a
   shame that mlock() didn't take a `mode' argument.  Perhaps we
   should add such a syscall - that would make the mmap flag unneeded
   but I suppose it should be kept for symmetry.
  
  Do you want such a system call as part of this set?  I would need some
  time to make sure I had thought through all the possible corners one
  could get into with such a call, so it would delay a V3 quite a bit.
  Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c doesn't
 exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 - mlock() takes a `flags' argument.  Presently that's
   MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
   to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 What do others think?

I am working on V3 which will introduce the new system calls.



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-15 Thread Eric B Munson
On Fri, 12 Jun 2015, Vlastimil Babka wrote:

 On 06/11/2015 09:34 PM, Andrew Morton wrote:
 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson emun...@akamai.com wrote:
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
 that even makes sense but the behaviour should be understood and
 tested.
 
 I have extended the kselftest for lock-on-fault to try both of these
 scenarios and they work as expected.  The VMA is split and the VM
 flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented
 that.  More tests for you to add ;)
 
 How are you testing the vma merging and splitting, btw?  Parsing
 the procfs files?
 
 What's missing here is a syscall to set VM_LOCKONFAULT on an
 arbitrary range of memory - mlock() for lock-on-fault.  It's a
 shame that mlock() didn't take a `mode' argument.  Perhaps we
 should add such a syscall - that would make the mmap flag unneeded
 but I suppose it should be kept for symmetry.
 
 Do you want such a system call as part of this set?  I would need some
 time to make sure I had thought through all the possible corners one
 could get into with such a call, so it would delay a V3 quite a bit.
 Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c doesn't
 exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 Note that the semantic of MAP_LOCKED can be subtly surprising:
 
 mlock(2) fails if the memory range cannot get populated to guarantee
 that no future major faults will happen on the range.
 mmap(MAP_LOCKED) on the other hand silently succeeds even if the
 range was populated only partially.
 
 ( from http://marc.info/?l=linux-mm&m=143152790412727&w=2 )
 
 So MAP_LOCKED can silently behave like MAP_LOCKONFAULT.  While
 MAP_LOCKONFAULT doesn't suffer from such a problem, I wonder if that's
 a sufficient reason not to extend mmap with new mlock() flags that can
 instead be applied to the VMA after mmapping, using the proposed
 mlock2() with flags.  So I think instead we could deprecate
 MAP_LOCKED more prominently.  I doubt the overhead of calling the
 extra syscall matters here?

We could talk about retiring the MAP_LOCKED flag but I suspect that
would get significantly more pushback than adding a new mmap flag.

It is likely that the overhead does not matter in most cases, but
presumably there are cases where it does (as we have a MAP_LOCKED flag
today).  Even with the proposed new system calls I think we should keep
MAP_LOCKONFAULT for parity with MAP_LOCKED.
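
As an aside, the surprising MAP_LOCKED behaviour quoted above is easy to
observe from userspace.  A hedged sketch of the kind of residency check
involved (mincore() is standard, nothing here depends on the new flags):

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return how many pages of [addr, addr + len) are currently resident. */
static long resident_pages(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pages = (len + page - 1) / page;
	unsigned char *vec = malloc(pages);
	long n = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < pages; i++)
		n += vec[i] & 1;
	free(vec);
	return n;
}

After a successful mlock() this should always report every page resident;
after mmap(MAP_LOCKED) it may not, which is exactly the silent
partial-population case being discussed.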

 
 - mlock() takes a `flags' argument.  Presently that's
MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.  MLOCK_LOCKED|MLOCK_LOCKONFAULT
to specify which flags are being cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and independently.
 
 Now, that's how we would have designed all this on day one.  And I
 think we can do this now, by adding new mlock2() and munlock2()
 syscalls.  And we may as well deprecate the old mlock() and munlock(),
 not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple
 boilerplate and wrappers and such, and it gets the interface correct,
 and extensible.
 
 If the new LOCKONFAULT functionality is indeed desired (I still
 haven't decided myself) then I agree that would be the cleanest way.

Do you disagree with the use cases I have listed or do you think there
is a better way of addressing those cases?

 
 What do others think?



Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-11 Thread Eric B Munson

On 06/10/2015 05:59 PM, Andrew Morton wrote:
 On Wed, 10 Jun 2015 09:26:47 -0400 Eric B Munson
 emun...@akamai.com wrote:
 
 mlock() allows a user to control page out of program memory, but
 this comes at the cost of faulting in the entire mapping when it
 is
 
 s/mapping/locked area/

Done.

 
 allocated.  For large mappings where the entire area is not
 necessary this is not ideal.
 
 This series introduces new flags for mmap() and mlockall() that
 allow a user to specify that the covered area should not be paged
 out, but only after the memory has been used the first time.
 
 The comparison with MCL_FUTURE is hiding over in the 2/3 changelog.
  It's important so let's copy it here.
 
 : MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
 : in the previous patch because MCL_FUTURE will behave as if each mapping
 : was made with MAP_LOCKED, causing the entire mapping to be faulted in
 : when new space is allocated or mapped.  MCL_ONFAULT allows the user to
 : delay the fault-in cost of any given page until it is actually needed,
 : but then guarantees that the page will always be resident.

Done

 
 I *think* it all looks OK.  I'd like someone else to go over it
 also if poss.
 
 
 I guess the 2/3 changelog should have something like
 
 : munlockall() will clear MCL_ONFAULT on all vma's in the process's
 VM.

Done

 
 It's pretty obvious, but the manpage delta should make this clear
 also.

Done

 
 
 Also the changelog(s) and manpage delta should explain that
 munlock() clears MCL_ONFAULT.

Done

 
 And now I'm wondering what happens if userspace does 
 mmap(MAP_LOCKONFAULT) and later does munlock() on just part of
 that region.  Does the vma get split?  Is this tested?  Should also
 be in the changelogs and manpage.
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not sure
 that even makes sense but the behaviour should be understood and
 tested.

I have extended the kselftest for lock-on-fault to try both of these
scenarios and they work as expected.  The VMA is split and the VM
flags are set appropriately for the resulting VMAs.

 
 
 What's missing here is a syscall to set VM_LOCKONFAULT on an
 arbitrary range of memory - mlock() for lock-on-fault.  It's a
 shame that mlock() didn't take a `mode' argument.  Perhaps we
 should add such a syscall - that would make the mmap flag unneeded
 but I suppose it should be kept for symmetry.

Do you want such a system call as part of this set?  I would need some
time to make sure I had thought through all the possible corners one
could get into with such a call, so it would delay a V3 quite a bit.
Otherwise I can send a V3 out immediately.


Re: [RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-11 Thread Eric B Munson

On 06/11/2015 03:34 PM, Andrew Morton wrote:
 On Thu, 11 Jun 2015 15:21:30 -0400 Eric B Munson
 emun...@akamai.com wrote:
 
 Ditto mlockall(MCL_ONFAULT) followed by munlock().  I'm not
 sure that even makes sense but the behaviour should be
 understood and tested.
 
 I have extended the kselftest for lock-on-fault to try both of
 these scenarios and they work as expected.  The VMA is split and
 the VM flags are set appropriately for the resulting VMAs.
 
 munlock() should do vma merging as well.  I *think* we implemented 
 that.  More tests for you to add ;)

I will add a test for this as well.  But the code is in place to merge
VMAs IIRC.

 
 How are you testing the vma merging and splitting, btw?  Parsing 
 the procfs files?

To show the VMA split happened, I dropped a printk in mlock_fixup()
and the user space test simply checks that unlocked pages are not
marked as unevictable.  The test does not parse maps or smaps for
actual VMA layout.  Given that we want to check the merging of VMAs as
well I will add this.

 
 What's missing here is a syscall to set VM_LOCKONFAULT on an 
 arbitrary range of memory - mlock() for lock-on-fault.  It's a 
 shame that mlock() didn't take a `mode' argument.  Perhaps we 
 should add such a syscall - that would make the mmap flag
 unneeded but I suppose it should be kept for symmetry.
 
 Do you want such a system call as part of this set?  I would need
 some time to make sure I had thought through all the possible
 corners one could get into with such a call, so it would delay a
 V3 quite a bit. Otherwise I can send a V3 out immediately.
 
 I think the way to look at this is to pretend that mm/mlock.c
 doesn't exist and ask how should we design these features.
 
 And that would be:
 
 - mmap() takes a `flags' argument: MAP_LOCKED|MAP_LOCKONFAULT.
 
 - mlock() takes a `flags' argument.  Presently that's 
 MLOCK_LOCKED|MLOCK_LOCKONFAULT.
 
 - munlock() takes a `flags' argument.
 MLOCK_LOCKED|MLOCK_LOCKONFAULT to specify which flags are being
 cleared.
 
 - mlockall() and munlockall() ditto.
 
 
 IOW, LOCKED and LOCKONFAULT are treated identically and
 independently.
 
 Now, that's how we would have designed all this on day one.  And I 
 think we can do this now, by adding new mlock2() and munlock2() 
 syscalls.  And we may as well deprecate the old mlock() and
 munlock(), not that this matters much.
 
 *should* we do this?  I'm thinking yes - it's all pretty simple 
 boilerplate and wrappers and such, and it gets the interface
 correct, and extensible.
 
 What do others think?
 


[RESEND PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-10 Thread Eric B Munson
:   12,505.77   12,418.78

Kernbench optimal load
                  4.1-rc2   4.1-rc2+lock-on-fault
Elapsed Time      71.046    71.324
User Time         62.117    62.352
System Time       8.926     8.969
Context Switches  14531.9   14542.5
Sleeps            14935.9   14939

Eric B Munson (3):
  Add mmap flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h  |   2 +
 arch/mips/include/uapi/asm/mman.h   |   2 +
 arch/parisc/include/uapi/asm/mman.h |   2 +
 arch/powerpc/include/uapi/asm/mman.h|   2 +
 arch/sparc/include/uapi/asm/mman.h  |   2 +
 arch/tile/include/uapi/asm/mman.h   |   2 +
 arch/xtensa/include/uapi/asm/mman.h |   2 +
 include/linux/mm.h  |   1 +
 include/linux/mman.h|   3 +-
 include/uapi/asm-generic/mman.h |   2 +
 mm/mlock.c  |  13 ++-
 mm/mmap.c   |   4 +-
 mm/swap.c   |   3 +-
 tools/testing/selftests/vm/Makefile |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 
 tools/testing/selftests/vm/on-fault-limit.c |  47 +
 tools/testing/selftests/vm/run_vmtests  |  23 +
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1


[RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-10 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used, this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statistical language model (it probably applies to other statistical or
graphical models as well).  For the security example, consider any
application transacting in data that cannot be swapped out (credit card
data, medical records, etc.).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.
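
A minimal usage sketch, assuming this patch is applied so that
MAP_LOCKONFAULT is visible in the uapi headers (the numeric value is
per-architecture, so it must come from those headers rather than be
hard-coded); the file name is only an example:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 5UL << 30;			/* e.g. a 5 GB model file */
	int fd = open("model.bin", O_RDONLY);
	char *p;

	if (fd < 0)
		return 1;

	/* No pre-faulting: pages are locked only as they are first touched. */
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_LOCKONFAULT, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* ... walk only the parts of the model this run actually needs ... */
	return 0;
}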

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h   | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 mm/mmap.c| 4 ++--
 mm/swap.c| 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do

[RESEND PATCH V2 2/3] Add mlockall flag for locking pages on fault

2015-06-10 Thread Eric B Munson
Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.

MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
in the previous patch because MCL_FUTURE will behave as if each mapping
was made with MAP_LOCKED, causing the entire mapping to be faulted in
when new space is allocated or mapped.  MCL_ONFAULT allows the user to
delay the fault-in cost of any given page until it is actually needed,
but then guarantees that the page will always be resident.

As with the mmap(MAP_LOCKONFAULT) case, the user is charged for the
mapping against the RLIMIT_MEMLOCK when the address space is allocated,
not when the page is faulted in.  This decision was made to keep the
accounting checks out of the page fault path.
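
A short sketch of the intended call pattern under those accounting rules,
assuming the MCL_ONFAULT definition from this patch; raising RLIMIT_MEMLOCK
may of course require privilege, so the setrlimit() step is illustrative
only:

#include <sys/mman.h>
#include <sys/resource.h>

int lock_on_fault_all(rlim_t expected_bytes)
{
	struct rlimit rl = { expected_bytes, expected_bytes };

	/* The limit is charged when address space is allocated, not when
	 * pages fault in, so it must cover the full size of the mappings. */
	if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;

	/* Lock pages as they are faulted in; nothing is pre-faulted. */
	return mlockall(MCL_ONFAULT);
}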

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h   |  1 +
 arch/mips/include/uapi/asm/mman.h|  1 +
 arch/parisc/include/uapi/asm/mman.h  |  1 +
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h|  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  1 +
 include/uapi/asm-generic/mman.h  |  1 +
 mm/mlock.c   | 13 +
 9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..dfdaecf 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 47846a5..f0705ff 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..7c2eb85 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..761137a 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8000  /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..dd027b8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8000  /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..0f7ae45 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7 @@
  */
 #define MCL_CURRENT1

[PATCH V2 2/3] Add mlockall flag for locking pages on fault

2015-06-02 Thread Eric B Munson
Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.

MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
in the previous patch because MCL_FUTURE will behave as if each mapping
was made with MAP_LOCKED, causing the entire mapping to be faulted in
when new space is allocated or mapped.  MCL_ONFAULT allows the user to
delay the fault-in cost of any given page until it is actually needed,
but then guarantees that the page will always be resident.

As with the mmap(MAP_LOCKONFAULT) case, the user is charged for the
mapping against the RLIMIT_MEMLOCK when the address space is allocated,
not when the page is faulted in.  This decision was made to keep the
accounting checks out of the page fault path.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h   |  1 +
 arch/mips/include/uapi/asm/mman.h|  1 +
 arch/parisc/include/uapi/asm/mman.h  |  1 +
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h|  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  1 +
 include/uapi/asm-generic/mman.h  |  1 +
 mm/mlock.c   | 13 +
 9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..dfdaecf 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ONFAULT32768   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 47846a5..f0705ff 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..7c2eb85 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ONFAULT4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..0109937 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..f2986f7 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ONFAULT0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..0f7ae45 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7

[PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

2015-06-02 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used, this can incur a high penalty for locking.

For the example of a large file, this is the usage pattern for a large
statistical language model (it probably applies to other statistical or
graphical models as well).  For the security example, consider any
application transacting in data that cannot be swapped out (credit card
data, medical records, etc.).

This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h   | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 mm/mmap.c| 4 ++--
 mm/swap.c| 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock

[PATCH V2 0/3] Allow user to request memory to be locked on page fault

2015-06-02 Thread Eric B Munson
:   12,505.77   12,418.78

Kernbench optimal load
                  4.1-rc2   4.1-rc2+lock-on-fault
Elapsed Time      71.046    71.324
User Time         62.117    62.352
System Time       8.926     8.969
Context Switches  14531.9   14542.5
Sleeps            14935.9   14939

Eric B Munson (3):
  Add mmap flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h  |   2 +
 arch/mips/include/uapi/asm/mman.h   |   2 +
 arch/parisc/include/uapi/asm/mman.h |   2 +
 arch/powerpc/include/uapi/asm/mman.h|   2 +
 arch/sparc/include/uapi/asm/mman.h  |   2 +
 arch/tile/include/uapi/asm/mman.h   |   2 +
 arch/xtensa/include/uapi/asm/mman.h |   2 +
 include/linux/mm.h  |   1 +
 include/linux/mman.h|   3 +-
 include/uapi/asm-generic/mman.h |   2 +
 mm/mlock.c  |  13 ++-
 mm/mmap.c   |   4 +-
 mm/swap.c   |   3 +-
 tools/testing/selftests/vm/Makefile |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 
 tools/testing/selftests/vm/on-fault-limit.c |  47 +
 tools/testing/selftests/vm/run_vmtests  |  23 +
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: Michal Hocko mho...@suse.cz
Cc: Michael Kerrisk mtk.manpa...@gmail.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1


Re: [RESEND PATCH 0/3] Allow user to request memory to be locked on page fault

2015-06-02 Thread Eric B Munson
On Mon, 01 Jun 2015, Andrew Morton wrote:

 On Fri, 29 May 2015 10:13:25 -0400 Eric B Munson emun...@akamai.com wrote:
 
  mlock() allows a user to control page out of program memory, but this
  comes at the cost of faulting in the entire mapping when it is
  allocated.  For large mappings where the entire area is not necessary,
  this is not ideal.
  
  This series introduces new flags for mmap() and mlockall() that allow a
  user to specify that the covered area should not be paged out, but only
  after the memory has been used the first time.
 
 I almost applied these, but the naming issue (below) stopped me.
 
 A few things...
 
 - The 0/n changelog should reveal how MAP_LOCKONFAULT interacts with
   rlimit(RLIMIT_MEMLOCK).
 
   I see the implementation is as if the entire mapping will be
   faulted in (for mmap) and as if it was MCL_FUTURE (for mlockall)
   which seems fine.  Please include changelog text explaining and
   justifying these decisions.  This stuff will need to be in the
   manpage updates as well.

Change logs are updated, and this will be included in the man page
update as well.

 
 - I think I already asked why not just use MCL_FUTURE but I forget
   the answer ;) In general it is a good idea to update changelogs in
   response to reviewer questions, because other people will be
   wondering the same things.  Or maybe I forgot to ask.  Either way,
   please address this in the changelogs.

I must have missed that question.  Here is the text from the updated
mlockall changelog:

MCL_ONFAULT is preferable to MCL_FUTURE for the use cases enumerated
in the previous patch because MCL_FUTURE will behave as if each mapping
was made with MAP_LOCKED, causing the entire mapping to be faulted in
when new space is allocated or mapped.  MCL_ONFAULT allows the user to
delay the fault-in cost of any given page until it is actually needed,
but then guarantees that the page will always be resident.

 
 - I can perhaps see the point in mmap(MAP_LOCKONFAULT) (other
   mappings don't get lock-in-memory treatment), but what's the benefit
   in mlockall(MCL_ON_FAULT) over MCL_FUTURE?  (Add to changelog also,
   please).
 
 - Is there a manpage update?

I will send one out when I post V2

 
 - Can we rename patch 1/3 from "add flag to ..." to "add mmap flag to
   ...", to distinguish from 2/3 "add mlockall flag ..."?

Done

 
 - The MAP_LOCKONFAULT versus MCL_ON_FAULT inconsistency is
   irritating!  Can we get these consistent please: switch to either
   MAP_LOCK_ON_FAULT or MCL_ONFAULT.

Yes, will do for V2.

 



[RESEND PATCH 1/3] Add flag to request pages are locked after page fault

2015-05-29 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used, this can incur a high penalty for locking.  This patch introduces
the ability to request that pages are not pre-faulted, but are placed on
the unevictable LRU when they are finally faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h   | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 mm/mmap.c| 4 ++--
 mm/swap.c| 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE  0x0800  /* ETXTBSY */
 #define

[RESEND PATCH 2/3] Add mlockall flag for locking pages on fault

2015-05-29 Thread Eric B Munson
Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h   |  1 +
 arch/mips/include/uapi/asm/mman.h|  1 +
 arch/parisc/include/uapi/asm/mman.h  |  1 +
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h|  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  1 +
 include/uapi/asm-generic/mman.h  |  1 +
 mm/mlock.c   | 13 +
 9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..3120dfb 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   32768   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 47846a5..82aec3c 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..f4601f3 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..0a28efc 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..119be80 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..66ea935 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 
 #endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h 
b/arch/xtensa/include/uapi/asm/mman.h
index 42d43cc..9abcc29 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -75,6 +75,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0

[RESEND PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-29 Thread Eric B Munson
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary,
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.

There are two main use cases that this set covers.  The first is the
security-focused mlock case.  A buffer is needed that cannot be written
to swap.  The maximum size is known, but on average the memory used is
significantly less than this maximum.  With lock on fault, the buffer
is guaranteed never to be paged out, without consuming the maximum size
every time such a buffer is created.

The second use case is focused on performance.  Portions of a large
file are needed and we want to keep the used portions in memory once
accessed.  This is the case for large graphical models where the path
through the graph is not known until run time.  The entire graph is
unlikely to be used in a given invocation, but once a node has been
used it needs to stay resident for further processing.  Given these
constraints we have a number of options.  We can potentially waste a
large amount of memory by mlocking the entire region (this can also
cause a significant stall at startup as the entire file is read in).
We can mlock each page as we access it without tracking whether the page
is already resident, but this introduces a large overhead for each access.
The third option is mapping the entire region with PROT_NONE and using
a signal handler for SIGSEGV to mprotect(PROT_READ) and mlock() the
needed page.  Doing this page at a time adds a significant performance
penalty.  Batching can be used to mitigate this overhead, but in order
to safely avoid trying to mprotect pages outside of the mapping, the
boundaries of each mapping to be used in this way must be tracked and
available to the signal handler.  This is precisely what the mm system
in the kernel should already be doing.
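
For reference, a stripped-down sketch of that third option (single-page
batches, no error handling, and clearly not the approach being argued for
here):

#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

/* On the first touch of a PROT_NONE page, make it readable and lock it. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);

	mprotect((void *)page, page_size, PROT_READ);
	mlock((void *)page, page_size);
}

static void install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	page_size = sysconf(_SC_PAGESIZE);
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);
}

The data file itself would be mapped with PROT_NONE so that every first
access lands in the handler; the batched variants simply extend the
mprotect()/mlock() range beyond the faulting page.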

To illustrate the benefit of this patch I wrote a test program that
mmaps a 5 GB file filled with random data and then makes 15,000,000
accesses to random addresses in that mapping.  The test program was run
20 times for each setup.  Results are reported for two program portions,
setup and execution.  The setup phase is calling mmap and optionally
mlock on the entire region.  For most experiments this is trivial, but
it highlights the cost of faulting in the entire region.  Results are
averages across the 20 runs in milliseconds.
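
A sketch of the measurement loop for reference (the file name, access count,
and RNG are placeholders; this is not the actual test program):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double ms_since(const struct timespec *t0)
{
	struct timespec t1;

	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0->tv_sec) * 1e3 + (t1.tv_nsec - t0->tv_nsec) / 1e6;
}

int main(void)
{
	size_t len = 5UL << 30;			/* 5 GB data file */
	int fd = open("data.bin", O_RDONLY);
	struct timespec t0;
	volatile char sink;
	char *p;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE /* or | MAP_LOCKED, etc. */, fd, 0);
	if (fd < 0 || p == MAP_FAILED)
		return 1;
	printf("setup: %.3f ms\n", ms_since(&t0));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < 15000000L; i++)
		sink = p[(size_t)(drand48() * len)];
	printf("processing: %.3f ms\n", ms_since(&t0));

	(void)sink;
	return 0;
}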

mmap with MAP_LOCKED:
Setup avg:  11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg:  0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions; after upping the sysctl significantly I get:
Setup avg:  0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg:  0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg:  0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg:  0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping.  The
first step covers the page that caused the fault as we know that it will
be possible to lock.  The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow.  There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handler in a globally accessible structure, but I could
not find it.  It should be noted that with a large enough batch size
this two step fault handler can still cause the program to crash if it
reaches far beyond the end of the mapping.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.

The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench).  The following are the average
values across 20 runs of each benchmark after a warmup run whose
results were discarded.

Avg throughput in MB/s from stream using 100 element arrays
Test     4.1-rc2    4.1-rc2+lock-on-fault
Copy:    10,979.08  10,917.34
Scale:   11,094.45  11,023.01
Add:     12,487.29  12,388.65
Triad:   12,505.77  12,418.78

Kernbench optimal load
                  4.1-rc2   4.1-rc2+lock-on-fault
Elapsed Time      71.046    71.324
User Time         62.117    62.352
System Time       8.926     8.969
Context Switches  14531.9   14542.5
Sleeps            14935.9   14939

Eric B Munson (3):
  Add flag to request pages are locked

Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-19 Thread Eric B Munson
On Fri, 15 May 2015, Eric B Munson wrote:

 On Thu, 14 May 2015, Michal Hocko wrote:
 
  On Wed 13-05-15 11:00:36, Eric B Munson wrote:
   On Mon, 11 May 2015, Eric B Munson wrote:
   
On Fri, 08 May 2015, Andrew Morton wrote:

 On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com 
 wrote:
 
  mlock() allows a user to control page out of program memory, but this
  comes at the cost of faulting in the entire mapping when it is
  allocated.  For large mappings where the entire area is not necessary,
  this is not ideal.
  
  This series introduces new flags for mmap() and mlockall() that allow a
  user to specify that the covered area should not be paged out, but only
  after the memory has been used the first time.
 
 Please tell us much much more about the value of these changes: the 
 use
 cases, the behavioural improvements and performance results which the
 patchset brings to those use cases, etc.
 

To illustrate the proposed use case I wrote a quick program that mmaps
a 5GB file which is filled with random data and accesses 150,000 pages
from that mapping.  Setup and processing were timed separately to
illustrate the differences between the three tested approaches.  The
setup portion is simply the call to mmap, the processing is the
accessing of the various locations in that mapping.  The following
values are in milliseconds and are the averages of 20 runs, each with a
call to echo 3 > /proc/sys/vm/drop_caches between each run.

The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
Startup average:9476.506
Processing average: 3.573

The second mapping was simply MAP_PRIVATE but each page was passed to
mlock() before being read:
Startup average:0.051
Processing average: 721.859

The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
Startup average:0.084
Processing average: 42.125

   
   Michal's suggestion of changing protections and locking in a signal
   handler was better than locking as needed, but still required
   significantly more work than the LOCKONFAULT case.
   
   Startup average:0.047
   Processing average: 86.431
  
  Have you played with batching? Has it helped? Anyway it is to be
  expected that the overhead will be higher than a single mmap call. The
  question is whether you can live with it because adding a new semantic
  to mlock sounds trickier and MAP_LOCKED is tricky enough already...
  
 
 I reworked the experiment to better cover the batching solution.  The
 same 5GB data file is used, however instead of 150,000 accesses at
 regular intervals, the test program now does 15,000,000 accesses to
 random pages in the mapping.  The rest of the setup remains the same.
 
 mmap with MAP_LOCKED:
 Setup avg:  11821.193
 Processing avg: 3404.286
 
 mmap with mlock() before each access:
 Setup avg:  0.054
 Processing avg: 34263.201
 
 mmap with PROT_NONE and signal handler and batch size of 1 page:
 With the default value in max_map_count, this gets ENOMEM as I attempt
 to change the permissions; after upping the sysctl significantly I get:
 Setup avg:  0.050
 Processing avg: 67690.625
 
 mmap with PROT_NONE and signal handler and batch size of 8 pages:
 Setup avg:  0.098
 Processing avg: 37344.197
 
 mmap with PROT_NONE and signal handler and batch size of 16 pages:
 Setup avg:  0.0548
 Processing avg: 29295.669
 
 mmap with MAP_LOCKONFAULT:
 Setup avg:  0.073
 Processing avg: 18392.136
 
 The signal handler in the batch cases faulted in memory in two steps to
 avoid having to know the start and end of the faulting mapping.  The
 first step covers the page that caused the fault as we know that it will
 be possible to lock.  The second step speculatively tries to mlock and
 mprotect the batch size - 1 pages that follow.  There may be a clever
 way to avoid this without having the program track each mapping to be
 covered by this handler in a globally accessible structure, but I could
 not find it.
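
For illustration only, a minimal sketch of a two-step handler along these
lines (not the exact handler used for the measurements above; BATCH_PAGES and
the global bookkeeping are assumptions):

#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BATCH_PAGES 16				/* assumed batch size */

/* The covered mapping has to be tracked globally so the handler does not
 * mprotect()/mlock() past its end. */
static uintptr_t map_start, map_end;
static long page_size;

static void fault_handler(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr & ~((uintptr_t)page_size - 1);
	uintptr_t next, stop;

	(void)sig; (void)ctx;
	if (addr < map_start || addr >= map_end)
		_exit(1);			/* fault outside the tracked mapping */

	/* Step 1: the faulting page itself is known to be lockable. */
	mprotect((void *)addr, page_size, PROT_READ);
	mlock((void *)addr, page_size);

	/* Step 2: speculatively cover the next BATCH_PAGES - 1 pages,
	 * clamped to the end of the tracked mapping. */
	next = addr + page_size;
	stop = next + (BATCH_PAGES - 1) * page_size;
	if (stop > map_end)
		stop = map_end;
	if (next < stop) {
		mprotect((void *)next, stop - next, PROT_READ);
		mlock((void *)next, stop - next);
	}
}

void install_handler(void *start, size_t len)
{
	struct sigaction sa;

	page_size = sysconf(_SC_PAGESIZE);
	map_start = (uintptr_t)start;
	map_end = map_start + len;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);
}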
 
 These results show that if the developer knows that a majority of the
 mapping will be used, it is better to try and fault it in at once,
 otherwise MAP_LOCKONFAULT is significantly faster.
 
 Eric

Is there anything else I can add to the discussion here?




Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-15 Thread Eric B Munson
On Thu, 14 May 2015, Michal Hocko wrote:

 On Wed 13-05-15 11:00:36, Eric B Munson wrote:
  On Mon, 11 May 2015, Eric B Munson wrote:
  
   On Fri, 08 May 2015, Andrew Morton wrote:
   
On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com 
wrote:

 mlock() allows a user to control page out of program memory, but this
 comes at the cost of faulting in the entire mapping when it is
 allocated.  For large mappings where the entire area is not necessary
 this is not ideal.
 
 This series introduces new flags for mmap() and mlockall() that allow 
 a
 user to specify that the covered area should not be paged out, but only
 after the memory has been used the first time.

Please tell us much much more about the value of these changes: the use
cases, the behavioural improvements and performance results which the
patchset brings to those use cases, etc.

   
   To illustrate the proposed use case I wrote a quick program that mmaps
   a 5GB file which is filled with random data and accesses 150,000 pages
   from that mapping.  Setup and processing were timed separately to
    illustrate the differences between the three tested approaches.  The
    setup portion is simply the call to mmap, the processing is the
    accessing of the various locations in that mapping.  The following
    values are in milliseconds and are the averages of 20 runs each with a
    call to echo 3 > /proc/sys/vm/drop_caches between each run.
   
   The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
   Startup average:9476.506
   Processing average: 3.573
   
   The second mapping was simply MAP_PRIVATE but each page was passed to
   mlock() before being read:
   Startup average:0.051
   Processing average: 721.859
   
   The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
   Startup average:0.084
   Processing average: 42.125
   
  
  Michal's suggestion of changing protections and locking in a signal
  handler was better than the locking as needed, but still significantly
  more work required than the LOCKONFAULT case.
  
  Startup average:0.047
  Processing average: 86.431
 
 Have you played with batching? Has it helped? Anyway it is to be
 expected that the overhead will be higher than a single mmap call. The
 question is whether you can live with it because adding a new semantic
 to mlock sounds trickier and MAP_LOCKED is tricky enough already...
 

I reworked the experiment to better cover the batching solution.  The
same 5GB data file is used; however, instead of 150,000 accesses at
regular intervals, the test program now does 15,000,000 accesses to
random pages in the mapping.  The rest of the setup remains the same.

mmap with MAP_LOCKED:
Setup avg:  11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg:  0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions; after upping the sysctl significantly I get:
Setup avg:  0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg:  0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg:  0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg:  0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping.  The
first step covers the page that caused the fault as we know that it will
be possible to lock.  The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow.  There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handler in a globally accessible structure, but I could
not find it.

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.

Eric



Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-14 Thread Eric B Munson
On Thu, 14 May 2015, Michal Hocko wrote:

 On Wed 13-05-15 11:00:36, Eric B Munson wrote:
  On Mon, 11 May 2015, Eric B Munson wrote:
  
   On Fri, 08 May 2015, Andrew Morton wrote:
   
On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com 
wrote:

 mlock() allows a user to control page out of program memory, but this
 comes at the cost of faulting in the entire mapping when it is
 allocated.  For large mappings where the entire area is not necessary
 this is not ideal.
 
 This series introduces new flags for mmap() and mlockall() that allow 
 a
 user to specify that the covered area should not be paged out, but only
 after the memory has been used the first time.

Please tell us much much more about the value of these changes: the use
cases, the behavioural improvements and performance results which the
patchset brings to those use cases, etc.

   
   To illustrate the proposed use case I wrote a quick program that mmaps
   a 5GB file which is filled with random data and accesses 150,000 pages
   from that mapping.  Setup and processing were timed separately to
    illustrate the differences between the three tested approaches.  The
    setup portion is simply the call to mmap, the processing is the
    accessing of the various locations in that mapping.  The following
    values are in milliseconds and are the averages of 20 runs each with a
    call to echo 3 > /proc/sys/vm/drop_caches between each run.
   
   The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
   Startup average:9476.506
   Processing average: 3.573
   
   The second mapping was simply MAP_PRIVATE but each page was passed to
   mlock() before being read:
   Startup average:0.051
   Processing average: 721.859
   
   The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
   Startup average:0.084
   Processing average: 42.125
   
  
  Michal's suggestion of changing protections and locking in a signal
  handler was better than the locking as needed, but still significantly
  more work required than the LOCKONFAULT case.
  
  Startup average:0.047
  Processing average: 86.431
 
 Have you played with batching? Has it helped? Anyway it is to be
 expected that the overhead will be higher than a single mmap call. The
 question is whether you can live with it because adding a new semantic
 to mlock sounds trickier and MAP_LOCKED is tricky enough already...
 

The test code I have been using is a pathological test case that only
touches pages once and they are fairly far apart.

On the face batching sounds like a good idea, but I have a couple of
questions.  In order to batch fault in pages the seg fault handler needs
to know about the mapping in question.  Specifically it needs to know
where it ends so that it doesn't try and mprotect()/mlock() past the
end.  So now the program has to start tracking its maps in some globally
accessible structure and this sounds more like implementing memory
management in userspace.  How could this batching be implemented without
requiring the signal handler to know about the mapping that is being
accessed?  Also, how much memory management policy is it reasonable to
expect user space to implement in these cases?

Eric




Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-13 Thread Eric B Munson
On Wed, 13 May 2015, Michal Hocko wrote:

 On Fri 08-05-15 16:06:10, Eric B Munson wrote:
  On Fri, 08 May 2015, Andrew Morton wrote:
  
   On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com 
   wrote:
   
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.
   
   Please tell us much much more about the value of these changes: the use
   cases, the behavioural improvements and performance results which the
   patchset brings to those use cases, etc.
   
  
  The primary use case is for mmaping large files read only.  The process
  knows that some of the data is necessary, but it is unlikely that the
  entire file will be needed.  The developer only wants to pay the cost to
  read the data in once.  Unfortunately the developer must choose between
  allowing the kernel to page in the memory as needed and guaranteeing
  that the data will only be read from disk once.  The first option runs
  the risk of having the memory reclaimed if the system is under memory
  pressure, the second forces the memory usage and startup delay when
  faulting in the entire file.
 
 Is there any reason you cannot do this from the userspace? Start by
 mmap(PROT_NONE) and do 
 mmap(MAP_FIXED|MAP_LOCKED|MAP_READ|other_flags_you_need)
 from the SIGSEGV handler?
 You can generate a lot of vmas that way but you can mitigate that to a
 certain level by mapping larger than PAGE_SIZE chunks in the fault
 handler. Would that work in your usecase?

This might work for the use cases I have laid out (I am not sure about
the anonymous mmap one, but I will try it).  I am concerned about how
much memory management policy these suggestions push into userspace.
I am also concerned about the number of system calls required to do the
same thing.  This will require a new call to mmap() for every new page
accessed in the file (or for every file_size/map_size in the multiple
page chunk).  The simple case of calling mlock() on each page as the
file was accessed was significantly slower than the LOCKONFAULT flag.
Your suggestion will be better in that it avoids the extra mlock call
for pages already locked, but there are still significantly more system
calls.  I will add this to the program I have been using to measure
execution times and see how it compares to the other options.
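
For comparison, a rough sketch of the PROT_NONE-plus-remap approach being
discussed (chunk size, bookkeeping, and error handling are simplified
assumptions, and PROT_READ stands in for the MAP_READ mentioned above):

#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK_PAGES 16				/* assumed chunk size */

/* Bookkeeping for the single reserved region; a real program would need
 * this for every mapping covered by the handler. */
static uintptr_t region_start, region_end;
static int region_fd;
static long page_sz;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr & ~((uintptr_t)page_sz - 1);
	size_t len = CHUNK_PAGES * page_sz;

	(void)sig; (void)ctx;
	if (addr < region_start || addr >= region_end)
		_exit(1);			/* fault outside the region */
	if (addr + len > region_end)
		len = region_end - addr;

	/* Replace the PROT_NONE reservation with a locked, readable mapping
	 * of the corresponding part of the file. */
	mmap((void *)addr, len, PROT_READ,
	     MAP_PRIVATE | MAP_FIXED | MAP_LOCKED,
	     region_fd, addr - region_start);
}

void *reserve_and_cover(int fd, size_t size)
{
	struct sigaction sa;
	void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE, fd, 0);

	page_sz = sysconf(_SC_PAGESIZE);
	region_start = (uintptr_t)p;
	region_end = region_start + size;
	region_fd = fd;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);
	return p;
}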

Eric




Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-13 Thread Eric B Munson
On Mon, 11 May 2015, Eric B Munson wrote:

 On Fri, 08 May 2015, Andrew Morton wrote:
 
  On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote:
  
   mlock() allows a user to control page out of program memory, but this
   comes at the cost of faulting in the entire mapping when it is
   allocated.  For large mappings where the entire area is not necessary
   this is not ideal.
   
   This series introduces new flags for mmap() and mlockall() that allow a
   user to specify that the covered area should not be paged out, but only
   after the memory has been used the first time.
  
  Please tell us much much more about the value of these changes: the use
  cases, the behavioural improvements and performance results which the
  patchset brings to those use cases, etc.
  
 
 To illustrate the proposed use case I wrote a quick program that mmaps
 a 5GB file which is filled with random data and accesses 150,000 pages
 from that mapping.  Setup and processing were timed separately to
 illustrate the differences between the three tested approaches.  The
 setup portion is simply the call to mmap, the processing is the
 accessing of the various locations in that mapping.  The following
 values are in milliseconds and are the averages of 20 runs each with a
 call to echo 3 > /proc/sys/vm/drop_caches between each run.
 
 The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
 Startup average:9476.506
 Processing average: 3.573
 
 The second mapping was simply MAP_PRIVATE but each page was passed to
 mlock() before being read:
 Startup average:0.051
 Processing average: 721.859
 
 The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
 Startup average:0.084
 Processing average: 42.125
 

Michal's suggestion of changing protections and locking in a signal
handler was better than the locking as needed, but still significantly
more work required than the LOCKONFAULT case.

Startup average:0.047
Processing average: 86.431




Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-11 Thread Eric B Munson
On Fri, 08 May 2015, Andrew Morton wrote:

 On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson emun...@akamai.com wrote:
 
  On Fri, 08 May 2015, Andrew Morton wrote:
  
   On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com 
   wrote:
   
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.
   
   Please tell us much much more about the value of these changes: the use
   cases, the behavioural improvements and performance results which the
   patchset brings to those use cases, etc.
   
  
  The primary use case is for mmaping large files read only.  The process
  knows that some of the data is necessary, but it is unlikely that the
  entire file will be needed.  The developer only wants to pay the cost to
  read the data in once.  Unfortunately the developer must choose between
  allowing the kernel to page in the memory as needed and guaranteeing
  that the data will only be read from disk once.  The first option runs
  the risk of having the memory reclaimed if the system is under memory
  pressure, the second forces the memory usage and startup delay when
  faulting in the entire file.
 
 Why can't the application mmap only those parts of the file which it
 wants and mlock those?

There are a number of problems with this approach.  The first is it
presumes the program will know what portions are needed ahead of time.
In many cases this is simply not true.  The second problem is the number
of syscalls required.  With my patches, a single mmap() or mlockall()
call is needed to setup the required locking.  Without it, a separate
mmap call must be made for each piece of data that is needed.  This also
opens up problems for data that is arranged assuming it is contiguous in
memory.  With the single mmap call, the user gets a contiguous VMA
without having to know about it.  mmap() with MAP_FIXED could address
the problem, but this introduces a new failure mode of your map
colliding with another that was placed by the kernel.

Another use case for the LOCKONFAULT flag is the security use of
mlock().  If an application will be using data that cannot be written
to swap, but the exact size is unknown until run time (all we have at
build time is the maximum size the buffer can be).  The LOCKONFAULT flag
allows the developer to create the buffer and guarantee that the
contents are never written to swap without ever consuming more memory
than is actually needed.
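
To make that second use case concrete, a sketch under the stated assumptions
(the MAP_LOCKONFAULT value is a placeholder, not the one assigned by this
series, and the buffer size and contents are made up):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_LOCKONFAULT
#define MAP_LOCKONFAULT 0x80000			/* placeholder value, illustration only */
#endif

#define MAX_SECRET_SIZE (64UL * 1024 * 1024)	/* worst case known at build time */

int main(void)
{
	/* Reserve the maximum possible buffer; only pages that are actually
	 * touched get faulted in and locked, so they are never written to
	 * swap and no more memory than needed is consumed. */
	unsigned char *secret = mmap(NULL, MAX_SECRET_SIZE,
				     PROT_READ | PROT_WRITE,
				     MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT,
				     -1, 0);
	if (secret == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Runtime-sized secret: only the touched page is resident and locked. */
	strcpy((char *)secret, "card-number-goes-here");

	/* ... use the data ... */

	/* Wipe only what was used before releasing the buffer. */
	memset(secret, 0, strlen((char *)secret) + 1);
	munmap(secret, MAX_SECRET_SIZE);
	return 0;
}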

 
  I am working on getting startup times with and without this change for
  an application, I will post them as soon as I have them.
 



Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-11 Thread Eric B Munson
On Mon, 11 May 2015, Andrew Morton wrote:

 On Mon, 11 May 2015 10:36:18 -0400 Eric B Munson emun...@akamai.com wrote:
 
  On Fri, 08 May 2015, Andrew Morton wrote:
  ...
 
   
   Why can't the application mmap only those parts of the file which it
   wants and mlock those?
  
  There are a number of problems with this approach.  The first is it
  presumes the program will know what portions are needed ahead of time.
  In many cases this is simply not true.  The second problem is the number
  of syscalls required.  With my patches, a single mmap() or mlockall()
  call is needed to setup the required locking.  Without it, a separate
  mmap call must be made for each piece of data that is needed.  This also
  opens up problems for data that is arranged assuming it is contiguous in
  memory.  With the single mmap call, the user gets a contiguous VMA
  without having to know about it.  mmap() with MAP_FIXED could address
  the problem, but this introduces a new failure mode of your map
  colliding with another that was placed by the kernel.
  
  Another use case for the LOCKONFAULT flag is the security use of
  mlock().  If an application will be using data that cannot be written
  to swap, but the exact size is unknown until run time (all we have at
  build time is the maximum size the buffer can be).  The LOCKONFAULT flag
  allows the developer to create the buffer and guarantee that the
  contents are never written to swap without ever consuming more memory
  than is actually needed.
 
 What application(s) or class of applications are we talking about here?
 
 IOW, how generally applicable is this?  It sounds rather specialized.
 

For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or graphical
models as well).  For the security example, any application transacting
in data that cannot be swapped out (credit card data, medical records,
etc).




Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-11 Thread Eric B Munson
On Fri, 08 May 2015, Andrew Morton wrote:

 On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote:
 
  mlock() allows a user to control page out of program memory, but this
  comes at the cost of faulting in the entire mapping when it is
  allocated.  For large mappings where the entire area is not necessary
  this is not ideal.
  
  This series introduces new flags for mmap() and mlockall() that allow a
  user to specify that the covered area should not be paged out, but only
  after the memory has been used the first time.
 
 Please tell us much much more about the value of these changes: the use
 cases, the behavioural improvements and performance results which the
 patchset brings to those use cases, etc.
 

To illustrate the proposed use case I wrote a quick program that mmaps
a 5GB file which is filled with random data and accesses 150,000 pages
from that mapping.  Setup and processing were timed separately to
illustrate the differences between the three tested approaches.  The
setup portion is simply the call to mmap, the processing is the
accessing of the various locations in that mapping.  The following
values are in milliseconds and are the averages of 20 runs each with a
call to echo 3 > /proc/sys/vm/drop_caches between each run.

The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
Startup average:9476.506
Processing average: 3.573

The second mapping was simply MAP_PRIVATE but each page was passed to
mlock() before being read:
Startup average:0.051
Processing average: 721.859

The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
Startup average:0.084
Processing average: 42.125





[PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-08 Thread Eric B Munson
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated.  For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.

The performance cost of these patches are minimal on the two benchmarks
I have tested (stream and kernbench).

Avg throughput in MB/s from stream using 100 element arrays
Test 4.1-rc2  4.1-rc2+lock-on-fault
Copy:    10,979.08   10,917.34
Scale:   11,094.45   11,023.01
Add:     12,487.29   12,388.65
Triad:   12,505.77   12,418.78

Kernbench optimal load
 4.1-rc2  4.1-rc2+lock-on-fault
Elapsed Time 71.046   71.324
User Time62.117   62.352
System Time  8.9268.969
Context Switches 14531.9  14542.5
Sleeps   14935.9  14939

Eric B Munson (3):
  Add flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h  |   2 +
 arch/mips/include/uapi/asm/mman.h   |   2 +
 arch/parisc/include/uapi/asm/mman.h |   2 +
 arch/powerpc/include/uapi/asm/mman.h|   2 +
 arch/sparc/include/uapi/asm/mman.h  |   2 +
 arch/tile/include/uapi/asm/mman.h   |   2 +
 arch/xtensa/include/uapi/asm/mman.h |   2 +
 include/linux/mm.h  |   1 +
 include/linux/mman.h|   3 +-
 include/uapi/asm-generic/mman.h |   2 +
 mm/mlock.c  |  13 ++-
 mm/mmap.c   |   4 +-
 mm/swap.c   |   3 +-
 tools/testing/selftests/vm/Makefile |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145 
 tools/testing/selftests/vm/on-fault-limit.c |  47 +
 tools/testing/selftests/vm/run_vmtests  |  23 +
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org

-- 
1.9.1


[PATCH 2/3] Add mlockall flag for locking pages on fault

2015-05-08 Thread Eric B Munson
Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.
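
A hedged sketch of how a process might use the new flag; the value 4 matches
the mips/parisc/tile/xtensa hunks below (alpha, powerpc, and sparc differ),
and combining it with MCL_CURRENT/MCL_FUTURE is an assumption about the
intended usage rather than something this changelog states:

#include <stdio.h>
#include <sys/mman.h>

#ifndef MCL_ON_FAULT
#define MCL_ON_FAULT 4		/* value from several of the hunks below */
#endif

int main(void)
{
	/* Lock everything mapped now and in the future, but let pages be
	 * faulted in on first use instead of pre-faulting the whole
	 * address space up front. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ON_FAULT) != 0) {
		perror("mlockall");
		return 1;
	}

	/* ... allocate and touch memory; pages are locked as they fault in ... */
	return 0;
}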

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/alpha/include/uapi/asm/mman.h   |  1 +
 arch/mips/include/uapi/asm/mman.h|  1 +
 arch/parisc/include/uapi/asm/mman.h  |  1 +
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h|  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  1 +
 include/uapi/asm-generic/mman.h  |  1 +
 mm/mlock.c   | 13 +
 9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..3120dfb 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@
 
 #define MCL_CURRENT 8192   /* lock all currently mapped pages */
 #define MCL_FUTURE 16384   /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   32768   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 47846a5..82aec3c 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0   /* no further special treatment */
 #define MADV_RANDOM1   /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..f4601f3 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@
 
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..0a28efc 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..119be80 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@
 
 #define MCL_CURRENT 0x2000  /* lock all currently mapped pages */
 #define MCL_FUTURE  0x4000  /* lock all additions to address space 
*/
+#define MCL_ON_FAULT   0x8 /* lock all pages that are faulted in */
 
 #define MAP_POPULATE   0x8000  /* populate (prefault) pagetables */
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..66ea935 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 
 #endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h 
b/arch/xtensa/include/uapi/asm/mman.h
index 42d43cc..9abcc29 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -75,6 +75,7 @@
  */
 #define MCL_CURRENT1   /* lock all current mappings */
 #define MCL_FUTURE 2   /* lock all future mappings */
+#define MCL_ON_FAULT   4   /* lock all pages that are faulted in */
 
 #define MADV_NORMAL0

Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

2015-05-08 Thread Eric B Munson
On Fri, 08 May 2015, Andrew Morton wrote:

 On Fri,  8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote:
 
  mlock() allows a user to control page out of program memory, but this
  comes at the cost of faulting in the entire mapping when it is
  allocated.  For large mappings where the entire area is not necessary
  this is not ideal.
  
  This series introduces new flags for mmap() and mlockall() that allow a
  user to specify that the covered area should not be paged out, but only
  after the memory has been used the first time.
 
 Please tell us much much more about the value of these changes: the use
 cases, the behavioural improvements and performance results which the
 patchset brings to those use cases, etc.
 

The primary use case is for mmaping large files read only.  The process
knows that some of the data is necessary, but it is unlikely that the
entire file will be needed.  The developer only wants to pay the cost to
read the data in once.  Unfortunately the developer must choose between
allowing the kernel to page in the memory as needed and guaranteeing
that the data will only be read from disk once.  The first option runs
the risk of having the memory reclaimed if the system is under memory
pressure, the second forces the memory usage and startup delay when
faulting in the entire file.

I am working on getting startup times with and without this change for
an application, I will post them as soon as I have them.

Eric



[PATCH 1/3] Add flag to request pages are locked after page fault

2015-05-08 Thread Eric B Munson
The cost of faulting in all memory to be locked can be very high when
working with large mappings.  If only portions of the mapping will be
used this can incur a high penalty for locking.  This patch introduces
the ability to request that pages are not pre-faulted, but are placed on
the unevictable LRU when they are finally faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.
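
A short sketch of what that accounting rule means for a caller (the
MAP_LOCKONFAULT value is a placeholder; the real per-architecture values are
in the hunks below):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

#ifndef MAP_LOCKONFAULT
#define MAP_LOCKONFAULT 0x80000		/* placeholder value, illustration only */
#endif

int main(void)
{
	struct rlimit rl;
	size_t len = 512UL * 1024 * 1024;

	/* Per the changelog above, the whole mapping is charged against
	 * RLIMIT_MEMLOCK at mmap() time, even though pages only become
	 * resident (and locked) as they are faulted in. */
	getrlimit(RLIMIT_MEMLOCK, &rl);
	if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < len)
		fprintf(stderr, "mapping exceeds RLIMIT_MEMLOCK (%llu)\n",
			(unsigned long long)rl.rlim_cur);

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT, -1, 0);
	if (p == MAP_FAILED)
		perror("mmap");
	return p == MAP_FAILED;
}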

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h| 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h| 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h   | 1 +
 include/linux/mman.h | 3 ++-
 include/uapi/asm-generic/mman.h  | 1 +
 mm/mmap.c| 4 ++--
 mm/swap.c| 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK   0x4 /* do not block on IO */
 #define MAP_STACK  0x8 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x10/* create a huge page mapping */
+#define MAP_LOCKONFAULT0x20/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_ASYNC   1   /* sync memory asynchronously */
 #define MS_SYNC2   /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK   0x2 /* do not block on IO */
 #define MAP_STACK  0x4 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x8 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x10/* Lock pages after they are 
faulted in, do not prefault */
 
 #define MS_SYNC1   /* synchronous memory sync */
 #define MS_ASYNC   2   /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h 
b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK   0x1 /* do not block on IO */
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
+#define MAP_LOCKONFAULT0x8 /* Lock pages after they are 
faulted in, do not prefault */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h 
b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE  0x0800  /* ETXTBSY */
 #define

[PATCH] oprofile, powerpc: Handle events that raise an exception without overflowing

2011-05-23 Thread Eric B Munson
Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
where events can roll back if a speculative event doesn't actually complete.
This can raise a performance monitor exception.  We need to catch this to ensure
that we reset the PMC.  In all cases the PMC will be less than 256 cycles from
overflow.

This patch lifts Anton's fix for the problem in perf and applies it to oprofile
as well.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: sta...@kernel.org # as far back as it applies cleanly
---
 arch/powerpc/oprofile/op_model_power4.c |   24 +++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/oprofile/op_model_power4.c 
b/arch/powerpc/oprofile/op_model_power4.c
index 8ee51a2..e6bec74 100644
--- a/arch/powerpc/oprofile/op_model_power4.c
+++ b/arch/powerpc/oprofile/op_model_power4.c
@@ -261,6 +261,28 @@ static int get_kernel(unsigned long pc, unsigned long 
mmcra)
return is_kernel;
 }
 
+static bool pmc_overflow(unsigned long val)
+{
+   if ((int)val < 0)
+   return true;
+
+   /*
+* Events on POWER7 can roll back if a speculative event doesn't
+* eventually complete. Unfortunately in some rare cases they will
+* raise a performance monitor exception. We need to catch this to
+* ensure we reset the PMC. In all cases the PMC will be 256 or less
+* cycles from overflow.
+*
+* We only do this if the first pass fails to find any overflowing
+* PMCs because a user might set a period of less than 256 and we
+* don't want to mistakenly reset them.
+*/
+   if (__is_processor(PV_POWER7) && ((0x80000000 - val) <= 256))
+   return true;
+
+   return false;
+}
+
 static void power4_handle_interrupt(struct pt_regs *regs,
struct op_counter_config *ctr)
 {
@@ -281,7 +303,7 @@ static void power4_handle_interrupt(struct pt_regs *regs,
 
for (i = 0; i < cur_cpu_spec->num_pmcs; ++i) {
val = classic_ctr_read(i);
-   if (val < 0) {
+   if (pmc_overflow(val)) {
if (oprofile_running && ctr[i].enabled) {
oprofile_add_ext_sample(pc, regs, i, is_kernel);
classic_ctr_write(i, reset_value[i]);
-- 
1.7.4.1



Re: [PATCH] oprofile, powerpc: Handle events that raise an exception without overflowing

2011-05-23 Thread Eric B Munson
On Mon, 23 May 2011, Eric B Munson wrote:

 Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
 where events can roll back if a speculative event doesn't actually complete.
 This can raise a performance monitor exception.  We need to catch this to 
 ensure
 that we reset the PMC.  In all cases the PMC will be less than 256 cycles from
 overflow.
 
 This patch lifts Anton's fix for the problem in perf and applies it to 
 oprofile
 as well.
 
 Signed-off-by: Eric B Munson emun...@mgebm.net
 Cc: sta...@kernel.org # as far back as it applies cleanly

I'd like to get this patch into mainline this merge window if at all possible.

 ---
  arch/powerpc/oprofile/op_model_power4.c |   24 +++-
  1 files changed, 23 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/oprofile/op_model_power4.c 
 b/arch/powerpc/oprofile/op_model_power4.c
 index 8ee51a2..e6bec74 100644
 --- a/arch/powerpc/oprofile/op_model_power4.c
 +++ b/arch/powerpc/oprofile/op_model_power4.c
 @@ -261,6 +261,28 @@ static int get_kernel(unsigned long pc, unsigned long 
 mmcra)
   return is_kernel;
  }
  
 +static bool pmc_overflow(unsigned long val)
 +{
 + if ((int)val < 0)
 + return true;
 +
 + /*
 +  * Events on POWER7 can roll back if a speculative event doesn't
 +  * eventually complete. Unfortunately in some rare cases they will
 +  * raise a performance monitor exception. We need to catch this to
 +  * ensure we reset the PMC. In all cases the PMC will be 256 or less
 +  * cycles from overflow.
 +  *
 +  * We only do this if the first pass fails to find any overflowing
 +  * PMCs because a user might set a period of less than 256 and we
 +  * don't want to mistakenly reset them.
 +  */
 + if (__is_processor(PV_POWER7) && ((0x80000000 - val) <= 256))
 + return true;
 +
 + return false;
 +}
 +
  static void power4_handle_interrupt(struct pt_regs *regs,
   struct op_counter_config *ctr)
  {
 @@ -281,7 +303,7 @@ static void power4_handle_interrupt(struct pt_regs *regs,
  
   for (i = 0; i < cur_cpu_spec->num_pmcs; ++i) {
   val = classic_ctr_read(i);
 - if (val < 0) {
 + if (pmc_overflow(val)) {
   if (oprofile_running && ctr[i].enabled) {
   oprofile_add_ext_sample(pc, regs, i, is_kernel);
   classic_ctr_write(i, reset_value[i]);
 -- 
 1.7.4.1
 



Re: [PATCH V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-27 Thread Eric B Munson
On Wed, 27 Apr 2011, David Laight wrote:

 I keep telling Eric that the code below is incorrect
 modulo arithmetic...

But it isn't, and it doesn't have trouble with 2^32 - 1.  Here is one done by
hand:

Counter is at 0xffffffff and is rolled over to 0x101 (258 counted items so
that we miss the test for rollback).

 0x0000000000000101  (Remember counters are 32 bit, but we store them in 64)
-0x00000000ffffffff
=0xffffffff00000102

After the mask we have 0x00000102, the actual difference between the
counters.
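
The same arithmetic as a small stand-alone program; it only mirrors the
helper being discussed, it is not the kernel code itself:

#include <stdio.h>
#include <stdint.h>

/* Mirrors the check_and_compute_delta() logic under discussion. */
static uint64_t check_and_compute_delta(uint64_t prev, uint64_t val)
{
	uint64_t delta = (val - prev) & 0xffffffffull;

	/* Treat a small backwards step (POWER7 speculative roll back,
	 * at most 256 events) as "no progress". */
	if (prev > val && (prev - val) < 256)
		delta = 0;

	return delta;
}

int main(void)
{
	/* Roll over: 0xffffffff -> 0x101 is 258 new events, delta 0x102. */
	printf("0x%llx\n",
	       (unsigned long long)check_and_compute_delta(0xffffffffull, 0x101));
	/* Roll back by 16 events: discarded, delta 0. */
	printf("0x%llx\n",
	       (unsigned long long)check_and_compute_delta(0x200, 0x1f0));
	return 0;
}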

 
  +static u64 check_and_compute_delta(u64 prev, u64 val)
  +{
  +   u64 delta = (val - prev) & 0xfffffffful;
  +
  +   /*
  +* POWER7 can roll back counter values, if the new value is
 smaller
  +* than the previous value it will cause the delta and the
 counter to
  +* have bogus values unless we rolled a counter over.  If a
  counter is
  +* rolled back, it will be smaller, but within 256, which is the
 maximum
  +* number of events to rollback at once.  If we detect a
 rollback
  +* return 0.  This can lead to a small lack of precision in the
  +* counters.
  +*/
  +   if (prev > val && (prev - val) < 256)
  +   delta = 0;
  +
  +   return delta;
 
 The code should detect rollback by looking at the value of 'delta'
 otherwise there are horrid end effects near 2^32-1.
 
 For instance:
   u32 delta = val - prev;
   return delta & 0x80000000 ? 0 : delta;
 
 
David
 
 
 
 



Re: [PATCH V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-27 Thread Eric B Munson
On Wed, 27 Apr 2011, David Laight wrote:

  But it isn't, and it doesn't have trouble with 2^32 - 1.  
 
 what about:
 prev = 0x00000001
 val  = 0xffffffff

Result is 0xfffffffe and we are fine.
 
   David
 
 



Re: [PATCH V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-27 Thread Eric B Munson
On Wed, 27 Apr 2011, David Laight wrote:

  
  +   if (prev > val && (prev - val) < 256)
  +   delta = 0;
  +
  +   return delta;
 
 Also, 'prev' is a true 64bit value, but 'val' is only ever 32bits.
 So once the 64bit 'prev' exceeds 2^32+256 both 'prev > val'
 and 'prev - val' are true regardless of the value of 'val'.
 This will lead to jumps in the value 

prev and val are both 64 bit variables holding 32 bit numbers, we do not
accumulate in either, they are both replaced by values directly from the
registers.  So prev > val will not always be true.

Eric



Re: [PATCH V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-27 Thread Eric B Munson
On Wed, 27 Apr 2011, David Laight wrote:

  
  prev and val are both 64 bit variables holding 32 bit numbers, we do
 not
  accumulate in either, they are both replaced by values directly from
 the
  registers.
  So prev > val will not always be true.
 
 The code seems to be:
 prev = local64_read(event->hw.prev_count);
 val = read_pmc(event->hw.idx);
 delta = check_and_compute_delta(prev, val);
 local64_add(delta, event->count);
 Which looks very much like 'prev' being a 64bit counter generated
 from the 32bit pmc register.
 

Which implies that it will only ever be 32 bits wide, just stored in 64.



[PATCH V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-15 Thread Eric B Munson
Because of speculative event roll back, it is possible for some event counters
to decrease between reads on POWER7.  This causes a problem with the way that
counters are updated.  Delta values are calculated in a 64 bit value and the
top 32 bits are masked.  If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters.  This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the counter values, but from my testing
the value is typically fewer than 10 samples at a time.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: sta...@kernel.org
---
Changes from V3:
 Fix delta checking so that only roll backs are discarded
Changes from V2:
 Create a helper that should handle counter roll back as well as registers that
might be allowed to roll over
Changes from V1:
 Updated patch leader
 Added stable CC
 Use an s32 to hold delta values and discard any values that are less than 0

 arch/powerpc/kernel/perf_event.c |   37 ++---
 1 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index c4063b7..822f630 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -398,6 +398,25 @@ static int check_excludes(struct perf_event **ctrs, 
unsigned int cflags[],
return 0;
 }
 
+static u64 check_and_compute_delta(u64 prev, u64 val)
+{
+   u64 delta = (val - prev) & 0xfffffffful;
+
+   /*
+* POWER7 can roll back counter values, if the new value is smaller
+* than the previous value it will cause the delta and the counter to
+* have bogus values unless we rolled a counter over.  If a counter is
+* rolled back, it will be smaller, but within 256, which is the maximum
+* number of events to rollback at once.  If we detect a rollback
+* return 0.  This can lead to a small lack of precision in the
+* counters.
+*/
+   if (prev > val && (prev - val) < 256)
+   delta = 0;
+
+   return delta;
+}
+
 static void power_pmu_read(struct perf_event *event)
 {
s64 val, delta, prev;
@@ -416,10 +435,11 @@ static void power_pmu_read(struct perf_event *event)
prev = local64_read(event->hw.prev_count);
barrier();
val = read_pmc(event->hw.idx);
+   delta = check_and_compute_delta(prev, val);
+   if (!delta)
+   return;
} while (local64_cmpxchg(event->hw.prev_count, prev, val) != prev);
 
-   /* The counters are only 32 bits wide */
-   delta = (val - prev) & 0xfffffffful;
local64_add(delta, event->count);
local64_sub(delta, event->hw.period_left);
 }
@@ -449,8 +469,9 @@ static void freeze_limited_counters(struct cpu_hw_events 
*cpuhw,
val = (event->hw.idx == 5) ? pmc5 : pmc6;
prev = local64_read(event->hw.prev_count);
event->hw.idx = 0;
-   delta = (val - prev) & 0xfffffffful;
-   local64_add(delta, event->count);
+   delta = check_and_compute_delta(prev, val);
+   if (delta)
+   local64_add(delta, event->count);
}
 }
 
@@ -458,14 +479,16 @@ static void thaw_limited_counters(struct cpu_hw_events 
*cpuhw,
  unsigned long pmc5, unsigned long pmc6)
 {
struct perf_event *event;
-   u64 val;
+   u64 val, prev;
int i;
 
for (i = 0; i < cpuhw->n_limited; ++i) {
event = cpuhw->limited_counter[i];
event->hw.idx = cpuhw->limited_hwidx[i];
val = (event->hw.idx == 5) ? pmc5 : pmc6;
-   local64_set(event->hw.prev_count, val);
+   prev = local64_read(event->hw.prev_count);
+   if (check_and_compute_delta(prev, val))
+   local64_set(event->hw.prev_count, val);
perf_event_update_userpage(event);
}
 }
@@ -1197,7 +1220,7 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
 
/* we don't have to worry about interrupts here */
prev = local64_read(event->hw.prev_count);
-   delta = (val - prev) & 0xfffffffful;
+   delta = check_and_compute_delta(prev, val);
local64_add(delta, event->count);
 
/*
-- 
1.7.1



Re: [PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-07 Thread Eric B Munson
On Thu, 07 Apr 2011, Benjamin Herrenschmidt wrote:

 
   Doesn't that mean that power_pmu_read() can only ever increase the value 
   of
   the perf_event and so will essentially -stop- once the counter rolls over 
   ?
   
   Similar comments every where you do this type of comparison.
   
   Cheers,
   Ben.
  
  Sorry for the nag, but am I missing something about the way the register and
  the previous values are reset in the overflow interrupt handler?
 
 Well, not all counters get interrupts right ? Some counters are just
 free running... I'm not sure when that power_pmu_read() function is
 actually used by the core, I'm not that familiar with perf, but I'd say
 better safe than sorry. When comparing counter values, doing in a way
 that is generally safe vs. wraparounds. Eventually do a helper for that.
 
 Cheers,
 Ben.

I am honestly not sure, I was under the assumption that all counters would
generate an interrupt if they overflowed.  I do not have the hardware docs to
prove this, so I will send out a V3 that (I think/hope) addresses your concerns
momentarily.

Eric



[PATCH V3] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-07 Thread Eric B Munson
Because of speculative event roll back, it is possible for some event counters
to decrease between reads on POWER7.  This causes a problem with the way that
counters are updated.  Delta values are calculated in a 64 bit value and the
top 32 bits are masked.  If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters.  This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the counter values, but from my testing
the value is typically fewer than 10 samples at a time.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: sta...@kernel.org
---
Changes from V2:
 Create a helper that should handle counter roll back as well as registers that
might be allowed to roll over

Changes from V1:
 Updated patch leader
 Added stable CC
 Use an s32 to hold delta values and discard any values that are less than 0

 arch/powerpc/kernel/perf_event.c |   40 +++--
 1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 97e0ae4..78bf933 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -398,6 +398,28 @@ static int check_excludes(struct perf_event **ctrs, 
unsigned int cflags[],
return 0;
 }
 
+static u64 check_and_compute_delta(s64 prev, s64 val)
+{
+   /*
+* Because the PerfMon registers are only 32 bits wide, the delta
+* should not overflow.
+*/
+   u64 delta = 0;
+
+   /*
+* POWER7 can roll back counter values, if the new value is smaller
+* than the previous value it will cause the delta and the counter to
+* have bogus values unless we rolled a counter over.  If this is the
+* case or prev < val, calculate the delta and return it, otherwise
+* return 0.  This can lead to a small lack of precision in the
+* counters.
+*/
+   if (((prev & 0x80000000) && !(val & 0x80000000)) || (val > prev))
+   delta = (val - prev) & 0xfffffffful;
+
+   return delta;
+}
+
 static void power_pmu_read(struct perf_event *event)
 {
s64 val, delta, prev;
@@ -416,10 +438,11 @@ static void power_pmu_read(struct perf_event *event)
prev = local64_read(event->hw.prev_count);
barrier();
val = read_pmc(event->hw.idx);
+   delta = check_and_compute_delta(prev, val);
+   if (!delta)
+   return;
} while (local64_cmpxchg(event->hw.prev_count, prev, val) != prev);
 
-   /* The counters are only 32 bits wide */
-   delta = (val - prev) & 0xfffffffful;
local64_add(delta, event->count);
local64_sub(delta, event->hw.period_left);
 }
@@ -449,8 +472,9 @@ static void freeze_limited_counters(struct cpu_hw_events 
*cpuhw,
val = (event->hw.idx == 5) ? pmc5 : pmc6;
prev = local64_read(event->hw.prev_count);
event->hw.idx = 0;
-   delta = (val - prev) & 0xfffffffful;
-   local64_add(delta, event->count);
+   delta = check_and_compute_delta(prev, val);
+   if (delta)
+   local64_add(delta, event->count);
}
 }
 
@@ -458,14 +482,16 @@ static void thaw_limited_counters(struct cpu_hw_events 
*cpuhw,
  unsigned long pmc5, unsigned long pmc6)
 {
struct perf_event *event;
-   u64 val;
+   u64 val, prev;
int i;
 
for (i = 0; i < cpuhw->n_limited; ++i) {
event = cpuhw->limited_counter[i];
event->hw.idx = cpuhw->limited_hwidx[i];
val = (event->hw.idx == 5) ? pmc5 : pmc6;
-   local64_set(event->hw.prev_count, val);
+   prev = local64_read(event->hw.prev_count);
+   if (check_and_compute_delta(prev, val))
+   local64_set(event->hw.prev_count, val);
perf_event_update_userpage(event);
}
 }
@@ -1197,7 +1223,7 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
 
/* we don't have to worry about interrupts here */
prev = local64_read(event->hw.prev_count);
-   delta = (val - prev) & 0xfffffffful;
+   delta = check_and_compute_delta(prev, val);
local64_add(delta, event->count);
 
/*
-- 
1.7.1



Re: [PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-04-06 Thread Eric B Munson
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

 On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:
  On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
  
   On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
Here I made the assumption that the hardware would never remove more 
events in
a speculative roll back than it had added.  This is not a situation I
encountered in my limited testing, so I didn't think underflow was 
possible.  I
will send out a V2 using the signed 32 bit delta and remember to CC 
stable
this time. 
   
   I'm not thinking about underflow but rollover... or that isn't possible
   with those counters ? IE. They don't wrap back to 0 after hitting
    ffffffff ?
   
  
  They do roll over to 0 after ffffffff, but I thought that case was already
  covered by the perf_event_interrupt.  Are you concerned that we will reset a
  counter and speculative roll back will underflow that counter?
 
 No, but take this part of the patch:
 
  --- a/arch/powerpc/kernel/perf_event.c
  +++ b/arch/powerpc/kernel/perf_event.c
  @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
  prev = local64_read(event-hw.prev_count);
  barrier();
  val = read_pmc(event-hw.idx);
  +   /*
  +* POWER7 can roll back counter values, if the new value is
  +* smaller than the previous value it will cause the delta
  +* and the counter to have bogus values.  If this is the
  +* case skip updating anything until the counter grows again.
  +* This can lead to a small lack of precision in the counters.
  +*/
  +   if (val < prev)
  +   return;
  } while (local64_cmpxchg(event-hw.prev_count, prev, val) != prev);
 
 Doesn't that mean that power_pmu_read() can only ever increase the value of
 the perf_event and so will essentially -stop- once the counter rolls over ?
 
 Similar comments every where you do this type of comparison.
 
 Cheers,
 Ben.

Sorry for the nag, but am I missing something about the way the register and
the previous values are reset in the overflow interrupt handler?

Thanks,
Eric



Re: [PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-03-31 Thread Eric B Munson
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

 On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:
  On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
  
   On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
Here I made the assumption that the hardware would never remove more 
events in
a speculative roll back than it had added.  This is not a situation I
encountered in my limited testing, so I didn't think underflow was 
possible.  I
will send out a V2 using the signed 32 bit delta and remember to CC 
stable
this time. 
   
   I'm not thinking about underflow but rollover... or that isn't possible
   with those counters ? IE. They don't wrap back to 0 after hitting
    ffffffff ?
   
  
  They do roll over to 0 after ffffffff, but I thought that case was already
  covered by the perf_event_interrupt.  Are you concerned that we will reset a
  counter and speculative roll back will underflow that counter?
 
 No, but take this part of the patch:
 
  --- a/arch/powerpc/kernel/perf_event.c
  +++ b/arch/powerpc/kernel/perf_event.c
  @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
   		prev = local64_read(&event->hw.prev_count);
   		barrier();
   		val = read_pmc(event->hw.idx);
   +		/*
   +		 * POWER7 can roll back counter values, if the new value is
   +		 * smaller than the previous value it will cause the delta
   +		 * and the counter to have bogus values.  If this is the
   +		 * case skip updating anything until the counter grows again.
   +		 * This can lead to a small lack of precision in the counters.
   +		 */
   +		if (val < prev)
   +			return;
   	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
 Doesn't that mean that power_pmu_read() can only ever increase the value of
 the perf_event and so will essentially -stop- once the counter rolls over ?
 
 Similar comments every where you do this type of comparison.
 

Sorry for being so dense on this, but I think that when a counter overflows
both the register value and the previous value are reset so we should continue
seeing new event counts after the overflow interrupt handler puts the counter
back into a sane state.  What am I not seeing?

Eric


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-03-30 Thread Eric B Munson
On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:

 On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
  Here I made the assumption that the hardware would never remove more events
  in a speculative roll back than it had added.  This is not a situation I
  encountered in my limited testing, so I didn't think underflow was possible.
  I will send out a V2 using the signed 32 bit delta and remember to CC stable
  this time.
 
 I'm not thinking about underflow but rollover... or that isn't possible
 with those counters ? IE. They don't wrap back to 0 after hitting
 0xffffffff ?
 

They do roll over to 0 after 0xffffffff, but I thought that case was already
covered by the perf_event_interrupt.  Are you concerned that we will reset a
counter and speculative roll back will underflow that counter?


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-03-29 Thread Eric B Munson
On Tue, 29 Mar 2011, Benjamin Herrenschmidt wrote:

 On Fri, 2011-03-25 at 09:28 -0400, Eric B Munson wrote:
  It is possible on POWER7 for some perf events to have values decrease.  This
  causes a problem with the way the kernel counters are updated.  Deltas are
  computed and then stored in a 64 bit value while the registers are 32 bits
  wide, so if the new value is smaller than the previous value the delta is a
  very large positive value.  As a workaround, this patch skips updating the
  kernel counter when the new value is smaller than the previous one.  This can
  lead to a lack of precision in the counter values, but from my testing the
  loss is typically fewer than 10 samples at a time.
 
 Unfortunately the patch isn't 100% correct I believe:
 
 I think you don't deal with the rollover of the counters. The new value
 could be smaller than the previous one simply because the counter just
 rolled over.
 
 In cases like this:
 
  @@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
  		val = (event->hw.idx == 5) ? pmc5 : pmc6;
  		prev = local64_read(&event->hw.prev_count);
  		event->hw.idx = 0;
  -		delta = (val - prev) & 0xfffffffful;
  -		local64_add(delta, &event->count);
  +		if (val >= prev) {
  +			delta = (val - prev) & 0xfffffffful;
  +			local64_add(delta, &event->count);
  +		}
  	}
   }
 
 I wonder if it isn't easier to just define delta to be a s32, get rid
 of the mask and test if delta is positive, something like:
 
    	delta = val - prev;
    	if (delta > 0)
    		local64_add(delta, &event->count);
 
 Wouldn't that be simpler ? Or do I miss a reason why it wouldn't work ?

Here I made the assumption that the hardware would never remove more events in
a speculative roll back than it had added.  This is not a situation I
encoutered in my limited testing, so I didn't think underflow was possible.  I
will send out a V2 using the signed 32 bit delta and remeber to CC stable
this time.

Eric


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-03-29 Thread Eric B Munson
Because of speculative event roll back, it is possible for some event counters
to decrease between reads on POWER7.  This causes a problem with the way the
counters are updated.  Delta values are calculated as a 64 bit quantity and
masked to the low 32 bits.  If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters.  This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the counter values, but from my testing
the loss is typically fewer than 10 samples at a time.
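
As a standalone illustration of the arithmetic described above (made-up values,
not part of the patch), compare the old masked u64 delta with the signed 32 bit
delta when the register rolls back:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t prev = 1000;   /* value cached in prev_count               */
            uint32_t val  = 990;    /* register after a speculative roll back   */

            /* old behaviour: unsigned delta masked to 32 bits -> huge bogus value */
            uint64_t bogus = (uint64_t)(uint32_t)(val - prev) & 0xfffffffful;

            /* new behaviour: signed 32 bit delta, negative values are discarded */
            int32_t delta = (int32_t)(val - prev);

            printf("masked delta: %llu\n", (unsigned long long)bogus); /* 4294967286 */
            if (delta < 0)
                    printf("signed delta %d discarded\n", delta);      /* -10 */

            return 0;
    }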

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: sta...@kernel.org
---
Changes from V1:
 Updated patch leader
 Added stable CC
 Use an s32 to hold delta values and discard any values that are less than 0

 arch/powerpc/kernel/perf_event.c |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 97e0ae4..0a5178f 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
 		prev = local64_read(&event->hw.prev_count);
 		barrier();
 		val = read_pmc(event->hw.idx);
+		/*
+		 * POWER7 can roll back counter values, if the new value is
+		 * smaller than the previous value it will cause the delta
+		 * and the counter to have bogus values.  If this is the
+		 * case skip updating anything until the counter grows again.
+		 * This can lead to a small lack of precision in the counters.
+		 */
+		if (val < prev)
+			return;
 	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
/* The counters are only 32 bits wide */
@@ -439,7 +448,8 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 				    unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val, prev, delta;
+	u64 val, prev;
+	s32 delta;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
@@ -449,8 +459,13 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
 		prev = local64_read(&event->hw.prev_count);
 		event->hw.idx = 0;
-		delta = (val - prev) & 0xfffffffful;
-		local64_add(delta, &event->count);
+		/*
+		 * The PerfMon registers are only 32 bits wide so the
+		 * delta should not overflow.
+		 */
+		delta = val - prev;
+		if (delta > 0)
+			local64_add(delta, &event->count);
 	}
 }
 
@@ -458,14 +473,16 @@ static void thaw_limited_counters(struct cpu_hw_events *cpuhw,
 				  unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val;
+	u64 val, prev;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
 		event = cpuhw->limited_counter[i];
 		event->hw.idx = cpuhw->limited_hwidx[i];
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
-		local64_set(&event->hw.prev_count, val);
+		prev = local64_read(&event->hw.prev_count);
+		if (val > prev)
+			local64_set(&event->hw.prev_count, val);
 		perf_event_update_userpage(event);
 	}
 }
@@ -1187,7 +1204,8 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			       struct pt_regs *regs, int nmi)
 {
 	u64 period = event->hw.sample_period;
-	s64 prev, delta, left;
+	s64 prev, left;
+	s32 delta;
 	int record = 0;
 
 	if (event->hw.state & PERF_HES_STOPPED) {
@@ -1197,7 +1215,9 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
-	delta = (val - prev) & 0xfffffffful;
+	delta = val - prev;
+	if (delta < 0)
+		delta = 0;
 	local64_add(delta, &event->count);
 
/*
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] POWER: perf_event: Skip updating kernel counters if register value shrinks

2011-03-25 Thread Eric B Munson
It is possible on POWER7 for some perf events to have values decrease.  This
causes a problem with the way the kernel counters are updated.  Deltas are
computed and then stored in a 64 bit value while the registers are 32 bits
wide, so if the new value is smaller than the previous value the delta is a
very large positive value.  As a workaround, this patch skips updating the
kernel counter when the new value is smaller than the previous one.  This can
lead to a lack of precision in the counter values, but from my testing the
loss is typically fewer than 10 samples at a time.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 arch/powerpc/kernel/perf_event.c |   26 +-
 1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 97e0ae4..6752dc1 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
 		prev = local64_read(&event->hw.prev_count);
 		barrier();
 		val = read_pmc(event->hw.idx);
+		/*
+		 * POWER7 can roll back counter values, if the new value is
+		 * smaller than the previous value it will cause the delta
+		 * and the counter to have bogus values.  If this is the
+		 * case skip updating anything until the counter grows again.
+		 * This can lead to a small lack of precision in the counters.
+		 */
+		if (val < prev)
+			return;
 	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
/* The counters are only 32 bits wide */
@@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
 		prev = local64_read(&event->hw.prev_count);
 		event->hw.idx = 0;
-		delta = (val - prev) & 0xfffffffful;
-		local64_add(delta, &event->count);
+		if (val >= prev) {
+			delta = (val - prev) & 0xfffffffful;
+			local64_add(delta, &event->count);
+		}
 	}
 }
 
@@ -458,14 +469,16 @@ static void thaw_limited_counters(struct cpu_hw_events *cpuhw,
 				  unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val;
+	u64 val, prev;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
 		event = cpuhw->limited_counter[i];
 		event->hw.idx = cpuhw->limited_hwidx[i];
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
-		local64_set(&event->hw.prev_count, val);
+		prev = local64_read(&event->hw.prev_count);
+		if (val > prev)
+			local64_set(&event->hw.prev_count, val);
 		perf_event_update_userpage(event);
 	}
 }
@@ -1197,7 +1210,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
-	delta = (val - prev) & 0xfffffffful;
+	if (val < prev)
+		delta = 0;
+	else
+		delta = (val - prev) & 0xfffffffful;
 	local64_add(delta, &event->count);
 
/*
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

2008-07-30 Thread Eric B Munson
On Wed, 30 Jul 2008, Andrew Morton wrote:

 On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson [EMAIL PROTECTED] wrote:
 
  Certain workloads benefit if their data or text segments are backed by
  huge pages. The stack is no exception to this rule but there is no
  mechanism currently that allows the backing of a stack reliably with
  huge pages.  Doing this from userspace is excessively messy and has some
  awkward restrictions.  Particularly on POWER, where 256MB of address space
  gets wasted if the stack is set up there.
  
  This patch stack introduces a personality flag that indicates the kernel
  should set up the stack as a hugetlbfs-backed region. A userspace utility
  may set this flag then exec a process whose stack is to be backed by
  hugetlb pages.
  
  Eric Munson (5):
Align stack boundaries based on personality
Add shared and reservation control to hugetlb_file_setup
Split boundary checking from body of do_munmap
Build hugetlb backed process stacks
[PPC] Setup stack memory segment for hugetlb pages
  
   arch/powerpc/mm/hugetlbpage.c |6 +
   arch/powerpc/mm/slice.c   |   11 ++
   fs/exec.c |  209 
  ++---
   fs/hugetlbfs/inode.c  |   52 +++
   include/asm-powerpc/hugetlb.h |3 +
   include/linux/hugetlb.h   |   22 -
   include/linux/mm.h|1 +
   include/linux/personality.h   |3 +
   ipc/shm.c |2 +-
   mm/mmap.c |   11 ++-
   10 files changed, 284 insertions(+), 36 deletions(-)
 
 That all looks surprisingly straightforward.
 
 Might there exist an x86 port which people can play with?
 

I have tested these patches on x86, x86_64, and ppc64, but not yet on ia64.
There is a user space utility that I have been using to test which would be
included in libhugetlbfs if this is merged into the kernel.  I will send it
out as a reply to this thread, performance numbers are also on the way.

-- 
Eric B Munson
IBM Linux Technology Center
[EMAIL PROTECTED]



signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

2008-07-30 Thread Eric B Munson
/***
 *   User front end for using huge pages Copyright (C) 2008, IBM   *
 * *
 *   This program is free software; you can redistribute it and/or modify  *
 *   it under the terms of the Lesser GNU General Public License as*
 *   published by the Free Software Foundation; either version 2.1 of the  *
 *   License, or (at your option) any later version.   *
 * *
 *   This program is distributed in the hope that it will be useful,   *
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of*
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the *
 *   GNU Lesser General Public License for more details.   *
 * *
 *   You should have received a copy of the Lesser GNU General Public  *
 *   License along with this program; if not, write to the *
 *   Free Software Foundation, Inc.,   *
 *   59 Temple Place - Suite 330, Boston, MA  02111-1307, USA. *
 ***/

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define _GNU_SOURCE /* for getopt_long */
#include <unistd.h>
#include <getopt.h>
#include <sys/personality.h>

/* Personality bit for huge page backed stack */
#ifndef HUGETLB_STACK
#define HUGETLB_STACK 0x002
#endif

extern int errno;
extern int optind;
extern char *optarg;

void print_usage()
{
	fprintf(stderr, "hugectl [options] target\n");
	fprintf(stderr, "options:\n");
	fprintf(stderr, " --help,  -h  Prints this message.\n");
	fprintf(stderr,
		" --stack, -s  Attempts to execute target program with a hugetlb page backed stack.\n");
}

void set_huge_stack()
{
	char *err;
	unsigned long curr_per = personality(0xffffffff);
	if (personality(curr_per | HUGETLB_STACK) == -1) {
		err = strerror(errno);
		fprintf(stderr,
			"Error setting HUGE_STACK personality flag: '%s'\n",
			err);
		exit(-1);
	}
}

int main(int argc, char **argv)
{
	char opts[] = "+hs";
	int ret = 0, index = 0;
	struct option long_opts[] = {
		{"help",  0, 0, 'h'},
		{"stack", 0, 0, 's'},
		{0,       0, 0, 0},
	};

	if (argc < 2) {
		print_usage();
		return 0;
	}

	while (ret != -1) {
		ret = getopt_long(argc, argv, opts, long_opts, &index);
		switch (ret) {
		case 's':
			set_huge_stack();
			break;

		case '?':
		case 'h':
			print_usage();
			return 0;

		case -1:
			break;

		default:
			ret = -1;
			break;
		}
	}
	index = optind;

	if (execvp(argv[index], &argv[index]) == -1) {
		ret = errno;
		fprintf(stderr, "Error calling execvp: '%s'\n", strerror(ret));
		return ret;
	}

	return 0;
}
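
For reference, the intended invocation of the helper above (assuming the
hugetlb stack patches are applied to the kernel; the target name ./myapp is
only a placeholder) would be along the lines of:

    ./hugectl --stack ./myapp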



signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH V2] Keep 3 high personality bytes across exec

2008-06-30 Thread Eric B Munson
On Mon, 30 Jun 2008, Paul Mackerras wrote:

 Eric B Munson writes:
 
  --- a/include/asm-powerpc/elf.h
  +++ b/include/asm-powerpc/elf.h
  @@ -257,7 +257,8 @@ do {							\
  	else							\
  		clear_thread_flag(TIF_ABI_PENDING);		\
  	if (personality(current->personality) != PER_LINUX32)	\
  -		set_personality(PER_LINUX);			\
  +		set_personality(PER_LINUX |			\
  +			(current->personality & PER_INHERIT));	\
 
 Couldn't we use ~PER_MASK here instead of PER_INHERIT?  That would
 mean we wouldn't have to modify include/linux/personality.h, and we
 wouldn't have to keep updating PER_INHERIT as more flags get added.
 
 (Nice patch description, BTW.  Thanks.)
 
 Paul.
 

Yeah, ~PER_MASK will work fine.  I used PER_INHERIT first because I
was not sure if there were values that should not be carried forward.
I will have an updated patch out shortly.

Eric
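
For comparison, a small user-space sketch of the ~PER_MASK approach Paul
suggests (constants copied from linux/personality.h of that era; this is
illustrative only, not the patch itself):

    #include <stdio.h>

    #define PER_LINUX		0x0000
    #define PER_MASK		0x00ff		/* low byte selects the personality "ID" */
    #define ADDR_NO_RANDOMIZE	0x0040000	/* one of the high flag bits to be kept  */

    int main(void)
    {
            unsigned int cur = PER_LINUX | ADDR_NO_RANDOMIZE;

            /* ~PER_MASK keeps every bit outside the low ID byte across the exec */
            unsigned int next = PER_LINUX | (cur & ~PER_MASK);

            printf("ADDR_NO_RANDOMIZE preserved: %s\n",
                   (next & ADDR_NO_RANDOMIZE) ? "yes" : "no");	/* yes */

            return 0;
    }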


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH V2] Keep 3 high personality bytes across exec

2008-06-27 Thread Eric B Munson
Currently when a 32 bit process is exec'd on a powerpc 64 bit host the value
in the top three bytes of the personality is clobbered.  This patch adds a
check in the SET_PERSONALITY macro that will carry all the values in the top
three bytes across the exec.

These three bytes currently carry flags to disable address randomisation,
limit the address space, force zeroing of an mmapped page, etc.  Should an
application set any of these bits, they will be maintained and honoured in a
homogeneous environment but discarded and ignored in a heterogeneous one.
So if an application requires all mmapped pages to be initialised to zero and
a wrapper is used to set up the personality and exec the target, these flags
will remain set in an all 32 bit or all 64 bit environment, but they will be
lost in the exec on a mixed 32/64 bit environment.  Losing these bits means
that the same application would behave differently in different environments.
Tested on a POWER5+ machine with a 64 bit kernel and a mixed 64/32 bit user
space.
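
As an illustration of the wrapper scenario above (a sketch only, not part of
the patch; ADDR_NO_RANDOMIZE stands in for any of the high personality flags):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/personality.h>

    int main(int argc, char **argv)
    {
            if (argc < 2) {
                    fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
                    return 1;
            }

            /* set one of the high personality flags, then exec the target;
             * with this patch the flag survives a 32 bit exec on a 64 bit kernel */
            unsigned long cur = personality(0xffffffff);
            if (personality(cur | ADDR_NO_RANDOMIZE) == -1) {
                    perror("personality");
                    return 1;
            }

            execvp(argv[1], &argv[1]);
            perror("execvp");
            return 1;
    }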

Signed-off-by: Eric B Munson [EMAIL PROTECTED]

---
V2

Changes from V1:
Updated changelog with a better description of why this change is useful

Based on 2.6.26-rc6

 include/asm-powerpc/elf.h   |3 ++-
 include/linux/personality.h |6 ++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/asm-powerpc/elf.h b/include/asm-powerpc/elf.h
index 9080d85..2f11a0e 100644
--- a/include/asm-powerpc/elf.h
+++ b/include/asm-powerpc/elf.h
@@ -257,7 +257,8 @@ do {							\
 	else							\
 		clear_thread_flag(TIF_ABI_PENDING);		\
 	if (personality(current->personality) != PER_LINUX32)	\
-		set_personality(PER_LINUX);			\
+		set_personality(PER_LINUX |			\
+			(current->personality & PER_INHERIT));	\
 } while (0)
 /*
  * An executable for which elf_read_implies_exec() returns TRUE will
diff --git a/include/linux/personality.h b/include/linux/personality.h
index a84e9ff..362eb90 100644
--- a/include/linux/personality.h
+++ b/include/linux/personality.h
@@ -36,6 +36,12 @@ enum {
 	ADDR_LIMIT_3GB =	0x8000000,
 };
 
+/* Mask for the above personality values */
+#define PER_INHERIT (ADDR_NO_RANDOMIZE|FDPIC_FUNCPTRS|MMAP_PAGE_ZERO| \
+   ADDR_COMPAT_LAYOUT|READ_IMPLIES_EXEC|ADDR_LIMIT_32BIT| \
+   SHORT_INODE|WHOLE_SECONDS|STICKY_TIMEOUTS| \
+   ADDR_LIMIT_3GB)
+
 /*
  * Security-relevant compatibility flags that must be
  * cleared upon setuid or setgid exec:



signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC PATCH] Keep 3 high personality bytes across exec

2008-06-18 Thread Eric B Munson
Currently when a 32 bit process is exec'd on a powerpc 64 bit host the values
of the top three bytes of the personality are clobbered.  This patch adds a
check in the SET_PERSONALITY macro that will carry all the values in the top
three bytes across the exec.

Signed-off-by: Eric B Munson [EMAIL PROTECTED]

---

Based on 2.6.26-rc6

 include/asm-powerpc/elf.h   |3 ++-
 include/linux/personality.h |6 ++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/asm-powerpc/elf.h b/include/asm-powerpc/elf.h
index 9080d85..2f11a0e 100644
--- a/include/asm-powerpc/elf.h
+++ b/include/asm-powerpc/elf.h
@@ -257,7 +257,8 @@ do {							\
 	else							\
 		clear_thread_flag(TIF_ABI_PENDING);		\
 	if (personality(current->personality) != PER_LINUX32)	\
-		set_personality(PER_LINUX);			\
+		set_personality(PER_LINUX |			\
+			(current->personality & PER_INHERIT));	\
 } while (0)
 /*
  * An executable for which elf_read_implies_exec() returns TRUE will
diff --git a/include/linux/personality.h b/include/linux/personality.h
index a84e9ff..362eb90 100644
--- a/include/linux/personality.h
+++ b/include/linux/personality.h
@@ -36,6 +36,12 @@ enum {
 	ADDR_LIMIT_3GB =	0x8000000,
 };
 
+/* Mask for the above personality values */
+#define PER_INHERIT (ADDR_NO_RANDOMIZE|FDPIC_FUNCPTRS|MMAP_PAGE_ZERO| \
+   ADDR_COMPAT_LAYOUT|READ_IMPLIES_EXEC|ADDR_LIMIT_32BIT| \
+   SHORT_INODE|WHOLE_SECONDS|STICKY_TIMEOUTS| \
+   ADDR_LIMIT_3GB)
+
 /*
  * Security-relevant compatibility flags that must be
  * cleared upon setuid or setgid exec:



signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC PATCH V2] Keep 3 high personality bytes across exec

2008-06-18 Thread Eric B Munson
Currently when a 32 bit process is exec'd on a powerpc 64 bit host the value
in the top three bytes of the personality is clobbered.  This patch adds a
check in the SET_PERSONALITY macro that will carry all the values in the top
three bytes across the exec.

These three bytes currently carry flags to disable address randomisation,
limit the address space, force zeroing of an mmapped page, etc.  Should an
application set any of these bits, they will be maintained and honoured in a
homogeneous environment but discarded and ignored in a heterogeneous one.
So if an application requires all mmapped pages to be initialised to zero and
a wrapper is used to set up the personality and exec the target, these flags
will remain set in an all 32 bit or all 64 bit environment, but they will be
lost in the exec on a mixed 32/64 bit environment.  Losing these bits means
that the same application would behave differently in different environments.
Tested on a POWER5+ machine with a 64 bit kernel and a mixed 64/32 bit user
space.

Signed-off-by: Eric B Munson [EMAIL PROTECTED]

---
V2

Changes from V1:
Updated changelog with a better description of why this change is useful

Based on 2.6.26-rc6

 include/asm-powerpc/elf.h   |3 ++-
 include/linux/personality.h |6 ++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/asm-powerpc/elf.h b/include/asm-powerpc/elf.h
index 9080d85..2f11a0e 100644
--- a/include/asm-powerpc/elf.h
+++ b/include/asm-powerpc/elf.h
@@ -257,7 +257,8 @@ do {							\
 	else							\
 		clear_thread_flag(TIF_ABI_PENDING);		\
 	if (personality(current->personality) != PER_LINUX32)	\
-		set_personality(PER_LINUX);			\
+		set_personality(PER_LINUX |			\
+			(current->personality & PER_INHERIT));	\
 } while (0)
 /*
  * An executable for which elf_read_implies_exec() returns TRUE will
diff --git a/include/linux/personality.h b/include/linux/personality.h
index a84e9ff..362eb90 100644
--- a/include/linux/personality.h
+++ b/include/linux/personality.h
@@ -36,6 +36,12 @@ enum {
 	ADDR_LIMIT_3GB =	0x8000000,
 };
 
+/* Mask for the above personality values */
+#define PER_INHERIT (ADDR_NO_RANDOMIZE|FDPIC_FUNCPTRS|MMAP_PAGE_ZERO| \
+   ADDR_COMPAT_LAYOUT|READ_IMPLIES_EXEC|ADDR_LIMIT_32BIT| \
+   SHORT_INODE|WHOLE_SECONDS|STICKY_TIMEOUTS| \
+   ADDR_LIMIT_3GB)
+
 /*
  * Security-relevant compatibility flags that must be
  * cleared upon setuid or setgid exec:



signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev