Re: [PATCH] limited support for non-root mlock()

2015-07-09 Thread Justus Winter
Quoting Samuel Thibault (2015-07-08 18:22:56)
 Justus Winter, le Wed 08 Jul 2015 16:22:28 +0200, a écrit :
  Alternatively, you could re-purpose the existing RPC `vm_wire',
  changing the type of its first argument from `host_priv_t' to `host_t'
  (this is backwards compatible as the privileged host control port is
  also a host port), and changing the behavior slightly depending on
  whether the user passed the priv port or a normal host port.
 
 Ok, so it'd look like this?

Yes, I think that looks much nicer.

Nitpick: The documentation now says [The function returns]
KERN_INVALID_HOST if host was not the privileged host port, maybe
just ... was not a valid host port.

 (I can't reuse the intran, since the obtained host_t is the same in
 non-priv and priv cases).

Ah, I didn't know that.  But your solution looks reasonable to me.

Cheers,
Justus



Re: [PATCH] limited support for non-root mlock()

2015-07-08 Thread Justus Winter
Hi :)

Quoting Samuel Thibault (2015-07-08 15:55:33)
 I have implemented a start of support for calling mlock() in a non-root
 process, which I have attached.

Sweet.  You talked about that in your FOSDEM talk, right?  Ever since
I thought that managing this in userspace is the better solution, then
again, the implementation is simple enough.

 I had to introduce a newer RPC since the existing vm_wire RPC is
 done on the privileged host port.

Alternatively, you could re-purpose the existing RPC `vm_wire',
changing the type of its first argument from `host_priv_t' to `host_t'
(this is backwards compatible as the privileged host control port is
also a host port), and changing the behavior slightly depending on
whether the user passed the priv port or a normal host port.

 It for now allows 64KiB mlocked memory per task (the default Linux
 value).

Isn't a per-process limit rather pointless?  I thought Linux limit is
per-user.

 Could somebody review it? (in particular on the name of the RPC)

Looks good, modulo maybe what I wrote above wrt to just using
`vm_wire'.

Oh, and why did you chose `mach4.defs' to add a new RPC?  I mean, it
doesn't really matter, the Mach API isn't properly grouped into
protocols, and there is just one huge dispatcher, but I thought we put
new definitions into `gnumach.defs'.

Cheers,
Justus



Re: [PATCH] limited support for non-root mlock()

2015-07-08 Thread Samuel Thibault
Hello,

Justus Winter, le Wed 08 Jul 2015 16:22:28 +0200, a écrit :
 You talked about that in your FOSDEM talk, right?

Yes, although since I hadn't had any look at it at the time, I hadn't
realized that passing host==NULL wasn't an option (since we need a port
to make the RPC on anyway).

 Ever since I thought that managing this in userspace is the better
 solution,

Well, the actual enforcement has to be done where default paging
happens, otherwise it'd be a security issue.

  I had to introduce a newer RPC since the existing vm_wire RPC is
  done on the privileged host port.
 
 Alternatively, you could re-purpose the existing RPC `vm_wire',
 changing the type of its first argument from `host_priv_t' to `host_t'
 (this is backwards compatible as the privileged host control port is
 also a host port), and changing the behavior slightly depending on
 whether the user passed the priv port or a normal host port.

Ah, right.

  It for now allows 64KiB mlocked memory per task (the default Linux
  value).
 
 Isn't a per-process limit rather pointless?  I thought Linux limit is
 per-user.

On Linux it's per process too, changing ulimit -l in a process won't
affect another process.

The issue with mlocked memory is mostly about eating non-pageable
memory. Creating processes already eats non-pageable memory anyway, so
it doesn't buy much to eat a bit more non-pageable memory per process.

 Oh, and why did you chose `mach4.defs' to add a new RPC?  I mean, it
 doesn't really matter, the Mach API isn't properly grouped into
 protocols, and there is just one huge dispatcher, but I thought we put
 new definitions into `gnumach.defs'.

I don't really know actually. I'd tend to think that mach4.defs is for
newer RPCs which make sense on oldish Mach, while gnumach.defs is for
GNU-introduced features, but I really don't know the rules.

Samuel



Re: [PATCH] limited support for non-root mlock()

2015-07-08 Thread Richard Braun
On Wed, Jul 08, 2015 at 04:32:36PM +0200, Samuel Thibault wrote:
  Isn't a per-process limit rather pointless?  I thought Linux limit is
  per-user.
 
 On Linux it's per process too, changing ulimit -l in a process won't
 affect another process.
 
 The issue with mlocked memory is mostly about eating non-pageable
 memory. Creating processes already eats non-pageable memory anyway, so
 it doesn't buy much to eat a bit more non-pageable memory per process.

The rationale is that you restrict the number of processes per user
instead, so it's fine.

  Oh, and why did you chose `mach4.defs' to add a new RPC?  I mean, it
  doesn't really matter, the Mach API isn't properly grouped into
  protocols, and there is just one huge dispatcher, but I thought we put
  new definitions into `gnumach.defs'.
 
 I don't really know actually. I'd tend to think that mach4.defs is for
 newer RPCs which make sense on oldish Mach, while gnumach.defs is for
 GNU-introduced features, but I really don't know the rules.

At this point, we make the rules, and it was decided we would put our
own RPCs in gnumach.defs. The mach4.defs file is really meant to contain
stuff that came with Mach 4.

-- 
Richard Braun



Re: [PATCH] limited support for non-root mlock()

2015-07-08 Thread Samuel Thibault
Justus Winter, le Wed 08 Jul 2015 16:22:28 +0200, a écrit :
 Alternatively, you could re-purpose the existing RPC `vm_wire',
 changing the type of its first argument from `host_priv_t' to `host_t'
 (this is backwards compatible as the privileged host control port is
 also a host port), and changing the behavior slightly depending on
 whether the user passed the priv port or a normal host port.

Ok, so it'd look like this? (I can't reuse the intran, since the
obtained host_t is the same in non-priv and priv cases).

Samuel
diff --git a/doc/mach.texi b/doc/mach.texi
index 59872c9..2d127fb 100644
--- a/doc/mach.texi
+++ b/doc/mach.texi
@@ -3241,14 +3241,15 @@ successfully set and @code{KERN_INVALID_ADDRESS} if an 
invalid or
 non-allocated address was specified.
 @end deftypefun
 
-@deftypefun kern_return_t vm_wire (@w{host_priv_t @var{host_priv}}, 
@w{vm_task_t @var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t 
@var{size}}, @w{vm_prot_t @var{access}})
-The function @code{vm_wire} allows privileged applications to control
-memory pageability.  @var{host_priv} is the privileged host port for the
+@deftypefun kern_return_t vm_wire (@w{host_t @var{host}}, @w{vm_task_t 
@var{target_task}}, @w{vm_address_t @var{address}}, @w{vm_size_t @var{size}}, 
@w{vm_prot_t @var{access}})
+The function @code{vm_wire} allows applications to control
+memory pageability.  @var{host} is the host port for the
 host on which @var{target_task} resides.  @var{address} is the starting
 address, which will be rounded down to a page boundary.  @var{size} is
 the size in bytes of the region for which protection is to change, and
 will be rounded up to give a page boundary.  @var{access} specifies the
-types of accesses that must not cause page faults.
+types of accesses that must not cause page faults.  If the host port is
+not privileged, the amount of memory is limited per task.
 
 The semantics of a successful @code{vm_wire} operation are that memory
 in the specified range will not cause page faults for any accesses
@@ -3257,7 +3258,7 @@ access argument of @code{VM_PROT_READ | VM_PROT_WRITE}.  
A special case
 is that @code{VM_PROT_NONE} makes the memory pageable.
 
 The function returns @code{KERN_SUCCESS} if the call succeeded,
-@code{KERN_INVALID_HOST} if @var{host_priv} was not the privileged host
+@code{KERN_INVALID_HOST} if @var{host} was not the privileged host
 port, @code{KERN_INVALID_TASK} if @var{task} was not a valid task,
 @code{KERN_INVALID_VALUE} if @var{access} specified an invalid access
 mode, @code{KERN_FAILURE} if some memory in the specified range is not
@@ -3265,7 +3266,7 @@ present or has an inappropriate protection value, and
 @code{KERN_INVALID_ARGUMENT} if unwiring (@var{access} is
 @code{VM_PROT_NONE}) and the memory is not already wired.
 
-The @code{vm_wire} call is actually an RPC to @var{host_priv}, normally
+The @code{vm_wire} call is actually an RPC to @var{host}, normally
 a send right for a privileged host port, but potentially any send right.
 In addition to the normal diagnostic return codes from the call's server
 (normally the kernel), the call may return @code{mach_msg} return codes.
diff --git a/include/mach/mach_host.defs b/include/mach/mach_host.defs
index 6699a50..28439a0 100644
--- a/include/mach/mach_host.defs
+++ b/include/mach/mach_host.defs
@@ -296,7 +296,7 @@ routine host_reboot(
  * [ To unwire the pages, specify VM_PROT_NONE. ]
  */
 routinevm_wire(
-   host_priv   : host_priv_t;
+   host: mach_port_t;
task: vm_task_t;
address : vm_address_t;
size: vm_size_t;
diff --git a/vm/vm_map.c b/vm/vm_map.c
index 6b13724..ae3ce21 100644
--- a/vm/vm_map.c
+++ b/vm/vm_map.c
@@ -208,6 +208,7 @@ void vm_map_setup(
rbtree_init(map-hdr.tree);
 
map-size = 0;
+   map-user_wired = 0;
map-ref_count = 1;
map-pmap = pmap;
map-min_offset = min;
@@ -1409,7 +1410,10 @@ kern_return_t vm_map_pageable_common(
 
if (user_wire) {
if (--(entry-user_wired_count) == 0)
+   {
+   map-user_wired -= entry-vme_end - 
entry-vme_start;
entry-wired_count--;
+   }
}
else {
entry-wired_count--;
@@ -1486,7 +1490,10 @@ kern_return_t vm_map_pageable_common(
 
if (user_wire) {
if ((entry-user_wired_count)++ == 0)
+   {
+   map-user_wired += entry-vme_end - 
entry-vme_start;
entry-wired_count++;
+   }
}
else {
entry-wired_count++;
@@ -1512,6 +1519,7 @@ kern_return_t vm_map_pageable_common(
(entry-vme_end  start)) {
  

Re: [PATCH] limited support for non-root mlock()

2015-07-08 Thread Samuel Thibault
Samuel Thibault, le Wed 08 Jul 2015 18:22:56 +0200, a écrit :
 Justus Winter, le Wed 08 Jul 2015 16:22:28 +0200, a écrit :
  Alternatively, you could re-purpose the existing RPC `vm_wire',
  changing the type of its first argument from `host_priv_t' to `host_t'
  (this is backwards compatible as the privileged host control port is
  also a host port), and changing the behavior slightly depending on
  whether the user passed the priv port or a normal host port.
 
 Ok, so it'd look like this? (I can't reuse the intran, since the
 obtained host_t is the same in non-priv and priv cases).

I forgot the glibc part, which becomes different.

Samuel
diff --git a/sysdeps/mach/hurd/mlock.c b/sysdeps/mach/hurd/mlock.c
index 14c311c..ed54166 100644
--- a/sysdeps/mach/hurd/mlock.c
+++ b/sysdeps/mach/hurd/mlock.c
@@ -28,19 +28,21 @@
 int
 mlock (const void *addr, size_t len)
 {
-  mach_port_t hostpriv;
+  mach_port_t host;
   vm_address_t page;
   error_t err;
 
-  err = __get_privileged_ports (hostpriv, NULL);
-  if (err)
-return __hurd_fail (EPERM);
-
   page = trunc_page ((vm_address_t) addr);
   len = round_page ((vm_address_t) addr + len) - page;
-  err = __vm_wire (hostpriv, __mach_task_self (), page, len,
+
+  err = __get_privileged_ports (host, NULL);
+  if (err)
+host = __mach_host_self();
+
+  err = __vm_wire (host, __mach_task_self (), page, len,
   VM_PROT_READ);
-  __mach_port_deallocate (__mach_task_self (), hostpriv);
+  if (host != __mach_host_self())
+__mach_port_deallocate (__mach_task_self (), host);
 
   return err ? __hurd_fail (err) : 0;
 }
diff --git a/sysdeps/mach/hurd/munlock.c b/sysdeps/mach/hurd/munlock.c
index c03af90..e2fe21c 100644
--- a/sysdeps/mach/hurd/munlock.c
+++ b/sysdeps/mach/hurd/munlock.c
@@ -27,18 +27,20 @@
 int
 munlock (const void *addr, size_t len)
 {
-  mach_port_t hostpriv;
+  mach_port_t host;
   vm_address_t page;
   error_t err;
 
-  err = __get_privileged_ports (hostpriv, NULL);
-  if (err)
-return __hurd_fail (EPERM);
-
   page = trunc_page ((vm_address_t) addr);
   len = round_page ((vm_address_t) addr + len) - page;
-  err = __vm_wire (hostpriv, __mach_task_self (), page, len, VM_PROT_NONE);
-  __mach_port_deallocate (__mach_task_self (), hostpriv);
+
+  err = __get_privileged_ports (host, NULL);
+  if (err)
+host = __mach_host_self();
+
+  err = __vm_wire (host, __mach_task_self (), page, len, VM_PROT_NONE);
+  if (host != __mach_host_self())
+__mach_port_deallocate (__mach_task_self (), host);
 
   return err ? __hurd_fail (err) : 0;
 }