[bug #49730] /proc/PID/environ returns I/O errors on read attempts

2016-11-28 Thread Brent Baccala
URL:
  

Summary: /proc/PID/environ returns I/O errors on read attempts
Project: The GNU Hurd
Submitted by: baccala
Submitted on: Tue 29 Nov 2016 07:42:00 AM GMT
Category: Hurd Servers
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Reproducibility: None
Size (loc): None
Planned Release: None
Effort: 0.00
Wiki-like text discussion box:

___

Details:

I'm having problems retrieving process environment from /proc.  Something like
"cat /proc/936/environ" often (but not always) returns an I/O error, while
"msgport --getenv -p 936" works fine.  I'm not sure exactly which processes
suffer from this, but translators seem particularly vulnerable.

Studying the problem with rpctrace, it seems that there are two different ways
to obtain a process's environment.  The proc server holds a pointer to the
environment array (in the process's address space, not its own), and then
vm_read's it to answer proc_getprocenv RPCs.  This is how procfs does it, and
this is what isn't working.

I've written a short program (attached) to fetch the argument locations and
print them.  Comparing this output to /proc/PID/maps shows that on the
affected processes, the addresses appear sane, but do not correspond to mapped
memory locations, thus the I/O errors.

The other way to get a process's environment is to fetch the process's msg
port and query it using a msg_get_environment RPC.  The msgport program does
this, and is able to successfully retrieve environment information.

Related: Do we need two different ways to retrieve a process's environment?



___

File Attachments:


---
Date: Tue 29 Nov 2016 07:42:00 AM GMT  Name: get_arg_locations.c  Size: 380B  
By: baccala







Re: Unreclaimed swap space upon process termination?

2016-11-28 Thread Samuel Thibault
Hello,

Thomas Schwinge, on Mon 28 Nov 2016 17:10:26 +0100, wrote:
> ..., but on the new ("bad") system, the first non-sensical (huge;
> -2147479552 is 0x80001000) vm_allocate call actually succeeds:

Yes, the userland address space is now 3G again, so it can now indeed
allocate that much.

> I have not yet figured out where these vm_allocate calls and/or their
> huge size parameters are coming from.

Probably the real source of the issue :)

Samuel



Re: Unreclaimed swap space upon process termination?

2016-11-28 Thread Thomas Schwinge
Hi!

On Mon, 28 Nov 2016 16:03:44 +0100, I wrote:
> Updating a Debian GNU/Hurd virtual machine to recent packages after many
> months, and then running the GCC testsuite, I observe the following
> behavior, which should be reproducible with the executable in the
> attached tarball:
> 
> $ vmstat | grep swap\ free
> swap free: 4096M
> $ ./1.exe 
> $ vmstat | grep swap\ free
> swap free: 3288M
> $ ./1.exe 
> $ vmstat | grep swap\ free
> swap free: 2495M
> $ ./1.exe 
> $ vmstat | grep swap\ free
> swap free: 1726M
> $ ./1.exe 
> $ vmstat | grep swap\ free
> swap free:  931M
> $ ./1.exe 
> $ vmstat | grep swap\ free
> swap free:  164M
> $ ./1.exe 
> Bus error
> $ vmstat | grep swap\ free
> swap free:0 
> 
> At this point, the system doesn't recover from this low memory situation.
> 
> For each invocation of the executable, there are three "no more room in
> [...]  (./1.exe([...])" messages on the Mach console.
> 
> The executable is compiled from
> [gcc]/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/insert/char/1.cc
> from commit a050099a416f013bda35832b878d9a57b0cbb231 (gcc-6-branch branch
> point; 2016-04-15), which doesn't look very spectacular -- apart from
> maybe the __gnu_test::set_memory_limits call, which I'll try to figure
> out what it does.

That uses setrlimit for RLIMIT_DATA, RLIMIT_RSS, RLIMIT_VMEM, RLIMIT_AS,
but there is no change with that call removed.

> But nevertheless, unreclaimed swap space upon process
> termination sounds like a bug?
> 
> Unless this is a known issue, or somebody can quickly pinpoint the problem,
> I'll try to bisect core system packages, between the version of the
> "good" and "bad" disk images.

Running with rpctrace, I see that on the old ("good") system, at the end
of the process, we got:

[...]
task52(pid2198)->vm_deallocate (16973824 16) = 0
task52(pid2198)->vm_allocate (0 -2147479552 1) = 0x3 ((os/kern) no space available)
task52(pid2198)->vm_allocate (0 -2147348480 1) = 0x3 ((os/kern) no space available)
task52(pid2198)->vm_map (0 2097152 0 1  (null) 0 1 0 7 1) = 0 21405696
task52(pid2198)->vm_deallocate (21405696 614400) = 0
task52(pid2198)->vm_deallocate (23068672 434176) = 0
task52(pid2198)->vm_protect (22020096 135168 0 3) = 0
task52(pid2198)->vm_allocate (0 -2147479552 1) = 0x3 ((os/kern) no space available)
  61<--68(pid2198)->proc_mark_exit_request (0 0) = 0
task52(pid2198)->task_terminate () = 0
Child 2198 exited with 0

..., but on the new ("bad") system, the first non-sensical (huge;
-2147479552 is 0x80001000) vm_allocate call actually succeeds:

[...]
task154(pid1080)->vm_deallocate (16973824 16) = 0
task154(pid1080)->vm_allocate (0 -2147479552 1) = 0 268742656
task154(pid1080)->vm_allocate (0 -2147479552 1) = 0x3 ((os/kern) no space available)
task154(pid1080)->vm_allocate (0 -2147348480 1) = 0x3 ((os/kern) no space available)
task154(pid1080)->vm_map (0 2097152 0 1  (null) 0 1 0 7 1) = 0 2162
task154(pid1080)->vm_deallocate (2162 364544) = 0
task154(pid1080)->vm_deallocate (23068672 684032) = 0
task154(pid1080)->vm_protect (22020096 135168 0 3) = 0
task154(pid1080)->vm_allocate (0 -2147479552 1) = 0x3 ((os/kern) no space available)
task154(pid1080)->vm_deallocate (268742656 -2147479552) = 0
  163<--170(pid1080)->proc_mark_exit_request (0 0) = 0
task154(pid1080)->task_terminate () = 0
Child 1080 exited with 0

I have not yet figured out where these vm_allocate calls and/or their
huge size parameters are coming from.


Grüße
 Thomas



Re: Unreclaimed swap space upon process termination?

2016-11-28 Thread Samuel Thibault
Hello,

Thomas Schwinge, on Mon 28 Nov 2016 16:03:44 +0100, wrote:
> But nevertheless, unreclaimed swap space upon process
> termination sounds like a bug?

It's actually not really a new problem.  It has just been emphasized
more with recent memory management changes.

Samuel



Re: Unreclaimed swap space upon process termination?

2016-11-28 Thread Richard Braun
On Mon, Nov 28, 2016 at 04:03:44PM +0100, Thomas Schwinge wrote:
> Unless this is a known issue, or somebody can quickly pinpoint the problem,
> I'll try to bisect core system packages, between the version of the
> "good" and "bad" disk images.

It's a known issue, a side effect of the page cache changes that I
merged some months ago. Because of increased cache usage, there is
more memory pressure, and therefore more swap usage (mostly through
double paging, since the kernel always tries to evict external pages
first). The default pager has never been able to correctly reclaim
these pages in the past, but now there are a lot more of them.

-- 
Richard Braun



Unreclaimed swap space upon process termination?

2016-11-28 Thread Thomas Schwinge
Hi!

Updating a Debian GNU/Hurd virtual machine to recent packages after many
months, and then running the GCC testsuite, I observe the following
behavior, which should be reproducible with the executable in the
attached tarball:

$ vmstat | grep swap\ free
swap free: 4096M
$ ./1.exe 
$ vmstat | grep swap\ free
swap free: 3288M
$ ./1.exe 
$ vmstat | grep swap\ free
swap free: 2495M
$ ./1.exe 
$ vmstat | grep swap\ free
swap free: 1726M
$ ./1.exe 
$ vmstat | grep swap\ free
swap free:  931M
$ ./1.exe 
$ vmstat | grep swap\ free
swap free:  164M
$ ./1.exe 
Bus error
$ vmstat | grep swap\ free
swap free:0 

At this point, the system doesn't recover from this low memory situation.

For each invocation of the executable, there are three "no more room in
[...]  (./1.exe([...])" messages on the Mach console.

The executable is compiled from
[gcc]/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/insert/char/1.cc
from commit a050099a416f013bda35832b878d9a57b0cbb231 (gcc-6-branch branch
point; 2016-04-15), which doesn't look very spectacular -- apart from
maybe the __gnu_test::set_memory_limits call, which I'll try to figure
out what it does.  But nevertheless, unreclaimed swap space upon process
termination sounds like a bug?

Unless this is a known issue, or somebody can quickly pinpoint the problem,
I'll try to bisect core system packages, between the version of the
"good" and "bad" disk images.


Grüße
 Thomas




1.tar.xz
Description: application/xz


Re: C++ vs. glibc/Hurd/Mach headers

2016-11-28 Thread Thomas Schwinge
Hi!

On Sun, 27 Nov 2016 17:14:26 +0100, Samuel Thibault wrote:
> Thomas Schwinge, on Sat 26 Nov 2016 19:53:34 +0100, wrote:
> > When changing the GDB source code to use kern_return_t (or int) instead
> > of error_t, I still see hurd.h:__hurd_fail and
> > hurd/signal.h:HURD_MSGPORT_RPC choke on their own error_t usage:
> > 
> > $ echo -e '#include \n#include \n#include 
> > \n#include \nvoid f(){ kern_return_t err = 0; err = 
> > thread_get_state(0,0,0,0); err = HURD_MSGPORT_RPC(0,0,0,0); }' | g++ 
> > -D_GNU_SOURCE -x c++ - -S -o /dev/null -O2
> 
> Don't use 0, but ESUCCESS.

;-) I do know about ESUCCESS (I added it), but the literal 0 (int) values here
model the fact that Mach API calls have a return type of
kern_return_t (int).  (Thus, I will change GDB's "err" variables from
error_t to kern_return_t.)

> > In file included from /usr/include/errno.h:35:0,
> >  from :1:
> > /usr/include/hurd.h: In function ‘int __hurd_fail(error_t)’:
> > /usr/include/hurd.h:60:13: error: invalid conversion from ‘int’ to 
> > ‘error_t {aka __error_t_codes}’ [-fpermissive]
> >err = EIEIO;
> >  ^
> > /usr/include/hurd.h:64:13: error: invalid conversion from ‘int’ to 
> > ‘error_t {aka __error_t_codes}’ [-fpermissive]
> >err = ENOMEM;
> >  ^
> > /usr/include/hurd.h:68:13: error: invalid conversion from ‘int’ to 
> > ‘error_t {aka __error_t_codes}’ [-fpermissive]
> >err = EINVAL;

This remains to be fixed; can you please commit your patch?

> The HURD_MSGPORT_RPC seems missing casts between kern_error and error_t
> indeed.

Thanks for changing this code.  Though, the explicit casts are also not
completely ideal, as they now hide other kinds of problems, for example:

$ echo -e '#include \n#include \n#include 
\nvoid f(){ error_t err = HURD_MSGPORT_RPC(,,,); 
}' | gcc -D_GNU_SOURCE -x c - -S -o /dev/null -O2

... in C compilation mode now no longer diagnoses "error: incompatible
types when assigning [...]".  Oh well...  ;-/


Grüße
 Thomas