[valgrind] [Bug 342040] Valgrind mishandles clone with CLONE_VFORK | CLONE_VM that clones to a different stack

2016-12-13 Thread Nach
https://bugs.kde.org/show_bug.cgi?id=342040

--- Comment #7 from Nach  ---
I ran a bunch more tests to ensure stability with the stack in the child.

The child is not using the exact stack the parent specifies (beyond setting a
few bytes at the top of it), which may or may not be a problem for some
applications (those which save child state?). However, I can confirm that the
stack in the child is as big as it needs to be. The larger the stack I tell
clone() to use, the larger the stack appears to be in the child (with a smaller
one, the same child overflows its stack). So it would seem that any sane
application which passes a stack to clone(), and does not care about reading
the child's stack data directly in the parent, works as expected.

On very large stacks I did see this warning:
==8967== Warning: client switching stacks?  SP change: 0xa5fd030 --> 0x65fd020
==8967==  to suppress, use: --max-stackframe=67108880 or greater

But again, things work as expected. So all my testing shows this patch is good
except it does not add support for more advanced cases (CLONE_VM, child stack
access in parent).

I cannot speak for the original bug reporter, but for the cases I'm dealing
with, I consider this bug fixed.
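
A stack-depth test of the kind described in this comment could look roughly
like the sketch below. It is not the test that was actually run: the names
run_child_with_stack and burn_stack, the 1 KiB frame size and the use of
mmap() for the child stack are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: consume roughly 1 KiB of stack per level. */
static int burn_stack(int depth)
{
    volatile char frame[1024];

    frame[0] = (char)depth;
    if (depth == 0)
        return frame[0];
    /* Using frame[0] after the call keeps this from becoming a tail call. */
    return burn_stack(depth - 1) + frame[0];
}

/* Child entry point: recurse to the requested depth, then exit cleanly. */
static int child(void *arg)
{
    burn_stack(*(int *)arg);
    _exit(0);
}

/* Hypothetical helper: run the child on a fresh stack of stack_size bytes.
 * Returns the child's exit status, or -1 if anything failed (including the
 * child being killed by a signal such as SIGSEGV on stack overflow). */
int run_child_with_stack(size_t stack_size, int depth)
{
    int status;
    int result = -1;
    char *stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (stack == MAP_FAILED)
        return -1;

    /* Same flags as in this report; the top of the mapping is passed
     * because the stack grows downwards. */
    pid_t pid = clone(child, stack + stack_size,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &depth);
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    munmap(stack, stack_size);
    return result;
}
```

For example, run_child_with_stack(1 << 20, 512) needs a little over 512 KiB in
the child, so it should fit in a 1 MiB stack. Keeping the depth and shrinking
the stack is one way to provoke the overflow mentioned above.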

-- 
You are receiving this mail because:
You are watching all bug changes.

2016-12-11 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=342040

--- Comment #6 from Philippe Waroquiers  ---
(In reply to Nach from comment #4)
> Strace now shows: clone(child_stack=NULL,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f8e5944d9d0) = 16514
> 
> Which I'm not sure what to make of.
The above is normal. Valgrind does not keep the CLONE_VM flag, as there
is no way to support this: valgrind has to do a lot of work in the child
and with CLONE_VM, any work done on any Valgrind data structure would
corrupt the parent Valgrind data structures.


> 
> 
> Anyways, I'm setting a separate stack to use, and some basic testing shows
> things are behaving sanely, although I'm not certain the stack I'm
> specifying is even being used.
The fact that you see a NULL stack is also normal: this is the 'valgrind host
stack'. The stack that you provide in the clone syscall is set as the
guest stack (not visible in the syscall, this is only visible in the simulated
CPU SP register).
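
One way to check from inside the program whether the child really runs on the
stack passed to clone() is to compare the address of a local variable in the
child with the bounds of that buffer. The sketch below does this. It is only
an illustration: struct stack_range and child_runs_on_supplied_stack are
hypothetical names, and no claim is made here about what it reports under any
particular Valgrind revision.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical descriptor for the buffer handed to clone(). */
struct stack_range {
    uintptr_t lo;   /* lowest address of the buffer */
    uintptr_t hi;   /* one past the highest address */
};

/* Child entry point: exit 0 if a local variable lives inside the buffer,
 * 1 otherwise. */
static int child(void *arg)
{
    const struct stack_range *r = arg;
    volatile char probe = 0;
    uintptr_t sp = (uintptr_t)&probe;

    _exit(sp >= r->lo && sp < r->hi ? 0 : 1);
}

/* Hypothetical helper: returns 1 if the child ran on the supplied stack,
 * 0 if it ran somewhere else, -1 on error. */
int child_runs_on_supplied_stack(void)
{
    size_t size = 64 * 1024;
    char *stack = malloc(size);
    struct stack_range r;
    int status;
    int result = -1;

    if (stack == NULL)
        return -1;
    r.lo = (uintptr_t)stack;
    r.hi = r.lo + size;

    /* The child learns the buffer bounds through the clone() argument. */
    pid_t pid = clone(child, stack + size,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &r);
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    free(stack);
    return result;
}
```

The check works whether or not memory is shared, because the child receives
the buffer bounds by value through its argument and reports the answer in its
exit status.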


> Of the flags, CLONE_VM|CLONE_VFORK|SIGCHLD, SIGCHLD is indeed being
> delivered to the parent, and the child appears to be scheduled before the
> parent (at least when tested across a bunch of runs), although the memory is
> not being shared, so it would appear CLONE_VM is being ignored.
In the final version of the committed patch, CLONE_VFORK is kept.
There is no way to support CLONE_VM.
> 
> For the needs of a well implemented posix_spawn() such as in musl or mine,
> CLONE_VM and CLONE_VFORK are really just optimizations, so it matters little
> if either was not respected. The main issue is to ensure a separate stack is
> in operation in the child in order to not mangle the parent's, and
> preliminary testing shows it now appears to be fine.
> 
> I'll do further testing with more complicated cases (heavier stack use in
> child) to ensure whatever clone mangling is done seems to work for these
> scenarios. However, if a library/application is actually expecting the
> shared memory semantics, it seems a little more work is required.
Thanks for any additional testing or verification you could do.


2016-12-11 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=342040

Philippe Waroquiers  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Philippe Waroquiers  ---
Fixed in revision 16186.
Retesting welcome.

Thanks for the test case, added as none/tests/linux/clonev


2016-12-08 Thread Nach
https://bugs.kde.org/show_bug.cgi?id=342040

--- Comment #4 from Nach  ---
Hello Philippe,

I double checked with Valgrind 3.12.0 to ensure the bug was still occurring (it
was), and then applied your patch.
The basic code I was testing no longer crashes and seems to be mostly working
as it should with your patch!

Strace now shows: clone(child_stack=NULL,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f8e5944d9d0) = 16514

Which I'm not sure what to make of.


Anyways, I'm setting a separate stack to use, and some basic testing shows
things are behaving sanely, although I'm not certain the stack I'm specifying
is even being used.

Of the flags, CLONE_VM|CLONE_VFORK|SIGCHLD, SIGCHLD is indeed being delivered
to the parent, and the child appears to be scheduled before the parent (at
least when tested across a bunch of runs), although the memory is not being
shared, so it would appear CLONE_VM is being ignored.
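
Whether memory is shared can be probed directly: the child writes to a global
and the parent looks at it once the child has exited. The sketch below is one
way to do that. The names marker and child_write_visible_in_parent are
hypothetical, and this is not the test program attached to the bug.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the child, read back by the parent. */
static volatile int marker;

/* Child entry point: set the marker and exit without running atexit
 * handlers, as a vfork-style child should. */
static int child(void *arg)
{
    (void)arg;
    marker = 1;
    _exit(0);
}

/* Hypothetical helper: returns 1 if the child's write is visible in the
 * parent (memory shared), 0 if it is not (child got a private copy),
 * -1 on error. */
int child_write_visible_in_parent(void)
{
    size_t size = 64 * 1024;
    char *stack = malloc(size);
    int status;
    int result = -1;

    if (stack == NULL)
        return -1;

    marker = 0;
    pid_t pid = clone(child, stack + size,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid != -1 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = marker;

    free(stack);
    return result;
}
```

When the kernel honours CLONE_VM the helper returns 1. A result of 0 matches
the behaviour described above, where the flag is dropped and the child works
on its own copy of memory.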

For the needs of a well implemented posix_spawn() such as in musl or mine,
CLONE_VM and CLONE_VFORK are really just optimizations, so it matters little if
either was not respected. The main issue is to ensure a separate stack is in
operation in the child in order to not mangle the parent's, and preliminary
testing shows it now appears to be fine.

I'll do further testing with more complicated cases (heavier stack use in
child) to ensure whatever clone mangling is done seems to work for these
scenarios. However, if a library/application is actually expecting the shared
memory semantics, it seems a little more work is required.


2016-12-08 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=342040

--- Comment #3 from Philippe Waroquiers  ---
Can you please try the patch attached to bug 373192?
Thanks


2016-12-07 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=342040

Philippe Waroquiers  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #2 from Philippe Waroquiers  ---
(In reply to Steven Stewart-Gallus from comment #0)
> Typically, clone is used with the CLONE_VFORK | CLONE_VM options to emulate
> the vfork system call but with additional options. However, it is extremely
> awkward and dangerous to use vfork functionality in such a way that stacks
> are shared. To make things safer and avoid getting into trouble with
> compilers like Clang misoptimizing things I plan to use clone with
> CLONE_VFORK | CLONE_VM and not share stacks but jump to a newly allocated
> one. CLONE_VM alone without CLONE_VFORK is explicitly unsupported by
> Valgrind.
CLONE_VM without CLONE_VFORK is effectively unsupported.
However, the way valgrind "supports" CLONE_VM|CLONE_VFORK is to just
remove these 2 flags: valgrind does not support vfork semantics, and
so transforms a vfork into a fork, hoping the application does not
depend on the vfork semantics.

See the following code in e.g. syswrap-amd64-linux.c:
   case VKI_CLONE_VFORK | VKI_CLONE_VM: /* vfork */
      /* FALLTHROUGH - assume vfork == fork */
      cloneflags &= ~(VKI_CLONE_VFORK | VKI_CLONE_VM);

   case 0: /* plain fork */
      ...
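
The effect of that mask on the flags from the test program can be reproduced
in isolation. The sketch below is not Valgrind code: strip_vfork_flags is a
hypothetical helper, and it uses the userspace CLONE_* constants from
<sched.h> in place of the VKI_ ones.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>

/* Hypothetical helper, not Valgrind code: return the flags that remain
 * once the two vfork-related bits are cleared, mirroring the mask in the
 * quoted case. */
unsigned long strip_vfork_flags(unsigned long flags)
{
    return flags & ~(unsigned long)(CLONE_VFORK | CLONE_VM);
}
```

Applied to CLONE_VM | CLONE_VFORK | SIGCHLD this leaves only SIGCHLD, which
is the flags value strace reports for the test program.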

That being said, it is unclear why SEGV is raised.
A simple fork test (none/tests/fork) gives the following strace -f valgrind:
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f15d396a9d0) = 18741

With your test program, strace -f valgrind gives:
clone(child_stack=0, flags=SIGCHLD) = 18774

So, I would guess that what causes the SEGV is not the child stack being 0,
as this looks 'classical' when vfork-ing or clone-ing under Valgrind,
but rather the CLEARTID/SETTID and child_tidptr aspects.


2016-08-28 Thread Nach via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=342040

Nach  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nac...@gmail.com

--- Comment #1 from Nach  ---
While testing my own implementation of posix_spawn()
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html)
using the following snippet:

char stack[4096];
pid_t pid = clone(child, stack+sizeof(stack), CLONE_VM|CLONE_VFORK|SIGCHLD,
args);

I also noticed this being mishandled. Running valgrind (valgrind-3.12.0.SVN)
through strace, I see valgrind is running this code as:
clone(child_stack=0, flags=SIGCHLD) = 4070

While dropping some of the flags is annoying, and it really should NOT be
doing that, it's setting the child_stack to 0! This is a guaranteed
segmentation fault or other nasty behavior. Notice the address in the
above message, "Access not within mapped region at address 0xFFD4":
that's the stack growing downwards from 0 on a 64-bit platform. 0 only seems
to be allowed as a child_stack with very specific flags.

The popular musl (http://www.musl-libc.org/) C library's posix_spawn() is also
affected as it uses similar code internally
(http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c?id=8f7bc690f07e90177b176b6e19736ad7c1d49840#n168).

Valgrind should honor the specified child_stack, or replace it with its own
managed stack, rather than just setting it to 0 with these flags, which makes
it impossible to debug programs using these implementations of posix_spawn().
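
For anyone wanting to reproduce this, a self-contained version of the snippet
above might look like the sketch below. It is an assumption-laden
illustration, not the original test: spawn_on_separate_stack and the exit
status 42 are made up, and the stack is a larger static buffer where the
snippet above used a 4096-byte local array.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child entry point: exit at once with a recognisable status. */
static int child(void *arg)
{
    (void)arg;
    _exit(42);
}

/* Hypothetical helper: clone onto a separate stack and return the child's
 * exit status, or -1 on any failure. */
int spawn_on_separate_stack(void)
{
    /* Static so that the buffer is not part of the parent's own frame. */
    static char stack[65536] __attribute__((aligned(16)));
    int status;

    /* Pass the top of the buffer, since the stack grows downwards. */
    pid_t pid = clone(child, stack + sizeof(stack),
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run natively, the helper should return 42. A crash or a -1 when run under
Valgrind would correspond to the mishandling reported here.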
