On Mon, 2014-03-31 at 14:43 -0700, janjust wrote:
> (hm my direct reply seems to be getting rejected)
>
> Yes,
> The output is rather large so I attached 3 files that were the result of
> running it with 2 procs. 1 for stdout and the other two are from
> --log-file=valgrind.%p
> -Tommy
>
> val.out <http://valgrind.10908.n7.nabble.com/file/n49155/val.out>
>
> valgrind.26200
> <http://valgrind.10908.n7.nabble.com/file/n49155/valgrind.26200>
>
> valgrind.26269
> <http://valgrind.10908.n7.nabble.com/file/n49155/valgrind.26269>

Looking at the output, this seems to be the relevant trace:

SYSCALL[26201,1]( 2) sys_open ( 0x5ec9cc(/proc/mounts), 0 ) --> [async] ...
SYSCALL[26201,1]( 2) ... [async] --> Success(0x0:0x11)
SYSCALL[26201,1]( 5) sys_newfstat ( 17, 0xffebf76e0 )[sync] --> Success(0x0:0x0)
SYSCALL[26201,1]( 9) sys_mmap ( 0x0, 4096, 3, 34, -1, 0 ) --> [pre-success] Success(0x0:0x4e71000)
SYSCALL[26201,1]( 0) sys_read ( 17, 0x4e71000, 1024 ) --> [async] ...
SYSCALL[26201,1]( 0) ... [async] --> Success(0x0:0x400)
SYSCALL[26201,1](137) sys_statfs ( 0xffebfaa85(/var/lib/hugetlbfs/global/pagesize-2097152), 0xffebf9a80 )[sync] --> Success(0x0:0x0)
SYSCALL[26201,1]( 3) sys_close ( 17 )[sync] --> Success(0x0:0x0)
SYSCALL[26201,1]( 11) sys_munmap ( 0x4e71000, 4096 )[sync] --> Success(0x0:0x0)
SYSCALL[26201,1]( 2) sys_open ( 0xffebf8a80(/var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.0.26201.kvs_4761352), 66, 493 ) --> [async] ...
SYSCALL[26201,1]( 2) ... [async] --> Success(0x0:0x11)
SYSCALL[26201,1]( 87) sys_unlink ( 0xffebf8a80(/var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.0.26201.kvs_4761352) ) --> [async] ...
SYSCALL[26201,1]( 87) ... [async] --> Success(0x0:0x0)
SYSCALL[26201,1]( 9) sys_mmap ( 0x0, 4194304, 3, 1, 17, 0 ) --> [pre-fail] Failure(0x16)

I then tried to reproduce the problem with the small program below, which
makes the same open, unlink and mmap calls with the same parameters, except
for the fd argument to mmap, which must be the result of the open.
It works on my system.

You could try it (natively and under valgrind) and see whether it fails.
If it does not fail, then replace 4m.txt with a path name similar to the one
above (assuming the path
/var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.0.26201.kvs_4761352
is on a "strangely mounted" filesystem, cf. the open of /proc/mounts just
above).

If the program below (and the modified version) succeeds but the mpich run
fails, then I guess you will have to debug the valgrind code itself to see
what exactly makes the syscall fail with EINVAL: is it the valgrind checks,
or is it the real syscall failing?

Rather than debugging Valgrind, you might first try to find out whether the
syscall itself fails by running valgrind under strace, e.g.

  strace -f valgrind mmap_huge

Philippe

#include <fcntl.h>      /* open */
#include <stdio.h>
#include <sys/mman.h>   /* mmap */
#include <unistd.h>     /* unlink */

int main(void)
{
   int fd;
   char *m;

   /* Same numeric arguments as in the trace above. */
   fd = open("4m.txt", 66, 493);
   printf("open result : %d\n", fd);
   unlink("4m.txt");
   m = (char *) mmap(0x0, 4194304, 3, 1, fd, 0);
   printf("mmap result %p\n", (void *) m);
   return 0;
}