On Wed, 2014-04-02 at 07:17 -0700, janjust wrote:
> ah ok so this is a virtual filesystem or something, I'm still unsure what is
> going on.
> the example you provided succeeds for me as well, but...this file:
>
>
> /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.0.26201.kvs_4761352
> Doesn't exist or something else is going on…

Yes, this file is created when needed, and then unlinked.
See the trace produced by valgrind --trace-syscalls=yes ...
>
> $ls /var/lib doesn't show hugetlbfs from my launch node, but it shows up if
> I run the $ls /var/lib by launching the job with $aprun it evaluates to:

Not knowing what aprun is or what it does, I have no idea why it helps
to "see" the mounted hugetlbfs file system.

>
> janjust@titan-batch7:~/janjust_proj/tmp$ aprun -n 1 -N 1 ls -lah
> /var/lib/hugetlbfs/global/
> Couldn't parse executable
> total 0
> drwxr-xr-x 8 root root 160 Apr 1 13:45 .
> drwxr-xr-x 3 root root 60 Apr 1 13:45 ..
> drwxrwxrwt 2 root root 0 Apr 1 13:45 pagesize-131072
> drwxrwxrwt 2 root root 0 Apr 1 13:45 pagesize-16777216
> drwxrwxrwt 2 root root 0 Apr 2 09:52 pagesize-2097152
> drwxrwxrwt 2 root root 0 Apr 1 13:45 pagesize-524288
> drwxrwxrwt 2 root root 0 Apr 1 13:45 pagesize-67108864
> drwxrwxrwt 2 root root 0 Apr 1 13:45 pagesize-8388608
>
> but then "pagesize-2097152" is empty, which could be why the call is
> failing…

I do not think the failure is linked to the 0 size.
The small executable works with a 0-size 4m.txt
(just remove the unlink from the small test program, and you will
see that 4m.txt is created if needed, and then mapped).
Relaunching works even if 4m.txt has a 0 size.

What might be the problem is bad/wrong support of huge pages by Valgrind.
I know very little about huge pages, but it looks like the pagesize-xxxxx
directory name indicates the size of the huge pages that get mapped from it.
It looks like you ask for a 4M huge page, but in the pagesize-2097152
(2 MB) directory.

Maybe you could update the small program to do exactly the same open
and the same mmap, but with the absolute path name of the file in the
hugetlbfs mount?
Then run it natively (I am assuming this should work, including with hugetlbfs)
Then run it under strace
Then run it under valgrind
Then run it under strace -f valgrind

We might see the difference in the way the underlying mmap calls are done.
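To make that suggestion concrete, here is a rough sketch of what such a small test could look like. It is not the program discussed earlier in this thread and not what MPICH does internally: the function name map_and_unmap, the open flags, and the mmap protection/flags are all my assumptions, chosen only to exercise open, mmap, munmap and unlink on an absolute path. It deliberately does not touch the mapped memory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the file at `path`,
 * map `len` bytes of it, unmap them again, then unlink the file.
 * Returns 0 if every step succeeded, -1 otherwise, and prints errno
 * for the first failing call so native and Valgrind runs can be compared.
 * The flags below are assumptions, not the ones MPICH really uses. */
int map_and_unmap(const char *path, size_t len)
{
    int rc = 0;
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap(len=%zu): %s\n", len, strerror(errno));
        rc = -1;
    } else if (munmap(p, len) != 0) {
        fprintf(stderr, "munmap(len=%zu): %s\n", len, strerror(errno));
        rc = -1;
    }

    close(fd);
    unlink(path);   /* same create-then-unlink pattern as described above */
    return rc;
}
```

Calling map_and_unmap(argv[1], 4UL << 20) from a small main, with argv[1] set to some file name of your choice under /var/lib/hugetlbfs/global/pagesize-2097152/, would give a binary that can be run in the four ways listed above. The 4 MB length is taken from the 4M mapping mentioned in this thread; whether that is the right length to test is something the strace output should tell us.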
(You might have to do all that under aprun; maybe you can do
aprun bash or something like that.)

Philippe

>
>
>
>
> --
> View this message in context:
> http://valgrind.10908.n7.nabble.com/mpich-unable-to-munmap-hugepages-tp49150p49168.html
> Sent from the Valgrind - Users mailing list archive at Nabble.com.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Valgrind-users mailing list
> Valgrind-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-users