OK, I finally got some debugging output by hardcoding the gossip
calls. I have posted a log file at:
http://www.scl.ameslab.gov/~brett/pvfs2.log
The app in this case uses a 1 MB I/O buffer to write a ~62 MB file
once and then read it back in several times. The pvfs2 debug output
is mixed in with the application output, but I think it's still not
too hard to follow.
It appears to me that, despite always being passed the same buffer,
the memcache_register function almost always misses for the write.
Note that the output for a run on one of the EHCAs is very similar.
On the EHCA I can write up to about 220 MB before it dies with the
"too much memory registered" error.
Brett
On Oct 18, 2006, at 8:38 AM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Tue, 17 Oct 2006 18:48 -0500:
samples  %        app name              symbol name
6361527  26.3906  gamess.Feb222006R5.x  hstar_
5029662  20.8654  gamess.Feb222006R5.x  memcache_lookup_cover
4185769  17.3645  no-vmlinux            (no symbols)
4147846  17.2072  gamess.Feb222006R5.x  bufferRead
1969618   8.1709  gamess.Feb222006R5.x  memcache_memfree
 799027   3.3147  libc-2.3.6.so         (no symbols)
 206162   0.8553  libpscrt.so.1         memset.pathscale.opteron
 197544   0.8195  gamess.Feb222006R5.x  __job_time_mgr_add
 106939   0.4436  mthca.so              (no symbols)
  92519   0.3838  gamess.Feb222006R5.x  ddot_
  68023   0.2822  gamess.Feb222006R5.x  sotran_
  52222   0.2166  oprofiled             (no symbols)
  41085   0.1704  libpthread-2.3.6.so   pthread_mutex_lock
  32711   0.1357  gamess.Feb222006R5.x  PINT_process_request
There are plenty of heuristics involved in doing memory registration
caching; it's an ugly part of dealing with devices that require it.
As a quick hack, you can edit pvfs2/src/io/bmi/bmi_ib/mem.c to
change ENABLE_MEMCACHE to 0 for the non-server case. This will
force each read or write operation to do a matched pair of memory
register and deregister operations. That path doesn't get a lot of
testing, but hopefully it will work. It should change where the
memcache_* functions land in your profile.
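Schematically, the effect of that hack is to replace cache lookups with an unconditional register/deregister pair around each transfer. The names below are illustrative, not the exact mem.c code:

```c
/* Schematic only: what ENABLE_MEMCACHE = 0 implies per I/O. */
#if ENABLE_MEMCACHE
    mr = memcache_register(buf, len);   /* may hit the cache */
    post_transfer(mr, buf, len);
    memcache_deregister(mr);            /* cache may keep the MR pinned */
#else
    mr = register_memory(buf, len);     /* always registers */
    post_transfer(mr, buf, len);
    deregister_memory(mr);              /* always unpins */
#endif
```

The uncached path is slower per operation, but nothing stays pinned between operations, which should also sidestep the "too much memory registered" failure.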
Why are you using both mthca.so and libpscrt.so? Are Mellanox and
PathScale adapters being used in the same code?
Or we can modify the way the cache is managed. This would require
knowing a bit more about what your code is doing. Care to send a
trace, or the code itself?
-- Pete
____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011 (515) 294-9192
[EMAIL PROTECTED] FAX: (515) 294-4491
____________________________________________
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers