OK, I finally got some debugging output by hardcoding in the gossip... calls. I have posted a log file at:
http://www.scl.ameslab.gov/~brett/pvfs2.log

The app in this case uses a 1MB IO buffer to write a ~62MB file once and then read it back in several times. The pvfs2 debug output is mixed in with the application output, but I think it's still not too hard to follow.
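
For anyone who wants to reproduce the access pattern without the full application, here is a rough sketch of it. It is not the application's actual IO code: it assumes plain stdio, and write_pass, read_pass, and IO_BUF_SIZE are names made up for illustration. The point is only that every call is handed the same 1MB buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define IO_BUF_SIZE (1024 * 1024)   /* 1MB buffer, as described above */

/* Write total_bytes to path in IO_BUF_SIZE pieces, reusing buf for
 * every call. Returns the number of write calls issued, or -1 on error. */
long write_pass(const char *path, const char *buf, size_t total_bytes)
{
    long calls = 0;
    FILE *fp;

    assert(buf != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    while (total_bytes > 0) {
        size_t n = total_bytes < IO_BUF_SIZE ? total_bytes : IO_BUF_SIZE;
        if (fwrite(buf, 1, n, fp) != n) {
            fclose(fp);
            return -1;
        }
        total_bytes -= n;
        calls++;
    }
    return fclose(fp) == 0 ? calls : -1;
}

/* Read the whole file back through the same buffer. Returns the
 * number of read calls that returned data, or -1 on error. */
long read_pass(const char *path, char *buf)
{
    long calls = 0;
    FILE *fp;

    assert(buf != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while (fread(buf, 1, IO_BUF_SIZE, fp) > 0)
        calls++;
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return calls;
}
```

One call to write_pass followed by several calls to read_pass on the same path and buffer mimics the run in the log.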

It appears to me that, despite always being passed the same buffer, the memcache_register function almost always misses for the write. Note that the output for a run on one of the EHCAs is very similar. On the EHCA I can write up to about 220MB before it dies with the "too much memory registered" error.
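
To make clear what I mean by a hit and a miss, here is a toy model of a registration cache. It is only a sketch under my own assumptions, not the code in pvfs2: reg_cache, reg_entry, cache_lookup_cover, cache_register, and MAX_ENTRIES are hypothetical names, and recording an entry stands in for the real registration call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16          /* hypothetical fixed cache size */

struct reg_entry {
    uintptr_t start;            /* first byte of the registered region */
    size_t    len;              /* length of the registered region */
    int       in_use;
};

struct reg_cache {
    struct reg_entry entries[MAX_ENTRIES];
    unsigned long hits;
    unsigned long misses;
};

/* Return the entry whose region fully covers [buf, buf + len), or NULL. */
struct reg_entry *cache_lookup_cover(struct reg_cache *c,
                                     const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    int i;

    for (i = 0; i < MAX_ENTRIES; i++) {
        struct reg_entry *e = &c->entries[i];
        if (e->in_use && e->start <= start && len <= e->len &&
            start - e->start <= e->len - len)
            return e;
    }
    return NULL;
}

/* Look up buf; on a miss, record a new entry in a free slot.
 * Returns 1 on a hit, 0 on a miss that was recorded, and -1 on a
 * miss when the cache is already full. */
int cache_register(struct reg_cache *c, const void *buf, size_t len)
{
    int i;

    assert(c != NULL && len > 0);
    if (cache_lookup_cover(c, buf, len) != NULL) {
        c->hits++;
        return 1;
    }
    c->misses++;
    for (i = 0; i < MAX_ENTRIES; i++) {
        if (!c->entries[i].in_use) {
            c->entries[i].start = (uintptr_t)buf;
            c->entries[i].len = len;
            c->entries[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}
```

In this model a second call with the same buffer and length is a hit, so a miss on nearly every write would mean either that the entry is gone by the next call or that the address or length reaching the lookup differs from call to call. I have not determined which of those, if either, is happening here.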

Brett

On Oct 18, 2006, at 8:38 AM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Tue, 17 Oct 2006 18:48 -0500:
samples  %        app name                 symbol name
6361527  26.3906  gamess.Feb222006R5.x     hstar_
5029662  20.8654  gamess.Feb222006R5.x     memcache_lookup_cover
4185769  17.3645  no-vmlinux               (no symbols)
4147846  17.2072  gamess.Feb222006R5.x     bufferRead
1969618   8.1709  gamess.Feb222006R5.x     memcache_memfree
799027    3.3147  libc-2.3.6.so            (no symbols)
206162    0.8553  libpscrt.so.1            memset.pathscale.opteron
197544    0.8195  gamess.Feb222006R5.x     __job_time_mgr_add
106939    0.4436  mthca.so                 (no symbols)
92519     0.3838  gamess.Feb222006R5.x     ddot_
68023     0.2822  gamess.Feb222006R5.x     sotran_
52222     0.2166  oprofiled                (no symbols)
41085     0.1704  libpthread-2.3.6.so      pthread_mutex_lock
32711     0.1357  gamess.Feb222006R5.x     PINT_process_request

There are plenty of heuristics involved in doing memory registration
caching.  An ugly part of dealing with devices that require it.

As a quick hack, you can edit pvfs2/src/io/bmi/bmi_ib/mem.c to
change ENABLE_MEMCACHE to 0 for the non-server case.  This will
force each read or write operation to do a pair of mem register
and deregister operations.  It doesn't get a lot of testing, but
hopefully will work.  That should change where the memcache_*
functions land in your profile.

Why are you using both mthca.so and libpscrt.so?  Mellanox and
pathscale adapters being used in the same code?

Or we can modify the way the cache is managed.  This would require
knowing a bit more about what your code is doing.  Care to send a
trace, or the code itself?

                -- Pete


____________________________________________
Dr. Brett Bode
329 Wilhelm Hall
Ames Laboratory
Iowa State University
Ames, IA 50011              (515) 294-9192
[EMAIL PROTECTED]  FAX: (515) 294-4491
____________________________________________



_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
