Troy Benjegerdes wrote:
Pete -
I've attached a link to a log of the failure with network debugging
enabled in the client, using a single I/O node. The whole log is 5.9GB,
so I only attached the last 10k lines. Same error as before, of course.
http://www.scl.ameslab.gov/~kschoche/pvfs2-client.log.gz
The mop_ids are fairly difficult to track, as they are used all over the
place and end up here and there. I can't make out anything useful from it
:'(
Any advice would be great,
~Kyle
Here is a full logfile of another failure (93M compressed, 1GB
unpacked):
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client.log.gz
Kind of an update...
After tracing function calls to figure out why the same "mop_id" was
used 10,000+ times during my failed run, Troy and I stumbled upon some
of the fmr code. After changing the id_gen_fast(mopid) calls to the
id_gen_safe(mopid) equivalents from id_generator.c, we have possibly
fixed the problem. However, this does introduce some amount of
overhead. I'll attempt to run some tests to quantify the exact amount
next week, but for now it seems to at least allow my tests to complete.
Maybe something is wrong with the id_gen_fast() stuff, perhaps locking
or other issues?
Troy and I had some questions about how these mop_ids, which are just
addresses, are generated, and whether there is any possibility of two
I/O servers generating the same address and somehow sending it to the
client.
Can you give us a brief description of the process, Pete?
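To make the question concrete, here is a rough sketch of the difference
as we currently understand it. It assumes the fast variant simply hands
back the object's address as the ID, and the safe variant hands out IDs
from a counter and resolves them through a table. All names below are
hypothetical and this is not the actual id_generator.c code; it is
single-threaded, so a real version would also need locking.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, NOT the PVFS2 id_generator.c implementation. */
typedef uint64_t sketch_id_t;

/* "Fast" scheme (assumed): the ID is just the object's address. */
sketch_id_t fast_register(void *item)
{
    return (sketch_id_t)(uintptr_t)item;
}

/* No validation at all: a stale ID is cast straight back to a pointer.
 * If the object was freed and the allocator reused the address, the
 * same ID now silently refers to a different object. */
void *fast_lookup(sketch_id_t id)
{
    return (void *)(uintptr_t)id;
}

/* "Safe" scheme (assumed): IDs come from a counter and are resolved
 * through a table, so an ID is not reused while the counter advances. */
#define SAFE_SLOTS 64

struct safe_entry {
    sketch_id_t id;   /* 0 marks an empty slot */
    void *item;
};

static struct safe_entry safe_table[SAFE_SLOTS];
static sketch_id_t safe_next_id = 1;

/* Returns a fresh ID, or 0 if the table is full. */
sketch_id_t safe_register(void *item)
{
    for (int i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == 0) {
            safe_table[i].id = safe_next_id++;
            safe_table[i].item = item;
            return safe_table[i].id;
        }
    }
    return 0;
}

/* Linear search: this is where the extra overhead comes from.
 * Unknown or already-unregistered IDs yield NULL. */
void *safe_lookup(sketch_id_t id)
{
    if (id == 0)
        return NULL;
    for (int i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == id)
            return safe_table[i].item;
    }
    return NULL;
}

/* Frees the slot; the old ID stops resolving from this point on. */
void safe_unregister(sketch_id_t id)
{
    if (id == 0)
        return;
    for (int i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == id) {
            safe_table[i].id = 0;
            safe_table[i].item = NULL;
            return;
        }
    }
}
```

If that picture is roughly right, a stale ID in the fast scheme still
"resolves" to whatever now lives at that address, while in the safe
scheme it comes back as NULL. That would be consistent with the same
mop_id showing up many times, but we have not confirmed it is the cause.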
Thanks,
Kyle
--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory