Troy Benjegerdes wrote:

Pete -

Here is a link to a log of the failure, captured with network debugging enabled on the client and a single I/O node. The whole log is 5.9GB, so I only included the
last 10k lines. Same error as before, of course.

http://www.scl.ameslab.gov/~kschoche/pvfs2-client.log.gz

The mop_ids are fairly difficult to track because they are used all over
the place, and I can't make out anything useful from the log
:'(

Any advice would be great,

~Kyle

Here is a full logfile from another failure (93MB compressed, 1GB unpacked):

http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client.log.gz

Kind of an update:
While tracing function calls to figure out why the same "mop_id" was used 10,000+ times during my failed run, Troy and I stumbled upon some of the fmr code. After changing the id_gen_fast(mopid) functions to the id_gen_safe(mopid) functions from id_generator.c, we may have fixed the problem. This does introduce some overhead, though. I'll try to run some tests next week to quantify exactly how much, but for now the change at least allows my tests to complete.

Maybe something is wrong with the id_gen_fast() code, such as a locking issue?

Troy and I had some questions about how these mop_ids, which are just addresses, are generated, and whether it is possible for two I/O servers to generate the same address and somehow send it to the client.
Can you give us a brief description of the process, Pete?
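
To make the question concrete, here is a minimal sketch of the two id styles as we understand them. It is not the code from id_generator.c: every name in it (sketch_id_t, fast_register, safe_register, and so on) is hypothetical, and the fixed-size table is an assumption made to keep the example short. It only shows why an id that is just an address repeats whenever the same memory is registered again, while a counter-based id does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the PVFS2 implementation. */
typedef uint64_t sketch_id_t;

/* "Fast" style (assumed): the id is simply the object's address.
 * Registering the same address twice yields the same id, and there is
 * no way to tell a stale id from a live one. */
static sketch_id_t fast_register(void *item)
{
    return (sketch_id_t)(uintptr_t)item;
}

static void *fast_lookup(sketch_id_t id)
{
    return (void *)(uintptr_t)id;
}

/* "Safe" style (assumed): ids come from a counter and a table maps
 * id -> pointer. Toy table: not thread-safe, and a slot is overwritten
 * once more than SAFE_SLOTS ids have been issued. */
#define SAFE_SLOTS 64

struct safe_entry {
    sketch_id_t id;   /* 0 means the slot is empty */
    void *item;
};

static struct safe_entry safe_table[SAFE_SLOTS];
static sketch_id_t safe_next_id = 1;   /* id 0 is never issued */

static sketch_id_t safe_register(void *item)
{
    sketch_id_t id = safe_next_id++;
    struct safe_entry *e = &safe_table[id % SAFE_SLOTS];
    e->id = id;
    e->item = item;
    return id;
}

/* Returns NULL for ids that were never issued or already unregistered. */
static void *safe_lookup(sketch_id_t id)
{
    struct safe_entry *e = &safe_table[id % SAFE_SLOTS];
    return (e->id == id) ? e->item : NULL;
}

static void safe_unregister(sketch_id_t id)
{
    struct safe_entry *e = &safe_table[id % SAFE_SLOTS];
    if (e->id == id) {
        e->id = 0;
        e->item = NULL;
    }
}
```

In this sketch, registering the same buffer twice through fast_register() returns the same id both times, whereas safe_register() hands out a fresh id and safe_lookup() returns NULL for the old one. A real version of the safe path would also need a lock around the counter and the table, which is presumably where the extra overhead comes from.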

Thanks,
Kyle

--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
