I got hold of some hardware information, realized that I was running into resource issues with our NIC, and modified the FlowBufferSizeBytes parameter in the server configs. So far I have not been able to reproduce this problem.

On a related note, I think we should add in the cache-flushing code, as we're running into problems and _successfully_ flushing the cache and moving on, though I'm not sure yet how to test that we're getting the correct data moved around after flushing. I'll try to do that this afternoon.

Kyle



Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Wed, 04 Apr 2007 11:21 -0600:
I can reproducibly trigger this error on the server by doing multiple instances of pvfs2-cp over various IB hardware.

For this one, I did:

pvfs2-cp -t /pvfs2/1node/test2 /dev/null & pvfs2-cp -t /pvfs2/1node/test2 /dev/null & pvfs2-cp -t /pvfs2/1node/test2 /dev/null & pvfs2-cp -t /pvfs2/1node/test2 /dev/null & pvfs2-cp -t /pvfs2/1node/test2 /dev/null & pvfs2-cp -t /pvfs2/1node/test2 /dev/null

That should be 6 of them. Everything worked fine up until 6 processes started hammering the server. I can reproduce this with only 3 processes using faster/lower-latency hardware on the client.
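
For varying the process count without retyping the command, the same load can be started from a small helper. This is only a sketch: run_parallel is a hypothetical function name, not part of PVFS2, and it does nothing beyond backgrounding N copies of whatever command it is given and waiting for them.

```shell
# Hypothetical helper (not part of PVFS2): start N copies of a command
# in the background, then block until all of them have exited.
run_parallel() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    # A bare `wait` blocks until every background job has finished.
    # It does not report individual exit codes.
    wait
}

# Example matching the six-process case above:
#   run_parallel 6 pvfs2-cp -t /pvfs2/1node/test2 /dev/null
```

Changing the first argument to 3 would correspond to the three-process case seen with the faster client hardware.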

Any ideas where to start tracking this one down?

Yeah, what Sam said.  This error is a bug, but it comes after things
are starting to time out, so you're not in ideal territory already.
If you want to figure out the bug, I'd be most grateful.  I'm
planning to ignore this for a while as paper deadlines and travel
are coming up.

Re the timeout:  if your workload involves running 6 independent
processes on each node, you may want to increase the default timeout
in your fs.conf so you do not get job cancels.
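
As a rough illustration of the kind of change meant here, the job timeouts normally sit in the Defaults section of fs.conf. The option names below are the ones pvfs2-genconfig usually writes out; the value 120 is only an assumed example, not a tested recommendation, and which of the timeouts matters for this workload is also an assumption.

```
<Defaults>
    # Assumed example values; generated configs typically use 30 here.
    ServerJobBMITimeoutSecs 120
    ServerJobFlowTimeoutSecs 120
</Defaults>
```

The other entries already present in the Defaults section stay as they are, and the servers would need a restart to pick up the change.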

                -- Pete

[D 21:25:03.375378] PVFS2 Server version 2.6.2pre1-2007-02-23-150254 starting.
[E 17:00:09.488945] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 4608742.
[..]
[E 17:00:09.573630] Error: memcache_memfree: buf 0x2aaaab272010 len 262144 count = 2, expected 1.



_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
