[Pvfs2-developers] Re: pvfs2-server failures openib

Pete Wyckoff Fri, 06 Apr 2007 08:53:16 -0700

[EMAIL PROTECTED] wrote on Wed, 04 Apr 2007 11:21 -0600:
> I can reproducibly trigger this error on the server by running multiple
> instances of pvfs2-cp over various IB hardware.
> 
> For this one, I did:
> 
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
> pvfs2-cp -t /pvfs2/1node/test2 /dev/null
> 
> That should be 6 of them. Everything worked fine until 6 processes
> started hammering the server.  I can reproduce this with only 3
> processes using faster, lower-latency hardware on the client.
> 
> Any ideas where to start tracking this one down?

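For anyone trying to reproduce this, the six-way run quoted above can be wrapped in a small POSIX sh function. `hammer` is a hypothetical helper name, not a PVFS2 tool: it only backgrounds N copies of a command and waits for all of them, defaulting to the pvfs2-cp read from the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of PVFS2): start N copies of a command
# in the background, then block until every one of them has exited.
# With no command given, it runs the read from the report above.
hammer() {
    n=${1:-6}                     # number of concurrent clients
    [ "$#" -gt 0 ] && shift       # drop N, leaving the command (if any)
    if [ "$#" -eq 0 ]; then
        set -- pvfs2-cp -t /pvfs2/1node/test2 /dev/null
    fi
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &                    # each copy runs concurrently
        i=$((i + 1))
    done
    wait                          # wait for all background copies
}
```

`hammer 6` mirrors the run above, and `hammer 3` matches the faster-client case. Passing an explicit command, e.g. `hammer 3 echo hi`, checks the helper itself without touching the file system.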

Yeah, what Sam said.  This error is a bug, but it only shows up after
operations have started to time out, so you're already not in ideal
territory.  If you want to track down the bug, I'd be most grateful.
I'm planning to ignore it for a while, as paper deadlines and travel
are coming up.

Re the timeout:  if your workload involves running 6 independent
processes on each node, you may want to increase the default timeout
in your fs.conf so that jobs do not get cancelled.
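
A sketch of what that change might look like, assuming the flow-timeout option names that 2.6-era pvfs2-genconfig wrote into the <Defaults> section (ServerJobFlowTimeoutSecs, ClientJobFlowTimeoutSecs); check the names against your own fs.conf and merge the lines into the existing <Defaults> block rather than adding a second one. The values are illustrative placeholders, not tuned recommendations. The flow timeout is the one named in the job_time_mgr_expire message in the log.

```
<Defaults>
	# Assumed option names; raise above whatever your fs.conf has now.
	ServerJobFlowTimeoutSecs 300
	ClientJobFlowTimeoutSecs 300
</Defaults>
```

The servers need to be restarted to pick up the new values.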

                -- Pete

> [D 21:25:03.375378] PVFS2 Server version 2.6.2pre1-2007-02-23-150254 starting.
> [E 17:00:09.488945] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 4608742.
[..]
> [E 17:00:09.573630] Error: memcache_memfree: buf 0x2aaaab272010 len 262144 count = 2, expected 1.
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
