I am also seeing the originally reported issue: under high load (specifically, 32 threads' worth of iozone), a server fails with: Error: encourage_recv_incoming: mop_id af3580 in RTS_DONE message not found.
Does anyone else have thoughts on where this may be coming from? I'm not seeing where the mopid pool (or its size) is determined, so I don't know whether that suggestion will resolve the problem.

Thanks,
Michael

On Tue, Jul 19, 2011 at 5:16 PM, Kyle Schochenmaier <[email protected]> wrote:
> A really poor hack might be to crank up the size of the mopid pool. I think
> it's a circular buffer of ids. But this certainly isn't scalable.
>
> On Jul 19, 2011 4:11 PM, "Kyle Schochenmaier" <[email protected]> wrote:
> > Hi Becky,
> >
> > I think this is the mopid reuse problem from years past. Basically, at high
> > load a mopid on one machine gets recycled and used again before getting
> > invalidated on the receiver side, so we end up with a dupe. I don't recall
> > being able to fix this, but we lowered the frequency of its occurrence to a
> > point where the kernel module interface was stable by adding locking logic
> > to the mopid usage in bmi-send. I don't really have access to the code
> > anymore, but that's where I'd recommend starting the search. Even with this,
> > though, we still observed the problem in heavy usage over native bmi
> > implementations like netpipe and a port of gamess that I made to use native
> > calls.
> >
> > On Jul 19, 2011 4:00 PM, "Becky Ligon" <[email protected]> wrote:
> >
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
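
To make the reuse scenario Kyle describes concrete, here is a rough toy model. It is not the BMI code: the class and function names (CircularIdPool, first_collision) and the in_flight parameter are made up for illustration, and it assumes the pool really is a plain round-robin buffer of ids, as guessed above. It only shows when a sender would hand out an id that the receiver has not yet invalidated.

```python
class CircularIdPool:
    """Hypothetical pool: hands out ids 0..size-1 round-robin,
    with no check that an id has been retired by the receiver."""

    def __init__(self, size):
        self.size = size
        self._next = 0

    def allocate(self):
        ident = self._next
        self._next = (self._next + 1) % self.size
        return ident


def first_collision(pool_size, in_flight):
    """Return the first id handed out while the receiver still tracks it.

    `in_flight` stands in for load: the number of ids the receiver keeps
    live before it invalidates the oldest one. Returns None if no id is
    reused too early within two full trips around the pool.
    """
    pool = CircularIdPool(pool_size)
    receiver_live = []  # ids the receiver still considers active, oldest first
    for _ in range(2 * pool_size):
        ident = pool.allocate()
        if ident in receiver_live:
            return ident  # sender recycled an id the receiver has not retired
        receiver_live.append(ident)
        if len(receiver_live) > in_flight:
            receiver_live.pop(0)  # receiver invalidates the oldest id
    return None
```

In this model a collision appears as soon as the number of un-invalidated ids reaches the pool size, so enlarging the pool only raises the load at which the problem shows up. That matches the remark that cranking up the pool size is a hack rather than a scalable fix.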
