I too am seeing the issue originally reported: under high load (specifically,
32 threads' worth of iozone) a server fails with:
Error: encourage_recv_incoming: mop_id af3580 in RTS_DONE message not found.

Does anyone else have thoughts on where this may be coming from? I can't find
where the mopid pool (or its size) is determined, so I don't know whether that
suggestion would resolve the problem.

Thanks,
Michael

On Tue, Jul 19, 2011 at 5:16 PM, Kyle Schochenmaier <[email protected]> wrote:

> A really poor hack might be to crank up the size of the mopid pool; I think
> it's a circular buffer of ids. But this certainly isn't scalable.
> On Jul 19, 2011 4:11 PM, "Kyle Schochenmaier" <[email protected]> wrote:
> > Hi Becky,
> > I think this is the mopid reuse problem from years past. Basically, at high
> > load a mopid on one machine gets recycled and used again before it is
> > invalidated on the receiver side, so we end up with a duplicate. I don't
> > recall being able to fix this, but we lowered the frequency of its
> > occurrence to a point where the kernel module interface was stable by
> > adding locking logic to the mopid usage in bmi-send. I don't really have
> > access to the code anymore, but that's where I'd recommend starting the
> > search. Even with this, though, we still observed the problem under heavy
> > usage over native BMI implementations like NetPIPE and a port of GAMESS I
> > made to use native calls.
> > On Jul 19, 2011 4:00 PM, "Becky Ligon" <[email protected]> wrote:
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>
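As a rough illustration of the reuse race Kyle describes (a slot-based id handed out again before the other side has forgotten the old one), here is a minimal C sketch of a hypothetical fixed-size, round-robin id pool. It is not BMI's code, which this thread does not show; the names `id_pool`, `pool_alloc`, `pool_lookup`, `pool_free` and `POOL_SIZE` are invented for illustration. The sketch swaps in a common technique not mentioned in the thread: tagging each id with a per-slot generation counter so that a stale id is rejected instead of matching whichever operation reused the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size; the real BMI value is not given in this thread.
 * A larger pool only makes a slot come around again less often. */
#define POOL_SIZE 4u

/* One slot per in-flight operation. */
struct slot {
    void *op;       /* operation that owns the slot, or NULL if free */
    uint32_t gen;   /* bumped each time the slot is handed out again */
};

/* Hypothetical round-robin (circular) id pool. */
struct id_pool {
    struct slot slots[POOL_SIZE];
    uint32_t next;  /* cursor: next slot index to try */
};

/* Pack slot index (low 16 bits) and generation (high 16 bits) into one id.
 * An index-only id would be identical every time the slot is reused. */
static uint32_t make_id(uint32_t idx, uint32_t gen)
{
    assert(idx <= 0xFFFFu);
    return ((gen & 0xFFFFu) << 16) | idx;
}

/* Hand out the next free slot in circular order; returns -1 if all are busy. */
static int pool_alloc(struct id_pool *p, void *op, uint32_t *out_id)
{
    for (uint32_t tries = 0; tries < POOL_SIZE; tries++) {
        uint32_t idx = p->next;
        p->next = (p->next + 1) % POOL_SIZE;
        if (p->slots[idx].op == NULL) {
            p->slots[idx].gen++;
            p->slots[idx].op = op;
            *out_id = make_id(idx, p->slots[idx].gen);
            return 0;
        }
    }
    return -1;
}

/* Resolve an id to its operation. An id from an older generation of the
 * slot is rejected instead of matching whatever operation reused the slot. */
static void *pool_lookup(const struct id_pool *p, uint32_t id)
{
    uint32_t idx = id & 0xFFFFu;
    uint32_t gen = id >> 16;
    if (idx >= POOL_SIZE)
        return NULL;
    const struct slot *s = &p->slots[idx];
    if (s->op == NULL || (s->gen & 0xFFFFu) != gen)
        return NULL;
    return s->op;
}

/* Release a slot, but only if the id still refers to its current generation. */
static void pool_free(struct id_pool *p, uint32_t id)
{
    if (pool_lookup(p, id) != NULL)
        p->slots[id & 0xFFFFu].op = NULL;
}
```

In this sketch, once slot 0 is freed and the cursor wraps around to it again, a late message still carrying the first id no longer resolves, whereas an index-only id would silently match the new owner. Because the generation is only 16 bits, this narrows the window rather than closing it, much like Kyle's note that the problem was made rarer rather than fixed; raising `POOL_SIZE` (the "crank up the pool" hack) likewise only delays the wrap.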