Hi Matthieu,

If you could print the source code line associated with that crash address,
that would help get us started. Something like:
gdb <path to pvfs2-server binary>
list *0x46f55a
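
If gdb is not convenient on the server nodes, the same lookup can be
scripted. The following is only a sketch, assuming addr2line from binutils
is installed and the binary was built with debug info; the function names
bt_addrs and resolve_bt and the BIN default are illustrative and not part of
PVFS. It pulls the frame addresses for the server binary out of a log and
resolves all of them at once, which may be useful here because 0x46f55a lies
inside error() itself, so the unnamed frames below it (0x46c88c, 0x46e485)
are probably the more informative ones.

```shell
#!/bin/sh
# Sketch only: BIN is an assumption, adjust it to your installation.
BIN=${BIN:-/usr/sbin/pvfs2-server}

# Read a server log on stdin and print the address of every [bt] frame
# that belongs to $BIN, one per line.
bt_addrs() {
    sed -n "s|.*\[bt\] $BIN.*\[\(0x[0-9a-f]*\)\] *\$|\1|p"
}

# Resolve those addresses to function name and file:line when possible.
# addr2line: -f prints function names, -e names the executable.
resolve_bt() {
    addrs=$(bt_addrs)
    if [ -n "$addrs" ] && [ -x "$BIN" ] && command -v addr2line >/dev/null 2>&1; then
        # $addrs is left unquoted on purpose: one argument per address
        addr2line -f -e "$BIN" $addrs
    else
        # Fall back to printing the raw addresses
        printf '%s\n' "$addrs"
    fi
}
```

For example, `resolve_bt < server.log`, where server.log stands for wherever
your pvfs2-server log is written.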

Then with that and the info from Kyle we can work on getting it resolved.

As a side note, if you have the opportunity, you should upgrade your
installation to 2.8.3 (under the name OrangeFS at orangefs.org), which has
additional functionality and bug fixes, although I don't believe any of the
fixes are applicable to this issue.

Michael

On Sat, Mar 26, 2011 at 5:35 PM, Kyle Schochenmaier <[email protected]> wrote:

> Hi Matthieu,
>
> The last time I worked on this we ran into this problem, and I think we
> narrowed it down to a mopid reuse issue. We tried to insert some thread
> locking mechanisms into the mopid 'cache', but I don't think it ever got
> resolved. This was years ago and only occurred under very heavy load of
> relatively small messages.
>
> That would be the place to start, I would imagine.
>
> Cheers,
> Kyle Schochenmaier
>
>
> On Sat, Mar 26, 2011 at 4:21 PM, Matthieu Dorier <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to evaluate the performance of my PVFS installation over an
>> InfiniBand network, but from time to time a server crashes with this trace
>> in the log:
>>
>> [E 03/26 21:58] Error: encourage_recv_incoming: mop_id 12952a0 in RTS_DONE message not found.
>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server(error+0xca) [0x46f55a]
>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x46c88c]
>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x46e485]
>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server(BMI_testunexpected+0x384) [0x421004]
>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x41cf4a]
>> [E 03/26 21:58]     [bt] /lib/libpthread.so.0 [0x7f6422ff0fc7]
>> [E 03/26 21:58]     [bt] /lib/libc.so.6(clone+0x6d) [0x7f642295164d]
>>
>> I've seen in the mailing list archives that other users have reported
>> this kind of error, but I didn't find an answer there. Any idea how to
>> solve this problem?
>>
>> If it can be of any use: I'm working with 16 PVFS servers (each acting as
>> both I/O server and metadata server), and I'm benchmarking with the IOR
>> program. For now I have 648 processes writing 8 MB each to a shared file,
>> with a transfer size equal to the strip size (64 KB).
>>
>> Thank you,
>>
>> Matthieu
>>
>> --
>> Matthieu Dorier
>> ENS Cachan, Brittany (Computer Science dpt.)
>> IRISA Rennes, Office E324
>> http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> [email protected]
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>>
>