Thanks! It may take me a while to work out where to go from here; I'll let
you know what I find. Input from anyone else who has seen this issue would
be welcome.

Michael

On Sun, Mar 27, 2011 at 8:28 AM, Matthieu Dorier <
[email protected]> wrote:

> Here is GDB's answer:
>
> (gdb) list *0x46f55a
> 0x46f55a is in error (src/io/bmi/bmi_ib/util.c:31).
> 26        va_start(ap, fmt);
> 27        vsprintf(s, fmt, ap);
> 28        va_end(ap);
> 29        gossip_err("Error: %s.\n", s);
> 30        gossip_backtrace();
> 31        exit(1);
> 32    }
> 33
> 34    void __attribute__((noreturn,format(printf,1,2))) __hidden
> 35    error_errno(const char *fmt, ...)
>
> Matthieu
>
>
> 2011/3/27 Michael Moore <[email protected]>
>
>> Hi Matthieu,
>>
>> If you could print the source line associated with that crash address,
>> that would help get us started. Something like:
>> gdb <path to pvfs2-server binary>
>> list *0x46f55a
>>
>> Then with that and the info from Kyle we can work on getting it resolved.
>>
>> As a side note, if you have the opportunity, you should upgrade your
>> installation to 2.8.3 (released under the name OrangeFS at orangefs.org),
>> which has additional functionality and bug fixes, although I don't believe
>> any of those fixes apply to this issue.
>>
>> Michael
>>
>>
>> On Sat, Mar 26, 2011 at 5:35 PM, Kyle Schochenmaier
>> <[email protected]> wrote:
>>
>>> Hi Matthieu -
>>>
>>> The last time I worked on this, we ran into this problem, and I think we
>>> narrowed it down to a mop_id reuse issue. We tried to add some thread
>>> locking around the mop_id 'cache', but I don't think it was ever
>>> resolved. This was years ago, and it only occurred under a very heavy load
>>> of relatively small messages.
>>>
>>> That would be the place to start I would imagine.
>>>
>>> Cheers,
>>> Kyle Schochenmaier
>>>
>>>
>>> On Sat, Mar 26, 2011 at 4:21 PM, Matthieu Dorier <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm trying to evaluate the performance of my PVFS installation over an
>>>> InfiniBand network, but from time to time a server crashes with this trace
>>>> in the log:
>>>>
>>>> [E 03/26 21:58] Error: encourage_recv_incoming: mop_id 12952a0 in RTS_DONE message not found.
>>>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server(error+0xca) [0x46f55a]
>>>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x46c88c]
>>>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x46e485]
>>>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server(BMI_testunexpected+0x384) [0x421004]
>>>> [E 03/26 21:58]     [bt] /usr/sbin/pvfs2-server [0x41cf4a]
>>>> [E 03/26 21:58]     [bt] /lib/libpthread.so.0 [0x7f6422ff0fc7]
>>>> [E 03/26 21:58]     [bt] /lib/libc.so.6(clone+0x6d) [0x7f642295164d]
>>>>
>>>> I've seen other users report this kind of error in the mailing list
>>>> archives, but I didn't find an answer that solves the problem. Any idea
>>>> how to fix it?
>>>>
>>>> In case it is useful: I'm running 16 PVFS servers (each acting as both
>>>> I/O server and metadata server), and I'm benchmarking with the IOR
>>>> program. Currently, 648 processes each write 8 MB to a shared file, with
>>>> a transfer size equal to the strip size (64 KB).
>>>>
>>>> Thank you,
>>>>
>>>> Matthieu
>>>>
>>>> --
>>>> Matthieu Dorier
>>>> ENS Cachan, Brittany (Computer Science dpt.)
>>>> IRISA Rennes, Office E324
>>>> http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/
>>>>
>>>> _______________________________________________
>>>> Pvfs2-users mailing list
>>>> [email protected]
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
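Kyle's hypothesis in this thread is that the RTS_DONE failure comes from mop_id reuse, and that thread locking around the mop_id 'cache' might address it. Below is a minimal, hypothetical sketch of that idea in C. It is not the BMI InfiniBand code, and the names `mop_table`, `mop_table_init`, `mop_id_register`, `mop_id_lookup` and `mop_id_release` are invented for illustration. It assumes ids are handed out from a fixed-size table guarded by a pthread mutex, and that each id also carries a per-slot generation counter. Under those assumptions, a stale id that refers to a slot since freed and reused fails the lookup instead of silently matching a different operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not PVFS/BMI code. An id packs the slot index in its
 * low 16 bits and the slot's generation counter in the bits above, so an id
 * for a slot that was released and then reused no longer matches. */
#define MOP_SLOTS 1024u

struct mop_table {
    pthread_mutex_t lock;      /* serializes every access to the table */
    void *op[MOP_SLOTS];       /* NULL means the slot is free */
    uint32_t gen[MOP_SLOTS];   /* bumped each time a slot is released */
};

static void mop_table_init(struct mop_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < MOP_SLOTS; i++) {
        t->op[i] = NULL;
        t->gen[i] = 1;         /* start at 1 so a valid id is never 0 */
    }
}

/* Register an in-flight operation; returns its id, or 0 if the table is full. */
static uint64_t mop_id_register(struct mop_table *t, void *op)
{
    uint64_t id = 0;
    assert(op != NULL);        /* NULL is reserved to mark free slots */
    pthread_mutex_lock(&t->lock);
    for (uint32_t i = 0; i < MOP_SLOTS; i++) {
        if (t->op[i] == NULL) {
            t->op[i] = op;
            id = ((uint64_t)t->gen[i] << 16) | i;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return id;
}

/* Return the operation for id, or NULL if the id is unknown or stale. */
static void *mop_id_lookup(struct mop_table *t, uint64_t id)
{
    void *op = NULL;
    uint32_t slot = (uint32_t)(id & 0xffffu);
    pthread_mutex_lock(&t->lock);
    if (slot < MOP_SLOTS && t->gen[slot] == (id >> 16))
        op = t->op[slot];
    pthread_mutex_unlock(&t->lock);
    return op;
}

/* Remove id from the table and return its operation (NULL if not found).
 * Bumping the generation invalidates any copies of the old id. */
static void *mop_id_release(struct mop_table *t, uint64_t id)
{
    void *op = NULL;
    uint32_t slot = (uint32_t)(id & 0xffffu);
    pthread_mutex_lock(&t->lock);
    if (slot < MOP_SLOTS && t->gen[slot] == (id >> 16) && t->op[slot] != NULL) {
        op = t->op[slot];
        t->op[slot] = NULL;
        if (++t->gen[slot] == 0)   /* skip 0 on wraparound */
            t->gen[slot] = 1;
    }
    pthread_mutex_unlock(&t->lock);
    return op;
}
```

The mutex alone only prevents concurrent register/lookup/release calls from corrupting the table; the generation counter is what lets a receiver tell a reused slot apart from the operation it originally referred to. Whether either mechanism matches what actually goes wrong in BMI's InfiniBand module is exactly what the thread was still trying to establish.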