Hello Michael,

On 09/20/2011 06:40 AM, Michael Raab wrote:
> sorry to bother you again, but I wasn't able to make any progress with that 
> issue.
> I'm currently testing with 2 servers and 1 client. When I call 
> OSGGroupMcastConnection::disconnect() on client side, the socket connection 
> to the appropriate render server seems to be closed well.
> The server application closes without problems.
> The client and the remaining server do their regular sync-signal-swap routine 
> for the next 7 frames. Everything seems to be fine. After rendering the 7th 
> frame the client waits forever for the next signal message of the server. The 
> problem is easily reproducible and does not depend on timing....
> I tried to debug using TCPView and it seems all relevant sockets are still 
> open and the number of sent/received signal/swap messages for the client and 
> the server are equal. So it seems that the OpenSG connection somehow loses 
> or mishandles that special message. Maybe the threaded 
> OSGGroupMcastConnection::sendQueue() method is involved but I have no more 
> clue how to track that down.

from the stack trace you posted earlier, the client is stuck in 
GroupSockConnection::wait(), waiting for the servers to acknowledge 
completion of a frame:

>>> it should. Maybe it waits for the disconnected server...
>>>>> This is the stack where everything gets stuck:
>>>>>
>>>>>   ntdll.dll!76f4f8c1()
>>>>>   [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
>>>>>   ntdll.dll!76f4f8c1()
>>>>>   mswsock.dll!74506f0f()
>>>>>   mswsock.dll!74506d30()
>>>>>   ntdll.dll!76f63ca3()
>>>>>   OSGSystem.dll!osg::DrawActionBase::stop(osg::Action::ResultE res=Continue)  Line 259  C++
>>>>>   msvcp80.dll!std::basic_filebuf<char,std::char_traits<char> >::overflow(int _Meta=64)  Line 304 + 0x5 Bytes  C++
>>>>>   ws2_32.dll!74c76a28()
>>>>>   OSGBase.dll!osg::SocketSelection::select(double duration=-1.0000000000000000)  Line 236  C++
>>>>>   OSGBase.dll!osg::SocketSelection::select(double duration=-1.0000000000000000, osg::SocketSelection& result={...})  Line 265 + 0x2d Bytes  C++
>>>>>   OSGBase.dll!osg::GroupSockConnection::wait(double timeout=-1.0000000000000000)  Line 336 + 0x15 Bytes  C++
>>>>>>  OSGBase.dll!osg::GroupMCastConnection::wait(double timeout=-1.0000000000000000)  Line 205  C++
>>>>>   OSGExt.dll!osg::VDTMultiDisplayWindow::clientSwap()  Line 546  C++
>>>>>   OSGExt.dll!osg::VDTMultiDisplayWindow::render(osg::RenderActionBase * action=0x0c6b7958)  Line 303  C++
>>>>>
>>>>> I don't know how the DrawAction stuff comes into the stack, maybe it's a
>>>>> visualization bug in VS...
>>>> yeah, that looks bogus ;)
>>>> Hmm, the interesting information would be, which sockets are in the
>>>> SocketSelection and if a disconnected server is still in there what and
>>>> why it was placed there?
>>>
>>> As far as I have seen today the correct sockets were closed and removed
>>> from the socket list.

hmm, that seems to be contradicted by the client still waiting on a 
socket it should have removed from its list of connected sockets. 
GroupSockConnection::wait() simply adds all entries of _sockets to a 
SocketSelection and loops until it has received the tag value on each 
one. Since GroupSockConnection::disconnect() erases an element from 
_sockets, _sockets.size() should be 1 after disconnecting one server. As 
a sanity check, can you verify that this is the case?
If only the still-connected socket is in the list, then it is not 
receiving the expected signal, or rather it is not receiving anything at 
all, because otherwise you'd see a "Stream out of sync" 
message/exception. That would then mean looking at the server to find 
out why it stops signalling the end of frame.
It seems I won't be much help without being able to reproduce the 
problem. Is there a chance you could come up with something like 
testClusterClient/Server.cpp that shows the problem? I don't know when 
I'd get to writing them myself, but I promise that I'll debug the 
problem if I have the ability to reproduce it.
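
To make the reasoning about wait() above concrete, here is a rough, 
self-contained sketch of that kind of loop. It is not the OpenSG code: 
FakeSocket, kFrameTag, WaitResult and waitForTags are hypothetical names 
made up for illustration, the sockets are plain in-memory queues, and a 
bounded number of polling rounds stands in for the blocking select() call 
with timeout -1.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-in for one server connection; NOT an OpenSG class.
struct FakeSocket {
    std::deque<int> incoming;  // tag values this server has sent so far
};

const int kFrameTag = 314;  // arbitrary illustrative tag value (assumption)

enum class WaitResult { AllAcknowledged, OutOfSync, TimedOut };

// Poll every entry of `sockets` until each one has delivered kFrameTag.
// No new data arrives between rounds in this simulation; the round limit
// only replaces the blocking select() so that the sketch terminates.
WaitResult waitForTags(std::vector<FakeSocket>& sockets, int maxRounds)
{
    assert(maxRounds > 0);

    std::set<std::size_t> pending;  // indices of sockets still owing a tag
    for (std::size_t i = 0; i < sockets.size(); ++i)
        pending.insert(i);

    for (int round = 0; round < maxRounds && !pending.empty(); ++round) {
        for (auto it = pending.begin(); it != pending.end(); ) {
            FakeSocket& s = sockets[*it];
            if (s.incoming.empty()) {  // nothing readable on this socket
                ++it;
                continue;
            }
            int tag = s.incoming.front();
            s.incoming.pop_front();
            if (tag != kFrameTag)
                return WaitResult::OutOfSync;  // unexpected data arrived
            it = pending.erase(it);            // this server has acknowledged
        }
    }
    return pending.empty() ? WaitResult::AllAcknowledged
                           : WaitResult::TimedOut;
}
```

With a single entry that has delivered the tag, waitForTags() reports 
AllAcknowledged. If a second, stale entry is still in the list and never 
delivers anything, the result is TimedOut, which with an infinite timeout 
corresponds to the hang in the stack trace. The same TimedOut result 
appears if the only remaining server simply stops sending, which is why 
the size of the socket list is worth checking first.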

        Cheers,
                Carsten

_______________________________________________
Opensg-users mailing list
Opensg-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensg-users
