That's correct.  I added some high level summary to [1].

My expectation is that in many situations the async packets are indeed handled pretty quickly, very often with only a single thread from the pool in use. I've seen this when analyzing threadpool data from the crude stats recording mechanism [2].

However, blocking IO can indeed be a problem, often due to heavy load on services. For instance, it used to be the case that some asset fetches were performed synchronously, so if the asset service was slow, processing triggered by inbound packet handling would be held up.

If one were running the two-tier system that you described, then this would hold up processing of all second-tier packets more than the current system does. Perhaps one could forensically identify which handlers are subject to this problem and handle those differently from the others.
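To make the routing idea concrete, here is a rough sketch in Python (OpenSimulator itself is C#, and the packet-type names and queue layout here are invented for illustration): handlers known to be blocking-prone get their own queue and worker, so a slow service cannot hold up the fast handlers.

```python
import queue
import threading

# Illustrative only: two queues, one per blocking profile. The packet-type
# names below are assumptions, not actual OpenSimulator identifiers.
fast_queue = queue.Queue()
blocking_queue = queue.Queue()

BLOCKING_PRONE = {"TransferRequest", "AssetUploadRequest"}  # assumed names

def route(packet_type, handler):
    """Send a handler to the queue matching its blocking profile."""
    if packet_type in BLOCKING_PRONE:
        blocking_queue.put(handler)
    else:
        fast_queue.put(handler)

def worker(q, results):
    """Drain one queue until a None sentinel arrives."""
    while True:
        handler = q.get()
        if handler is None:
            break
        results.append(handler())

fast_results, slow_results = [], []
t1 = threading.Thread(target=worker, args=(fast_queue, fast_results))
t2 = threading.Thread(target=worker, args=(blocking_queue, slow_results))
t1.start(); t2.start()

route("AgentUpdate", lambda: "agent-update-handled")
route("TransferRequest", lambda: "transfer-handled")

fast_queue.put(None); blocking_queue.put(None)
t1.join(); t2.join()
```

With this split, a stalled asset fetch only delays other blocking-prone handlers, never the fast path.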

It's a complex topic, as OpenSimulator is very much an evolved (and evolving) codebase, where delays can occur in unexpected places and there's a huge variance in network and hardware conditions. In this case, I can imagine that our async handling is more resilient in cases where only a few requests are slow.

I believe the key is trying to measure the performance change of a second-tier loop to see if the potential gains are worth the potential problems.
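A crude micro-benchmark sketch of that measurement, in Python rather than OpenSimulator's C# (the sleeping handler is a stand-in, not a real packet handler, so the absolute numbers mean little): time thread-per-packet dispatch against a single second-tier consumer loop.

```python
import queue
import threading
import time

def handler():
    time.sleep(0.001)   # stand-in for a quick packet handler

def thread_per_packet(n):
    """Spawn one thread per item, as the current async handlers do."""
    start = time.perf_counter()
    threads = [threading.Thread(target=handler) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def second_tier_loop(n):
    """Run all items serially on one long-lived consumer thread."""
    work = queue.Queue()
    for _ in range(n):
        work.put(handler)
    work.put(None)          # sentinel ends the loop
    start = time.perf_counter()

    def consume():
        while True:
            h = work.get()
            if h is None:
                break
            h()

    t = threading.Thread(target=consume)
    t.start()
    t.join()
    return time.perf_counter() - start

spawn_time = thread_per_packet(50)
loop_time = second_tier_loop(50)
```

The interesting comparison is thread-creation overhead saved versus latency added by serializing handlers; real handlers with real I/O would be needed for a meaningful answer.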

[1] http://opensimulator.org/wiki/LLUDP_ClientStack#Inbound_UDP
[2] http://opensimulator.org/wiki/Show_stats#stats_record

On 26/04/14 13:53, Matt Lehmann wrote:
I looked at this a bit more this morning.

So, as I understand it, the handling looks like this:

-- an async read cycle which drops packets into a blocking queue
-- a smart thread which services the blocking queue and calls the LLClientView
method ProcessInPacket

LLClientView sorts the packets according to whether the handler should be 
called asynchronously or not.

If async is needed, LLClientView will create a smart thread for the handler, 
and start the thread.
...the handlers basically signal the events defined in LLClientView which are 
listened to by one or more other callbacks.
If async is not needed/desired, then LLClientView will process the packet 
directly.

So there is one additional thread being created for each async handler, with 
the original smart thread running all the
non-async packet handlers.
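The pipeline described above can be sketched like this in Python (OpenSimulator is C#; the packet names are invented and the handler bodies are placeholders): a consumer thread drains the blocking queue and either runs the handler inline or spawns a thread for it, mirroring the described ProcessInPacket split.

```python
import queue
import threading

packet_queue = queue.Queue()        # filled by the async read cycle
ASYNC_TYPES = {"TransferRequest"}   # assumed: handlers run on their own thread

log = []
log_lock = threading.Lock()

def handle(packet_type):
    """Placeholder for the real per-packet handler."""
    with log_lock:
        log.append(packet_type)

def process_in_packet(packet_type):
    """Mirrors the described split: spawn for async types, run inline otherwise."""
    if packet_type in ASYNC_TYPES:
        t = threading.Thread(target=handle, args=(packet_type,))
        t.start()
        return t
    handle(packet_type)   # non-async: runs on the consumer thread itself
    return None

def consumer():
    """The 'smart thread' servicing the blocking queue."""
    spawned = []
    while True:
        p = packet_queue.get()
        if p is None:              # sentinel: shut down
            break
        t = process_in_packet(p)
        if t is not None:
            spawned.append(t)
    for t in spawned:
        t.join()

for p in ["AgentUpdate", "TransferRequest", None]:
    packet_queue.put(p)
c = threading.Thread(target=consumer)
c.start()
c.join()
```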

The question is: /can these async threads be replaced by a second smart
thread, which services a queue of async handlers/? Do the handlers require
some sort of blocking I/O? Can we rearrange the handlers to operate under
these conditions?

If the answer is yes, then a great many compute cycles could be saved by
consolidating all the spawned threads into a single thread loop.
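One way to picture the proposed consolidation, as a hedged Python sketch (names invented; this is the general single-consumer pattern, not OpenSimulator code): instead of `Thread(...).start()` per async packet, handlers are enqueued for one long-lived second smart thread.

```python
import queue
import threading

async_handler_queue = queue.Queue()
results = []

def second_smart_thread():
    """One long-lived loop replaces a thread spawned per async packet."""
    while True:
        handler = async_handler_queue.get()
        if handler is None:        # sentinel: stop the loop
            break
        results.append(handler())

worker = threading.Thread(target=second_smart_thread)
worker.start()

# Instead of spawning a new thread per packet, just enqueue the handler.
for i in range(3):
    async_handler_queue.put(lambda i=i: f"packet-{i}")
async_handler_queue.put(None)
worker.join()
```

The catch is exactly the question above: one handler blocking on I/O now stalls every handler queued behind it.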

Matt




On Fri, Apr 25, 2014 at 10:03 PM, Matt Lehmann <[email protected] 
<mailto:[email protected]>> wrote:

    Yes, I agree that the UDP service is critical and would need extensive
testing.

    I wouldn't expect you all to make any changes.

    Still, it's an interesting topic. The networking world seems to be moving
towards smaller virtualized servers with
    fewer resources, so I think it's an important discussion. At my work we are
deploying an OpenSim cluster, which is
    why I have become so interested.


    Thanks

    Matt


    On Friday, April 25, 2014, Diva Canto <[email protected] 
<mailto:[email protected]>> wrote:

        That is one very specific and unique case, something that happens in 
the beginning, and that is necessary,
        otherwise clients crash. It's an "exception" wrt the bulk of processing 
UDP packets. The bulk of them are
        processed as you described in your first message: placed in a queue, 
consumed by a consumer thread which either
        processes them directly or spawns threads for processing them.

        In general, my experience is also that limiting the amount of 
concurrency is a Good Thing. A couple of years ago
        we had way too much concurrency; we've been taming that down.

        As Dahlia said, the packet handling layer of OpenSim is really 
critical, and the viewers are sensitive to it, so
        any drastic changes to it need to go through extensive testing. The 
current async reading is not bad, as it
        empties the socket queue almost immediately. The threads that are spawned
from the consumer thread, though, could
        use some rethinking.

        On 4/25/2014 9:29 PM, Matt Lehmann wrote:
        One example of what I'm trying to say.

        In part of the packet handling there is a condition where the server 
needs to respond to the client, but does
        not yet know the identity of the client. So the server responds to the 
client and then spawns a thread which
loops and sleeps until it can identify the client. (I don't really
understand what's going on here.)

        Nevertheless in this case you could do without the new thread if you 
queued a lambda function which would
        check to see if the client can be identified.  A second event loop 
could periodically poll this function until
        it completes.

        You could also queue other contexts which would complete the handling 
of other types of packets.

        Matt

        On Friday, April 25, 2014, Dahlia Trimble <[email protected]> 
wrote:

            From my experience there are some things that need to happen as 
soon as possible and others which can be
            delayed. What needs to happen ASAP:
            1) reading the socket and keeping it emptied
            2) acknowledging any received packets which may require such
            3) processing any acknowledgements sent by the viewer
            4) handling AgentUpdate packets (these can probably be filtered for
uniqueness and mostly discarded if not
            unique)

            This list is off the top of my head and may not be complete. Most, 
if not all, other packets could be put
            into queues and processed as resources permit without negatively 
affecting the quality of the shared state
            of the simulation.
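The uniqueness filter from item 4 might look roughly like this in Python (a hypothetical sketch; the actual AgentUpdate fields worth comparing are for the implementer to decide): keep a last-seen record per agent and drop packets that repeat it.

```python
# Hypothetical AgentUpdate dedup: only updates that differ from the previous
# one for the same agent reach a handler; identical repeats are discarded.
last_update = {}   # agent_id -> last significant fields seen

def is_significant(agent_id, update):
    """Return True only if this update differs from the previous one."""
    if last_update.get(agent_id) == update:
        return False       # duplicate: discard without further processing
    last_update[agent_id] = update
    return True

updates = [
    ("agent-1", ("pos-a", "rot-a")),
    ("agent-1", ("pos-a", "rot-a")),   # identical: filtered out
    ("agent-1", ("pos-b", "rot-a")),   # changed: kept
]
kept = [u for u in updates if is_significant(*u)]
```

Given viewers sending several hundred packets per second, mostly repeated AgentUpdates, a filter like this could shed a large share of the load before any queueing happens.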

            Please be aware that viewers running on high-end machines can 
constantly send several hundred packets per
            second, and that under extreme conditions there can be several 
hundred viewers connected to a single
            simulator.  Any improvements in the UDP processing portions of the 
code base should probably take these
            constraints into consideration.


            On Fri, Apr 25, 2014 at 8:17 PM, Matt Lehmann <[email protected]> 
wrote:

                That makes sense to me.

                If I recall, the packet handlers will create more threads if 
they expect delays, such as when waiting
                for a client to finish movement into the sim.

                Considering that I have 65 threads running on my standalone 
instance, with 4 cores that leaves about
                15 threads competing.  You have to do the work at some point.

                Matt

                On Friday, April 25, 2014, Dahlia Trimble 
<[email protected]> wrote:

                    Depends on what you mean by "services the packets". 
Decoding and ACKing could probably work well
                    in a socket read loop but dispatching the packet to the 
proper part of the simulation could incur
                    many delays which can cause a lot of packet loss in the 
lower level operating system routines as
                    the buffers are only so large and any excessive data is 
discarded. Putting them in a queue




_______________________________________________
Opensim-dev mailing list
[email protected]
http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev



--
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc
