That's correct. I added some high level summary to [1].
My expectation is that in many situations the async packets are indeed handled pretty quickly, often with only a single thread from the pool in use. I've seen this when analyzing threadpool data from the crude stats recording mechanism [2].
However, blocking IO can indeed be a problem, often due to heavy load on services. For instance, it used to be the case that some asset fetches were performed synchronously, so if the asset service was slow, processing triggered by inbound packet handling would be held up.
If one were running the two tier system that you described, then this would hold up processing of all second tier packets more than the current system. Perhaps one could forensically identify which handlers could be subject to this problem and handle those differently from others.
It's a complex topic, as OpenSimulator is very much an evolved (and evolving) codebase, where delays can occur in unexpected places and there's a huge variance in network and hardware conditions. In this case, I can imagine that our async handling is more resilient in cases where only a few requests are slow.
I believe the key is trying to measure the performance change of a second-tier loop to see if the potential gains are worth the potential problems.
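To make the second-tier idea concrete, here is a minimal sketch (in Python rather than the project's C#, and with entirely hypothetical names — `process_in_packet`, `second_tier_worker`, etc. are not the actual LLClientView API): instead of spawning a thread per async packet handler, a single worker drains a queue of deferred handler closures.

```python
import queue
import threading

# Hypothetical sketch of the proposed two-tier model. A single second-tier
# worker drains a queue of handler closures instead of one spawned thread
# per async packet. All names are illustrative, not OpenSimulator code.

async_handlers = queue.Queue()   # second-tier queue of deferred handler calls
results = []                     # stand-in for simulator side effects

def second_tier_worker():
    """Single consumer replacing the per-packet spawned threads."""
    while True:
        handler = async_handlers.get()
        if handler is None:      # sentinel to shut the loop down
            break
        # Caution (the concern raised in the thread): if this handler does
        # blocking I/O, ALL second-tier packets behind it are held up.
        handler()
        async_handlers.task_done()

def process_in_packet(packet, is_async):
    """First-tier consumer: run sync handlers inline, defer async ones."""
    if is_async:
        async_handlers.put(lambda: results.append(("async", packet)))
    else:
        results.append(("sync", packet))

worker = threading.Thread(target=second_tier_worker, daemon=True)
worker.start()

for pkt, needs_async in [(1, False), (2, True), (3, True)]:
    process_in_packet(pkt, needs_async)

async_handlers.put(None)         # stop the worker
worker.join()
print(results)
```

The trade-off this makes visible: the per-packet-thread model isolates a slow handler at the cost of thread churn, while the single second-tier loop saves threads but serializes any blocking delay.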
[1] http://opensimulator.org/wiki/LLUDP_ClientStack#Inbound_UDP
[2] http://opensimulator.org/wiki/Show_stats#stats_record

On 26/04/14 13:53, Matt Lehmann wrote:
I looked at this a bit more this morning. So, as I understand it, the handling looks like this:

- an async read cycle which drops each packet into a blocking queue
- a smart thread which services the blocking queue and calls the LLClientView method ProcessInPacket

LLClientView sorts the packets according to whether the handler should be called asynchronously or not. If async is needed, LLClientView will create a smart thread for the handler and start the thread. The handlers basically signal the events defined in LLClientView, which are listened to by one or more other callbacks. If async is not needed/desired, then LLClientView will process the packet directly.

So there is one additional thread being created for each async handler, with the original smart thread running all the non-async packet handlers. The question is: can these async threads be replaced by a second smart thread which services a queue of async handlers? Do the handlers require some sort of blocking I/O? Can we rearrange the handlers to operate under these conditions? If the answer is yes, then a great deal of compute cycles can be saved by consolidating all the spawned threads into one single thread loop.

Matt

On Fri, Apr 25, 2014 at 10:03 PM, Matt Lehmann <[email protected]> wrote:

Yes, I agree that the UDP service is critical and would need extensive testing. I wouldn't expect you all to make any changes; still, it's an interesting topic. The networking world seems to be moving towards smaller virtualized servers with fewer resources, so I think it's an important discussion. At my work we are deploying an OpenSim cluster, which is why I have become so interested.

Thanks,
Matt

On Friday, April 25, 2014, Diva Canto <[email protected]> wrote:

That is one very specific and unique case, something that happens in the beginning, and that is necessary, otherwise clients crash. It's an "exception" wrt the bulk of processing UDP packets.
The bulk of them are processed as you described in your first message: placed in a queue, consumed by a consumer thread which either processes them directly or spawns threads for processing them.

In general, my experience is also that limiting the amount of concurrency is a Good Thing. A couple of years ago we had way too much concurrency; we've been taming that down. As Dahlia said, the packet handling layer of OpenSim is really critical, and the viewers are sensitive to it, so any drastic changes to it need to go through extensive testing. The current async reading is not bad, as it empties the socket queue almost immediately. The threads that are spawned from the consumer thread, though, could use some rethinking.

On 4/25/2014 9:29 PM, Matt Lehmann wrote:

One example of what I'm trying to say: in part of the packet handling there is a condition where the server needs to respond to the client but does not yet know the identity of the client. So the server responds to the client and then spawns a thread which loops and sleeps until it can identify the client. (I don't really understand what's going on here.) Nevertheless, in this case you could do without the new thread if you queued a lambda function which would check to see if the client can be identified. A second event loop could periodically poll this function until it completes. You could also queue other contexts which would complete the handling of other types of packets.

Matt

On Friday, April 25, 2014, Dahlia Trimble <[email protected]> wrote:

From my experience there are some things that need to happen as soon as possible and others which can be delayed. What needs to happen ASAP:

1) reading the socket and keeping it emptied
2) acknowledging any received packets which may require it
3) processing any acknowledgements sent by the viewer
4) handling AgentUpdate packets (these can probably be filtered for uniqueness and mostly discarded if not unique)

This list is off the top of my head and may not be complete.
Most, if not all, other packets could be put into queues and processed as resources permit without negatively affecting the quality of the shared state of the simulation. Please be aware that viewers running on high-end machines can constantly send several hundred packets per second, and that under extreme conditions there can be several hundred viewers connected to a single simulator. Any improvements in the UDP processing portions of the code base should probably take these constraints into consideration.

On Fri, Apr 25, 2014 at 8:17 PM, Matt Lehmann <[email protected]> wrote:

That makes sense to me. If I recall, the packet handlers will create more threads if they expect delays, such as when waiting for a client to finish movement into the sim. Considering that I have 65 threads running on my standalone instance, with 4 cores that leaves about 15 threads competing per core. You have to do the work at some point.

Matt

On Friday, April 25, 2014, Dahlia Trimble <[email protected]> wrote:

Depends on what you mean by "services the packets". Decoding and ACKing could probably work well in a socket read loop, but dispatching the packet to the proper part of the simulation could incur many delays, which can cause a lot of packet loss in the lower-level operating system routines, as the buffers are only so large and any excessive data is discarded. Putting them in a queue

_______________________________________________
Opensim-dev mailing list
[email protected]
http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
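Matt's lambda-queue idea from earlier in the thread can be sketched briefly (Python rather than C#, with all names hypothetical): instead of a dedicated thread that sleeps and loops until a client is identified, queue a closure that reports whether it has finished, and let one event loop re-poll unfinished work.

```python
import collections
import time

# Hypothetical sketch of replacing a sleep-loop thread with a polled task
# queue. A deferred task is a callable returning True when complete; one
# event loop retries every pending task per pass. Names are illustrative.

pending = collections.deque()

def defer(task):
    """Queue a callable that returns True once its work is complete."""
    pending.append(task)

def poll_once():
    """One pass of the second event loop: retry each pending task once."""
    for _ in range(len(pending)):
        task = pending.popleft()
        if not task():           # not done yet: re-queue for the next pass
            pending.append(task)

# Example: a stand-in "client identification" that succeeds on the third poll.
state = {"attempts": 0, "identified": False}

def try_identify_client():
    state["attempts"] += 1
    if state["attempts"] >= 3:
        state["identified"] = True
        return True              # done: dropped from the queue
    return False                 # still waiting: re-queued

defer(try_identify_client)
while pending:                   # a real loop would run on a timer instead
    poll_once()
    time.sleep(0.01)             # placeholder for the loop's poll interval

print(state)
```

The appeal is that N waiting clients cost N queue entries rather than N sleeping threads; the cost is that completion latency is bounded below by the poll interval.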
--
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc
