Hey there,

A while back, we had somewhat reasonable statistics being generated and presented to the client. They were not always accurate, but based on what I saw, I could usually pin down which parts of the simulator were the limiting factor during load tests. I'd say the number one reason they were semi-accurate rather than accurate is that nobody ever thought about instrumentation during the functionality design. It was always tacked on later.

One example of this is the current AssetCache implementation. There is currently no way to know, at a glance, how many external requests it has open. Worse, it will be extremely difficult to add that, because of the way the objects are designed and accessed. To add it, an event needs to be added to the IAssetService interface, and each AssetCache implementation will need an interlocked int to count how many gets and puts it currently has open to the external data source, as well as its own event-raising schedule. Then the IAssetService property in Scene (AssetService) will need an event handler which updates the values in the SimStatsReporter in Scene (StatsReporter). This kind of external-resource instrumentation should really have been built into the design of the AssetService.
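To make the shape of that concrete, here is a rough sketch of the pattern I mean. It's in Java only so it's easy to read on the list; in our actual C# the counter would be Interlocked.Increment/Decrement and the listener would be a .NET event on IAssetService. All names here are hypothetical, not real OpenSimulator identifiers:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical sketch: an asset service that counts its own open external
// requests and notifies a subscriber, so the stat is designed in rather
// than tacked on later.
class InstrumentedAssetService {
    // Interlocked-style counter of external requests currently in flight.
    private final AtomicInteger openRequests = new AtomicInteger(0);

    // Stand-in for a .NET event; the Scene's stats reporter would subscribe
    // here and push the value into its SimStats block.
    private volatile IntConsumer onOpenRequestsChanged = n -> { };

    void setOpenRequestsListener(IntConsumer listener) {
        onOpenRequestsChanged = listener;
    }

    byte[] get(String assetId) {
        // Count the request in before touching the external source,
        // and out again even if the fetch throws.
        onOpenRequestsChanged.accept(openRequests.incrementAndGet());
        try {
            return fetchFromExternalSource(assetId);
        } finally {
            onOpenRequestsChanged.accept(openRequests.decrementAndGet());
        }
    }

    // At-a-glance view: how many external requests are open right now?
    int getOpenRequests() { return openRequests.get(); }

    private byte[] fetchFromExternalSource(String assetId) {
        return new byte[0]; // placeholder for the actual grid asset fetch
    }
}
```

The point is that the counter and the notification live inside the service itself, so every implementation of the interface reports the same stat for free.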
In this last load test, there were no real statistics that I could use to determine what the limiting factor was. Time Dilation was pegged at 1.0 even when the simulator was obviously struggling. Total Frame Time (MS) was -50ms even when the Simulation MS was 850ms and the Physics MS was 250ms, so the inconsistencies made it impossible to know which part of the simulator was struggling. Agent Updates were erratic, sometimes high and sometimes low, both when the simulator was fine and when it was struggling. Pending Uploads and Downloads were always 0, so there was no way to know how well the simulator was downloading and uploading assets to and from the grid. Packet stats were non-existent, so there was no way to know how the UDP handlers were faring under the load. When it crashed, it crashed with a mono stack trace pointing to out-of-memory errors, so the only way to find out scientifically what the issue was would be to run a load test under a memory profiler, and we know that running a public load test under a memory profiler is quite impractical.

To make something better, I need to know two things: where it is, and where I want it to be. How can we make OpenSimulator better if we don't have statistics that tell us where we are currently?

On that note, I propose that when designing objects for functionality in OpenSimulator, we also consider whether the objects should be instrumented, and what the best way to instrument them would be. We should incorporate instrumentation into the design of the objects. Some of that instrumentation is appropriate for a client to see; some of it might not be. Still, many of these stats should be client facing and included in the SimStats that get sent to the client, so that we can get a reasonable idea of what's going on with a simulator at a glance. We should also make sure, as part of the design, that the instrumentation is accurate and lightweight.
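On the "accurate and lightweight" point, here is the kind of thing I have in mind, again sketched in Java with made-up names (in C# the counters would be Interlocked operations): the hot path pays only a single atomic increment, and the aggregation happens off to the side on the stats timer, not on every packet or frame.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of lightweight built-in instrumentation: the UDP
// handlers pay one lock-free atomic add per packet, and a periodic stats
// tick reads-and-resets the counters to feed something like SimStatsReporter.
class PacketStats {
    private final AtomicLong packetsIn = new AtomicLong();
    private final AtomicLong packetsOut = new AtomicLong();

    // Called from the inbound/outbound UDP handlers; no locks taken.
    void recordInbound()  { packetsIn.incrementAndGet(); }
    void recordOutbound() { packetsOut.incrementAndGet(); }

    // Called once per stats interval; returns {in, out} counts since the
    // last tick, resetting both so each interval is self-contained.
    long[] snapshotAndReset() {
        return new long[] { packetsIn.getAndSet(0), packetsOut.getAndSet(0) };
    }
}
```

Because the read-and-reset is atomic per counter, the per-interval numbers stay accurate under load without the handlers ever blocking on a stats lock.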
The load test went reasonably, but we didn't get half of the information about the simulator that we needed to be able to improve it.

Please comment :) I look forward to hearing your responses.

Regards,
Teravus

_______________________________________________
Opensim-dev mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/opensim-dev
