Hi, sounds great.
Some things to consider:

- Some actions require explicit sending of a packet that is an update packet but is used only for special cases. Sit, stand, changing group tags, and creating/joining groups are all cases where special care needs to be taken.
- Resend is evil for static objects and avatars, but may be needed to sync up dead reckoning with the real data on physical objects. Just a feeling.

Melanie

Mic Bowman wrote:
> Over the last several weeks, Dan Lake and I have been looking at some of the networking performance issues in OpenSim. As always, our concerns are with the problems caused by very complex scenes with very large numbers of avatars. However, I think some of the issues we have found will generally improve networking with OpenSim. Since this represents a fairly significant change in behavior (though the number of lines of code is not great), I'm going to put it into a separate branch for testing (called queuetest) in the opensim git repository.
>
> We've found several problems with the current networking/prioritization code.
>
> * Reprioritization is completely broken for SceneObjectParts. On reprioritization, the current code looks up the localid stored in the scene Entities list, but since the scene does not store the localid for SOPs, that lookup always fails. So the original priority of the SOP continues to be used. This could be the cause of some problems, since the initial prioritization assumes position 128,128. I don't understand all the possible ramifications, but suffice it to say, using the localid is causing problems.
>
> Fix: The scene entity is already stored in the update; just use that instead of the localid.
>
> * We currently pull (by default) 100 entity updates from the entity update queue and convert them into packets. Once converted into packets, they are then queued again for transmission. This is a bad thing.
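For illustration of the localid fix described above — keeping the entity reference on the queued update and recomputing priority from it, instead of a localid lookup that fails for SOPs — here is a minimal Python sketch. All names (`EntityUpdate`, `reprioritize`, `distance_priority`) are hypothetical, not the actual branch code:

```python
# Hypothetical sketch: reprioritize using the entity stored on the update,
# rather than a scene lookup by localid (which fails for SceneObjectParts).

class EntityUpdate:
    def __init__(self, entity, priority):
        self.entity = entity      # the scene entity itself travels with the update
        self.priority = priority

def reprioritize(updates, compute_priority):
    """Recompute each update's priority from its stored entity reference."""
    for update in updates:
        # No localid lookup needed; the entity is right here.
        update.priority = compute_priority(update.entity)
    updates.sort(key=lambda u: u.priority)
    return updates

# Example metric: distance from the camera (lower value = higher priority).
camera = (128.0, 128.0)
def distance_priority(entity):
    dx = entity["pos"][0] - camera[0]
    dy = entity["pos"][1] - camera[1]
    return (dx * dx + dy * dy) ** 0.5

updates = [EntityUpdate({"pos": (200.0, 128.0)}, 0.0),
           EntityUpdate({"pos": (130.0, 128.0)}, 0.0)]
updates = reprioritize(updates, distance_priority)  # nearest entity first
```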
> Under any kind of load, we've measured the time in the packet queue to be up to many hundreds or thousands of milliseconds (and to be highly variable). When an object changes one property and then doesn't change it again, the time in the packet queue is largely irrelevant. However, if the object is continuously changing (an avatar changing position, a physical object moving, etc.), then the conversion from an entity update to a packet "freezes" the properties to be sent. If the object is continuously changing, then with fairly high probability the packet contains old data (the properties of the entity from the point at which it was converted into a packet).
>
> The real problem is that, in theory, to improve the efficiency of the packets (fill up each message) we are grabbing big chunks of updates. Under load, that causes queuing at the packet layer, which makes updates stale. That is... queuing at the packet layer is BAD.
>
> Fix: We implemented an adaptive algorithm for the number of updates to grab with each pass. We set a target time of 200ms for each iteration. That means we are trying to bound the maximum age of any update in the packet queue to 200ms. The adaptive algorithm looks a lot like TCP slow start: every time we complete an iteration (flush the packet queue) in less than 200ms, we increase the number of updates we take in the next iteration linearly (add 5 to the count), and when we don't make it back in 200ms, we drop the number we take multiplicatively (cut the number in half). In our experiments with large numbers of moving avatars, this algorithm works *very* well. The number of updates taken per iteration stabilizes very quickly and the response time is dramatically improved (no "snap back" on avatars, for example). One difference from the traditional slow start...
> since the number of "static" items in the queue is very high when a client first enters a region, we start with the number of updates taken at 500. That gets the static items out of the queue quickly (and delay doesn't matter as much for them), and the number taken is generally stable before the login/teleport screen even goes away.
>
> * The current prioritization queue can lead to update starvation. The prioritization algorithm dumps all entity updates into a single ordered queue. Let's say you have several hundred avatars moving around in a scene. Since we take a limited number of updates from the queue in each iteration, we will take only the updates for the "closest" (highest priority) avatars. However, since those avatars continue to move, they are re-inserted into the priority queue *ahead* of the updates that were already there. So... unless the queue can be completely emptied each iteration, or the priority of the "distant" (low priority) avatars changes, those avatars will never be updated.
>
> Fix: We converted the single priority queue into multiple priority queues and use fair queuing to retrieve updates from each. Here's how it works (more or less)... The current metrics (all of the current prioritization algorithms use distance at some point) compute a distance from the avatar/camera to an object. We take the log of that distance and use it as the index of the queue where we place the update. So close things go into the highest priority queue and distant things go into the lowest priority queue. Since the area covered by a priority queue grows as the square of the radius, the distant (lowest priority) queues will have the most objects, while the highest priority queues will have a small number of objects. Inside each priority queue, we order the updates by the time at which they entered the queue. Then we pull a fixed number of updates from each priority queue each iteration.
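The fair-queuing scheme just described can be sketched roughly in Python. This is not the branch's actual code: the queue count, the batch size per queue, and the use of log base 2 are assumptions; the description only fixes the shape (log-of-distance bucket index, FIFO order within a bucket, a fixed number pulled from every bucket per iteration):

```python
import math
from collections import deque

NUM_QUEUES = 8        # assumption: number of priority bands
BATCH_PER_QUEUE = 3   # fixed number of updates pulled per band per iteration

queues = [deque() for _ in range(NUM_QUEUES)]

def queue_index(distance):
    """Map camera-to-object distance to a band: log2 of distance, clamped.
    Band area grows with the square of the radius, so far bands hold the
    most objects while near bands stay small."""
    if distance < 1.0:
        return 0
    return min(NUM_QUEUES - 1, int(math.log2(distance)))

def enqueue(update, distance):
    # FIFO within a band preserves arrival order (ordering by entry time).
    queues[queue_index(distance)].append(update)

def take_iteration():
    """Pull a fixed number from every band. Near bands get a large share
    of the outgoing bandwidth, but far bands are never starved."""
    batch = []
    for q in queues:
        for _ in range(min(BATCH_PER_QUEUE, len(q))):
            batch.append(q.popleft())
    return batch

# Example: many distant updates plus a few close ones.
for i in range(10):
    enqueue(f"far-{i}", distance=200.0)
for i in range(2):
    enqueue(f"near-{i}", distance=2.0)
batch = take_iteration()  # contains near AND far updates: no starvation
```

The key property is visible in the example: even though the far band holds five times as many updates, every iteration still drains some of it, while the near band is always served first.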
> The result is that local updates get a high fraction of the outgoing bandwidth, but distant updates are guaranteed to get at least "some" of the bandwidth. No starvation. The prioritization algorithm we implemented is a modification of "best avatar responsiveness" and "front back", in that we use the root prim location for child prims, and the priority of updates "in back" of the avatar is lower than that of updates "in front". Our experiments show that the fair queuing does drain the update queue AND continues to provide a disproportionately high percentage of the bandwidth to "close" updates.
>
> One other note on this... we should be able to improve the performance of reprioritization with this approach. If we know the distance an avatar has moved, we only have to reprioritize objects that might have changed priority queues. We haven't implemented this yet but have some ideas for how to do it.
>
> * The resend queue is evil. When an update packet is sent (they are marked reliable), it is moved to a queue to await acknowledgement. If no acknowledgement is received in time, the packet is retransmitted, the wait time is doubled, and so on. What that means is that a resend packet in a rapidly changing scene will often contain updates that are outdated. That is, when we resend the packet, we are just resending old data (and if you're having a lot of resends, that means you already have a bad connection, and now you're filling it up with useless data).
>
> Fix: This isn't implemented yet (help would be appreciated)... We think that instead of saving packets for resend, a better solution would be to keep the entity updates that went into the packet. If we don't receive an ack in time, then put the entity updates back into the entity update queue (with entry times from their original enqueuing). That would ensure that we send an update for the object AND that the data sent is the most recent.
>
> * One final note...
> per-client bandwidth throttles seem to work very well. However, our experiments with per-simulator throttles were not positive. It appeared that a small number of clients was consuming all of the bandwidth available to the simulator and the rest were starved. We haven't looked into this any more.
>
> So...
>
> Feedback appreciated... There is some logging code (disabled) in the branch; real data would be great. And help testing: there are a number of attachments, deletes, and so on that I'm not sure work correctly.
>
> --mic
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Opensim-dev mailing list
> [email protected]
> https://lists.berlios.de/mailman/listinfo/opensim-dev
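For anyone testing the queuetest branch: the adaptive take-count algorithm described above (200ms target, add 5 when an iteration finishes under target, halve when it overruns, start at 500 to flush static items) can be sketched like this. The function and variable names are mine, not the branch's:

```python
TARGET_MS = 200.0   # bound on the age of any update in the packet queue
INITIAL_TAKE = 500  # start high so static items drain quickly at login/teleport
MIN_TAKE = 5
STEP = 5            # linear growth, like the additive phase of TCP slow start

def next_take_count(current, elapsed_ms):
    """Adapt the number of updates pulled per iteration.

    Finished under the 200ms target -> grow linearly (+5).
    Overran the target             -> back off multiplicatively (halve).
    """
    if elapsed_ms < TARGET_MS:
        return current + STEP
    return max(MIN_TAKE, current // 2)

# Example: a fast iteration grows the batch, a slow one halves it.
take = INITIAL_TAKE
take = next_take_count(take, elapsed_ms=120.0)  # under target: 500 -> 505
take = next_take_count(take, elapsed_ms=350.0)  # over target:  505 -> 252
```

The additive-increase/multiplicative-decrease shape is what makes the take count stabilize quickly: overshoot is corrected aggressively, while growth back toward the sustainable rate is gentle.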
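Since help was requested on the resend fix: here is a speculative Python sketch of the re-enqueue idea — keep the entity updates that went into a packet rather than the packet bytes, and on ack timeout push them back into the update queue with their original entry times. Everything here (`in_flight`, `on_ack_timeout`, the heap-of-tuples queue) is a hypothetical shape, not a proposal for the actual OpenSim types:

```python
import heapq

update_queue = []  # min-heap of (entry_time, update), ordered by entry time
in_flight = {}     # packet sequence number -> the updates it carried

def enqueue_update(entry_time, update):
    heapq.heappush(update_queue, (entry_time, update))

def send_packet(seq, updates):
    # Remember the entity updates, not the serialized packet: a later
    # resend should be rebuilt from the *current* entity state.
    in_flight[seq] = updates

def on_ack(seq):
    in_flight.pop(seq, None)  # acknowledged: nothing to resend

def on_ack_timeout(seq):
    # Re-enqueue with the original entry times, so ordering is preserved
    # and the next packet built for these entities carries fresh data.
    for entry_time, update in in_flight.pop(seq, []):
        heapq.heappush(update_queue, (entry_time, update))

# Example: packet 7 times out; its updates return to the queue.
enqueue_update(1.0, "avatar-pos")
batch = [heapq.heappop(update_queue)]
send_packet(7, batch)
on_ack_timeout(7)  # "avatar-pos" is back in update_queue, entry time intact
```

This gives exactly the two properties the fix asks for: an update for the object is still guaranteed to be sent, and the data serialized at send time is the most recent available.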
