The only thing we've touched so far is the entity update queue; that's all avatar updates and prim updates. We haven't touched any of the other packets. The resend focus would be for prim and avatar updates only.

--mic
On Mon, Mar 28, 2011 at 11:19 AM, Melanie <mela...@t-data.com> wrote:
> Hi,
>
> sounds great.
>
> Some things to consider:
>
> - Some actions require explicit sending of a packet which is an
> update packet but is used for special cases. Sit, stand, changing
> group tags, and creating/joining groups are all cases where special
> care needs to be taken.
>
> - Resend is evil for static objects and avatars, but may be needed
> to sync up dead reckoning with the real data on physical objects.
> Just a feeling.
>
> Melanie
>
> Mic Bowman wrote:
> > Over the last several weeks, Dan Lake and I have been looking at some
> > of the networking performance issues in OpenSim. As always, our
> > concerns are with the problems caused by very complex scenes with very
> > large numbers of avatars. However, I think some of the issues we have
> > found will generally improve networking with OpenSim. Since this
> > represents a fairly significant change in behavior (though the number
> > of lines of code is not great), I'm going to put it into a separate
> > branch for testing (called queuetest) in the opensim git repository.
> >
> > We've found several problems with the current networking/prioritization
> > code.
> >
> > * Reprioritization is completely broken for SceneObjectParts. On
> > reprioritization, the current code looks up the localid stored in the
> > scene Entities list, but since the scene does not store the localid for
> > SOPs, that lookup always fails. So the original priority of the SOP
> > continues to be used. This could be the cause of some problems, since
> > the initial prioritization assumes position 128,128. I don't understand
> > all the possible ramifications, but suffice it to say, using the
> > localid is causing problems.
> >
> > Fix: the scene entity is already stored in the update; just use that
> > instead of the localid.
> >
> > * We currently pull (by default) 100 entity updates from the entity
> > update queue and convert them into packets. Once converted into
> > packets, they are then queued again for transmission. This is a bad
> > thing. Under any kind of load, we've measured the time in the packet
> > queue to be up to many hundreds or thousands of milliseconds (and to be
> > highly variable). When an object changes one property and then doesn't
> > change it again, the time in the packet queue is largely irrelevant.
> > However, if the object is continuously changing (an avatar changing
> > position, a physical object moving, etc.), then the conversion from an
> > entity update to a packet "freezes" the properties to be sent. If the
> > object is continuously changing, then with fairly high probability the
> > packet contains old data (the properties of the entity from the point
> > at which it was converted into a packet).
> >
> > The real problem is that, in theory, to improve the efficiency of the
> > packets (fill up each message) we are grabbing big chunks of updates.
> > Under load, that causes queuing at the packet layer, which makes
> > updates stale. That is... queuing at the packet layer is BAD.
> >
> > Fix: we implemented an adaptive algorithm for the number of updates to
> > grab with each pass. We set a target time of 200ms for each iteration;
> > that means we are trying to bound the maximum age of any update in the
> > packet queue to 200ms. The adaptive algorithm looks a lot like TCP slow
> > start: every time we complete an iteration (flush the packet queue) in
> > less than 200ms, we increase the number of updates we take in the next
> > iteration linearly (add 5 to the count), and when we don't make it back
> > in 200ms, we drop the number we take multiplicatively (cut the number
> > in half). In our experiments with large numbers of moving avatars, this
> > algorithm works *very* well. The number of updates taken per iteration
> > stabilizes very quickly and the response time is dramatically improved
> > (no "snap back" on avatars, for example). One difference from the
> > traditional slow start: since the number of "static" items in the queue
> > is very high when a client first enters a region, we start with the
> > number of updates taken at 500. That gets the static items out of the
> > queue quickly (when delay doesn't matter as much), and the number taken
> > is generally stable before the login/teleport screen even goes away.
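To make the shape of that concrete, here is a minimal sketch of the
adaptive take-count. The class and member names are hypothetical, not
the actual queuetest code; the 200ms target, the +5 linear increase,
the halving on overrun, and the initial batch of 500 come from the
description above, while the floor of 5 is an assumption.

```csharp
using System;

// Hypothetical sketch of the adaptive batch size described above.
public class AdaptiveBatchSizer
{
    private const int TargetMs = 200;     // bound on update age in the packet queue
    private const int Increment = 5;      // linear growth when we meet the target
    private const int InitialCount = 500; // large first batch to flush static items
    private const int MinCount = 5;       // assumed floor; not specified in the post

    public int TakeCount { get; private set; } = InitialCount;

    // Call once per iteration with the measured time it took to drain
    // the batch through the packet queue.
    public void RecordIteration(int elapsedMs)
    {
        if (elapsedMs <= TargetMs)
            TakeCount += Increment;                        // additive increase
        else
            TakeCount = Math.Max(MinCount, TakeCount / 2); // multiplicative decrease
    }
}
```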
> > * The current prioritization queue can lead to update starvation. The
> > prioritization algorithm dumps all entity updates into a single
> > ordered queue. Let's say you have several hundred avatars moving
> > around in a scene. Since we take a limited number of updates from the
> > queue in each iteration, we will take only the updates for the
> > "closest" (highest priority) avatars. However, since those avatars
> > continue to move, their updates are re-inserted into the priority
> > queue *ahead* of the updates that were already there. So unless the
> > queue can be completely emptied each iteration, or the priority of the
> > "distant" (low priority) avatars changes, those avatars will never be
> > updated.
> >
> > Fix: we converted the single priority queue into multiple priority
> > queues and use fair queuing to retrieve updates from each. Here's how
> > it works (more or less)... the current metrics (all of the current
> > prioritization algorithms use distance at some point) compute a
> > distance from the avatar/camera to an object. We take the log of that
> > distance and use it as the index of the queue where we place the
> > update. So close things go into the highest priority queue and distant
> > things go into the lowest priority queue. Since the area covered by a
> > priority queue grows as the square of the radius, the distant (lowest
> > priority) queues will have the most objects while the highest priority
> > queues will have a small number of objects. Inside each priority
> > queue, we order the updates by the time at which they entered the
> > queue. Then we pull a fixed number of updates from each priority queue
> > each iteration. The result is that local updates get a high fraction
> > of the outgoing bandwidth, but distant updates are guaranteed to get
> > at least "some" of the bandwidth. No starvation. The prioritization
> > algorithm we implemented is a modification of "best avatar
> > responsiveness" and "front back", in that we use the root prim
> > location for child prims and the priority of updates "in back" of the
> > avatar is lower than updates "in front". Our experiments show that the
> > fair queuing does drain the update queue AND continues to provide a
> > disproportionately high percentage of the bandwidth to "close"
> > updates.
> >
> > One other note on this... we should be able to improve the performance
> > of reprioritization with this approach. If we know the distance an
> > avatar has moved, we only have to reprioritize objects that might have
> > changed priority queues. We haven't implemented this yet but have some
> > ideas for how to do it.
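Here is a rough sketch of the multi-queue scheme just described. The
log base (2), the number of queues, and the per-queue quota are
assumptions for illustration; the post only says "the log of that
distance" and "a fixed number of updates from each priority queue".

```csharp
using System;
using System.Collections.Generic;

// Illustrative fair-queuing structure; not the queuetest implementation.
public class FairUpdateQueues<TUpdate>
{
    private const int NumQueues = 12;    // assumed queue count
    private const int TakePerQueue = 25; // assumed per-queue quota

    // One FIFO per priority band; index 0 holds the closest objects.
    private readonly Queue<TUpdate>[] _queues;

    public FairUpdateQueues()
    {
        _queues = new Queue<TUpdate>[NumQueues];
        for (int i = 0; i < NumQueues; i++)
            _queues[i] = new Queue<TUpdate>();
    }

    // Bucket by the log of the camera distance: close objects land in
    // low indices (few objects, high priority), distant objects in high
    // indices (many objects, low priority).
    public void Enqueue(TUpdate update, double distance)
    {
        int bucket = (int)Math.Log(Math.Max(distance, 1.0), 2.0);
        _queues[Math.Min(bucket, NumQueues - 1)].Enqueue(update);
    }

    // Pull a fixed number from every queue each iteration, so distant
    // updates always get some bandwidth and nothing starves. Within a
    // queue, Dequeue preserves arrival order (oldest first).
    public List<TUpdate> TakeBatch()
    {
        var batch = new List<TUpdate>();
        foreach (var q in _queues)
            for (int i = 0; i < TakePerQueue && q.Count > 0; i++)
                batch.Add(q.Dequeue());
        return batch;
    }
}
```

One way to read the reprioritization note above: the bucket index only
changes when the log of the distance crosses an integer boundary, so a
small avatar move can only affect objects whose distance sits near such
a boundary, which is what would make cheap reprioritization possible.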
> > * The resend queue is evil. When an update packet is sent (update
> > packets are marked reliable), it is moved to a queue to await
> > acknowledgement. If no acknowledgement is received in time, the packet
> > is retransmitted, the wait time is doubled, and so on. What that means
> > is that a resent packet in a rapidly changing scene will often contain
> > updates that are outdated. That is, when we resend the packet, we are
> > just resending old data (and if you're having a lot of resends, that
> > means you already have a bad connection, and now you're filling it up
> > with useless data).
> >
> > Fix: this isn't implemented yet (help would be appreciated)... we
> > think that instead of saving packets for resend, a better solution
> > would be to keep the entity updates that went into the packet. If we
> > don't receive an ack in time, we put the entity updates back into the
> > entity update queue (with the entry time from their original
> > enqueuing). That would ensure that we send an update for the object
> > AND that the data sent is the most recent.
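Since this fix is explicitly not implemented yet, the following is only
one possible shape for it, with invented names throughout: track the
updates that went into each reliable packet, drop them on ack, and hand
them back for re-enqueuing on timeout instead of resending the stale
packet.

```csharp
using System.Collections.Generic;

// Hypothetical bookkeeping for the proposed resend fix.
public class PendingAckTable<TUpdate>
{
    private readonly Dictionary<uint, List<TUpdate>> _pending =
        new Dictionary<uint, List<TUpdate>>();

    // Remember which entity updates a reliable packet carries.
    public void OnPacketSent(uint sequence, List<TUpdate> updates)
    {
        _pending[sequence] = updates;
    }

    // The packet was acknowledged; its updates were delivered.
    public void OnAck(uint sequence)
    {
        _pending.Remove(sequence);
    }

    // Ack timeout: recover the updates so the caller can put them back
    // into the entity update queue (keeping their original entry times)
    // and send current data instead of the frozen packet.
    public bool TryExpire(uint sequence, out List<TUpdate> updates)
    {
        if (_pending.TryGetValue(sequence, out updates))
        {
            _pending.Remove(sequence);
            return true;
        }
        return false;
    }
}
```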
> > * One final note... per-client bandwidth throttles seem to work very
> > well. However, our experiments with per-simulator throttles were not
> > positive. It appeared that a small number of clients was consuming all
> > of the bandwidth available to the simulator and the rest were starved.
> > We haven't looked into this any more.
> >
> > So...
> >
> > Feedback appreciated... there is some logging code (disabled) in the
> > branch; real data would be great. And help testing: there are a number
> > of attachment, delete, and similar paths that I'm not sure work
> > correctly.
> >
> > --mic

_______________________________________________
Opensim-dev mailing list
Opensim-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/opensim-dev