The only thing we've touched so far is the entity update queue; that's all avatar updates and prim updates. We haven't touched any of the other packets. The resend focus would be for prim and avatar updates only.

--mic
On Mon, Mar 28, 2011 at 11:19 AM, Melanie <mela...@t-data.com> wrote:
> Hi,
>
> sounds great.
>
> Some things to consider:
>
> - Some actions require explicit sending of a packet which is an
> update packet but is used for special cases. Sit, stand, changing
> group tags, and creating/joining groups are all cases where special
> care needs to be taken.
>
> - Resend is evil for static objects and avatars, but may be needed
> to sync up dead reckoning with the real data on physical objects.
> Just a feeling.
>
> Melanie
>
> Mic Bowman wrote:
> > Over the last several weeks, Dan Lake and I have been looking at some
> > of the networking performance issues in OpenSim. As always, our
> > concerns are with the problems caused by very complex scenes with very
> > large numbers of avatars. However, I think some of the issues we have
> > found will generally improve networking with OpenSim. Since this
> > represents a fairly significant change in behavior (though the number
> > of lines of code is not great), I'm going to put it into a separate
> > branch for testing (called queuetest) in the opensim git repository.
> >
> > We've found several problems with the current networking/prioritization
> > code.
> >
> > * Reprioritization is completely broken for SceneObjectParts. On
> > reprioritization, the current code looks up the localid stored in the
> > scene Entities list, but since the scene does not store the localid for
> > SOPs, that lookup always fails. So the original priority of the SOP
> > continues to be used. This could be the cause of some problems, since
> > the initial prioritization assumes position 128,128. I don't understand
> > all the possible ramifications, but suffice it to say, using the
> > localid is causing problems.
> >
> > Fix: the scene entity is already stored in the update; just use that
> > instead of the localid.
> >
> > * We currently pull (by default) 100 entity updates from the entity
> > update queue and convert them into packets. Once converted into
> > packets, they are then queued again for transmission. This is a bad
> > thing. Under any kind of load, we've measured the time in the packet
> > queue to be up to many hundreds or thousands of milliseconds (and to be
> > highly variable). When an object changes one property and then doesn't
> > change it again, the time in the packet queue is largely irrelevant.
> > However, if the object is continuously changing (an avatar changing
> > position, a physical object moving, etc.), then the conversion from an
> > entity update to a packet "freezes" the properties to be sent. If the
> > object is continuously changing, then with fairly high probability the
> > packet contains old data (the properties of the entity from the point
> > at which it was converted into a packet).
> >
> > The real problem is that, in theory, to improve the efficiency of the
> > packets (fill up each message) we are grabbing big chunks of updates.
> > Under load, that causes queuing at the packet layer, which makes
> > updates stale. That is... queuing at the packet layer is BAD.
> >
> > Fix: we implemented an adaptive algorithm for the number of updates to
> > grab with each pass. We set a target time of 200ms for each iteration;
> > that means we are trying to bound the maximum age of any update in the
> > packet queue to 200ms. The adaptive algorithm looks a lot like TCP slow
> > start: every time we complete an iteration (flush the packet queue) in
> > less than 200ms, we increase the number of updates we take in the next
> > iteration linearly (add 5 to the count), and when we don't make it back
> > in 200ms, we drop the number we take multiplicatively (cut the number
> > in half). In our experiments with large numbers of moving avatars, this
> > algorithm works *very* well. The number of updates taken per iteration
> > stabilizes very quickly and the response time is dramatically improved
> > (no "snap back" on avatars, for example). One difference from the
> > traditional slow start: since the number of "static" items in the queue
> > is very high when a client first enters a region, we start with the
> > number of updates taken at 500. That gets the static items out of the
> > queue quickly (when delay doesn't matter as much), and the number taken
> > is generally stable before the login/teleport screen even goes away.
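To make the shape of that concrete, here is a minimal sketch of the
adaptive take-count. The class and member names are hypothetical, not
the actual queuetest code; the 200ms target, the +5 linear increase,
the halving on overrun, and the initial batch of 500 come from the
description above, while the floor of 5 is an assumption.

```csharp
using System;

// Hypothetical sketch of the adaptive batch size described above.
public class AdaptiveBatchSizer
{
    private const int TargetMs = 200;     // bound on update age in the packet queue
    private const int Increment = 5;      // linear growth when we meet the target
    private const int InitialCount = 500; // large first batch to flush static items
    private const int MinCount = 5;       // assumed floor; not specified in the post

    public int TakeCount { get; private set; } = InitialCount;

    // Call once per iteration with the measured time it took to drain
    // the batch through the packet queue.
    public void RecordIteration(int elapsedMs)
    {
        if (elapsedMs <= TargetMs)
            TakeCount += Increment;                        // additive increase
        else
            TakeCount = Math.Max(MinCount, TakeCount / 2); // multiplicative decrease
    }
}
```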
> > * The current prioritization queue can lead to update starvation. The
> > prioritization algorithm dumps all entity updates into a single
> > ordered queue. Let's say you have several hundred avatars moving
> > around in a scene. Since we take a limited number of updates from the
> > queue in each iteration, we will take only the updates for the
> > "closest" (highest priority) avatars. However, since those avatars
> > continue to move, their updates are re-inserted into the priority
> > queue *ahead* of the updates that were already there. So unless the
> > queue can be completely emptied each iteration, or the priority of the
> > "distant" (low priority) avatars changes, those avatars will never be
> > updated.
> >
> > Fix: we converted the single priority queue into multiple priority
> > queues and use fair queuing to retrieve updates from each. Here's how
> > it works (more or less)... the current metrics (all of the current
> > prioritization algorithms use distance at some point) compute a
> > distance from the avatar/camera to an object. We take the log of that
> > distance and use it as the index of the queue where we place the
> > update. So close things go into the highest priority queue and distant
> > things go into the lowest priority queue. Since the area covered by a
> > priority queue grows as the square of the radius, the distant (lowest
> > priority) queues will have the most objects while the highest priority
> > queues will have a small number of objects. Inside each priority
> > queue, we order the updates by the time at which they entered the
> > queue. Then we pull a fixed number of updates from each priority queue
> > each iteration. The result is that local updates get a high fraction
> > of the outgoing bandwidth, but distant updates are guaranteed to get
> > at least "some" of the bandwidth. No starvation. The prioritization
> > algorithm we implemented is a modification of "best avatar
> > responsiveness" and "front back", in that we use the root prim
> > location for child prims and the priority of updates "in back" of the
> > avatar is lower than updates "in front". Our experiments show that the
> > fair queuing does drain the update queue AND continues to provide a
> > disproportionately high percentage of the bandwidth to "close"
> > updates.
> >
> > One other note on this... we should be able to improve the performance
> > of reprioritization with this approach. If we know the distance an
> > avatar has moved, we only have to reprioritize objects that might have
> > changed priority queues. We haven't implemented this yet but have some
> > ideas for how to do it.
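Here is a rough sketch of the multi-queue scheme just described. The
log base (2), the number of queues, and the per-queue quota are
assumptions for illustration; the post only says "the log of that
distance" and "a fixed number of updates from each priority queue".

```csharp
using System;
using System.Collections.Generic;

// Illustrative fair-queuing structure; not the queuetest implementation.
public class FairUpdateQueues<TUpdate>
{
    private const int NumQueues = 12;    // assumed queue count
    private const int TakePerQueue = 25; // assumed per-queue quota

    // One FIFO per priority band; index 0 holds the closest objects.
    private readonly Queue<TUpdate>[] _queues;

    public FairUpdateQueues()
    {
        _queues = new Queue<TUpdate>[NumQueues];
        for (int i = 0; i < NumQueues; i++)
            _queues[i] = new Queue<TUpdate>();
    }

    // Bucket by the log of the camera distance: close objects land in
    // low indices (few objects, high priority), distant objects in high
    // indices (many objects, low priority).
    public void Enqueue(TUpdate update, double distance)
    {
        int bucket = (int)Math.Log(Math.Max(distance, 1.0), 2.0);
        _queues[Math.Min(bucket, NumQueues - 1)].Enqueue(update);
    }

    // Pull a fixed number from every queue each iteration, so distant
    // updates always get some bandwidth and nothing starves. Within a
    // queue, Dequeue preserves arrival order (oldest first).
    public List<TUpdate> TakeBatch()
    {
        var batch = new List<TUpdate>();
        foreach (var q in _queues)
            for (int i = 0; i < TakePerQueue && q.Count > 0; i++)
                batch.Add(q.Dequeue());
        return batch;
    }
}
```

One way to read the reprioritization note above: the bucket index only
changes when the log of the distance crosses an integer boundary, so a
small avatar move can only affect objects whose distance sits near such
a boundary, which is what would make cheap reprioritization possible.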
> > * The resend queue is evil. When an update packet is sent (update
> > packets are marked reliable), it is moved to a queue to await
> > acknowledgement. If no acknowledgement is received in time, the packet
> > is retransmitted, the wait time is doubled, and so on. What that means
> > is that a resent packet in a rapidly changing scene will often contain
> > updates that are outdated. That is, when we resend the packet, we are
> > just resending old data (and if you're having a lot of resends, that
> > means you already have a bad connection, and now you're filling it up
> > with useless data).
> >
> > Fix: this isn't implemented yet (help would be appreciated)... we
> > think that instead of saving packets for resend, a better solution
> > would be to keep the entity updates that went into the packet. If we
> > don't receive an ack in time, we put the entity updates back into the
> > entity update queue (with the entry time from their original
> > enqueuing). That would ensure that we send an update for the object
> > AND that the data sent is the most recent.
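Since this fix is explicitly not implemented yet, the following is only
one possible shape for it, with invented names throughout: track the
updates that went into each reliable packet, drop them on ack, and hand
them back for re-enqueuing on timeout instead of resending the stale
packet.

```csharp
using System.Collections.Generic;

// Hypothetical bookkeeping for the proposed resend fix.
public class PendingAckTable<TUpdate>
{
    private readonly Dictionary<uint, List<TUpdate>> _pending =
        new Dictionary<uint, List<TUpdate>>();

    // Remember which entity updates a reliable packet carries.
    public void OnPacketSent(uint sequence, List<TUpdate> updates)
    {
        _pending[sequence] = updates;
    }

    // The packet was acknowledged; its updates were delivered.
    public void OnAck(uint sequence)
    {
        _pending.Remove(sequence);
    }

    // Ack timeout: recover the updates so the caller can put them back
    // into the entity update queue (keeping their original entry times)
    // and send current data instead of the frozen packet.
    public bool TryExpire(uint sequence, out List<TUpdate> updates)
    {
        if (_pending.TryGetValue(sequence, out updates))
        {
            _pending.Remove(sequence);
            return true;
        }
        return false;
    }
}
```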
> > * One final note... per-client bandwidth throttles seem to work very
> > well. However, our experiments with per-simulator throttles were not
> > positive. It appeared that a small number of clients was consuming all
> > of the bandwidth available to the simulator and the rest were starved.
> > We haven't looked into this any more.
> >
> > So...
> >
> > Feedback appreciated... there is some logging code (disabled) in the
> > branch; real data would be great. And help testing: there are a number
> > of attachment, delete, and similar paths that I'm not sure work
> > correctly.
> >
> > --mic

_______________________________________________
Opensim-dev mailing list
Opensim-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/opensim-dev