In the initial period where this problem was seen, only the adaptive throttle was active, AFAIK.

I am quite familiar with the hierarchical structure now, as I had to investigate it to work out what was going on. Indeed, to address various bugs and make it possible to change parameters in real time for experimentation (e.g. to adjust client throttles on the fly from the console), I rewrote parts of it (without changing the algorithms) and added some regression tests where there were none before. These changes were made after the original problem with adaptive throttles was encountered.
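
Roughly, it's a tree of token buckets: each packet category bucket for a viewer connection is a child of that connection's overall bucket, which in turn is a child of the simulator-wide bucket, and a send has to be covered at every level. A very loose sketch of the idea (illustrative C# only, not the actual TokenBucket class):

// Very rough illustration of the hierarchy - NOT the actual OpenSimulator
// TokenBucket code, just the shape of the idea: a child bucket can only send
// what its parent (per-client total, then simulator-wide total) also allows.
public class BucketSketch
{
    private readonly BucketSketch m_parent;   // null at the root (simulator-wide cap)
    private readonly long m_dripRate;         // bytes per second refilled into this bucket
    private long m_tokens;
    private int m_lastDrip = Environment.TickCount;

    public BucketSketch(BucketSketch parent, long dripRate)
    {
        m_parent = parent;
        m_dripRate = dripRate;
    }

    private void Drip()
    {
        int now = Environment.TickCount;
        long elapsedMs = now - m_lastDrip;
        if (elapsedMs <= 0)
            return;

        // allow at most one second's worth of burst to accumulate
        m_tokens = Math.Min(m_dripRate, m_tokens + m_dripRate * elapsedMs / 1000);
        m_lastDrip = now;
    }

    // A send of 'amount' bytes has to be covered at this level AND every level
    // above it, which is how the adaptive per-client rate and the total
    // simulator cap both end up applying to the same packet.
    public bool RemoveTokens(long amount)
    {
        Drip();

        if (m_tokens < amount)
            return false;

        if (m_parent != null && !m_parent.RemoveTokens(amount))
            return false;

        m_tokens -= amount;
        return true;
    }
}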

So I don't believe (though I could be wrong) that it's a timing issue with the buckets themselves. That said, it's not impossible that there's an issue somewhere deep within the UDP processing code or even in Mono. My concern is that this is the kind of issue that takes a very large amount of time to investigate and may turn up nothing, compared with making the throttle reduction less aggressive in the face of a stream of ack timeouts that occur within the same second.

In the future, if a problem elsewhere is identified so that this behaviour no longer occurs, then one can, of course, retighten the algorithm. However, I would posit that with UDP streams it's always possible for a stream to drop momentarily because of network issues, and that it is better behaviour not to severely penalise what may be a momentary glitch.
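
Concretely, what I have in mind is something along these lines (a sketch only, with made-up names and numbers, not a patch against the actual throttle code):

// Sketch of the change I have in mind - illustrative names only.  On an ack
// timeout, halve the adaptive rate at most once per short window, so a burst
// of expiries that all land within the same second or two counts as a single
// congestion event rather than hammering the throttle down to the minimum.
public class AdaptiveRateSketch
{
    private const double MinRate = 6000;          // illustrative floor, bytes/sec
    private const int BackoffWindowMs = 2000;     // halve at most once per 2 seconds

    private double m_rate;
    private int m_lastBackoff;

    public AdaptiveRateSketch(double initialRate)
    {
        m_rate = initialRate;
        // start "out of window" so the first expiry still triggers a halving
        m_lastBackoff = Environment.TickCount - BackoffWindowMs;
    }

    public double Rate { get { return m_rate; } }

    // Called for every reliable packet that expires unacked.
    public void ExpirePacket()
    {
        int now = Environment.TickCount;

        // The current behaviour halves unconditionally here; instead, treat
        // every expiry inside the window as part of the same congestion event.
        if (now - m_lastBackoff < BackoffWindowMs)
            return;

        m_lastBackoff = now;
        m_rate = Math.Max(MinRate, m_rate / 2);
    }
}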

On 01/12/14 16:46, Mic Bowman wrote:
and, just to be clear...

did you have *both* adaptive and total bw throttles turned on?

the interaction between the two through the hierarchical token bucket is another place where i was more than a little worried. i tested that with network emulators under high load & it seemed to do what it was supposed to do, but i wouldn't be surprised to find a timing issue.

--mic


On Mon, Dec 1, 2014 at 8:42 AM, Mic Bowman <[email protected]> wrote:

    one thing that i was concerned about when i put the throttles in place is the relationship between congestion control and packet sizes. if you're generating a large number of small, reliable packets that are being dropped, that could cause the congestion control to kick in more quickly. that would suggest an adjustment based on bytes sent rather than time (though both are probably appropriate).
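
    something roughly like this (pure illustration, made-up names and thresholds), where another halving is only allowed once enough bytes have gone out since the previous one, possibly in addition to a time gate:

    // illustration only, not actual code: gate the backoff on bytes sent since
    // the last halving so that a burst of small reliable packets lost together
    // still registers as one congestion event
    static class BackoffGate
    {
        const long MinBytesBetweenBackoffs = 16 * 1024;   // made-up threshold
        const int MinMsBetweenBackoffs = 2000;            // made-up threshold

        public static bool ShouldBackoff(long bytesSentSinceLastBackoff, int msSinceLastBackoff)
        {
            // both the byte and the time condition are probably appropriate
            return bytesSentSinceLastBackoff >= MinBytesBetweenBackoffs
                && msSinceLastBackoff >= MinMsBetweenBackoffs;
        }
    }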

    my biggest concern is that we start fixing by "stabbing in the dark". congestion control is particularly nasty in how it interacts, which is why i started with a well known & long battle tested algorithm. making random changes might fix one problem and introduce a half dozen others.

    i'm not in a position to help on the diagnosis until next week if you can wait until then.

    --mic


    On Wed, Nov 26, 2014 at 4:04 PM, Justin Clark-Casey <[email protected]> wrote:

        This was actually happening at quite low loads (< 40 connections over all 4 keynotes).  Once the adaptive throttles were disabled and other unrelated issues fixed, the system had no obvious issues coping with higher loads in both testing and the conference itself (e.g. the 159 peak keynote avatars in the conference).  So I don't think it was a server bandwidth issue.

        That said, it was somewhat strange behaviour as it affected only maybe 10-20% of connections.  Once it did affect a connection (I saw this happening by logging downward adjustments, which one can still do with the console command "debug lludp throttles log 1"), the connection would not recover - at some point a bunch of expires would reduce the throttle again.  Connections seemed to be affected randomly - I experienced the issue myself at one point and I have pretty solid fibre.

        You're right in that I don't know why this happened or why problematic connections stayed problematic instead of slowly recovering.  Because of time constraints we had to disable adaptive instead of investigating further.  But I don't advocate doing this by default at all because, as you say, it's an important mechanism for congestion control.

        I do plan for further investigation to happen at some point, but it's time consuming work and I'd really love to get a release out soon-ish.  So for the moment I would like to tune the adaptation mechanism as you've mentioned, which I believe should probably be done anyway.  Because of the nature of the problem, my plan would be not to change the adaptation divisor but rather to adapt downwards only every 2 seconds or so if packets are expiring, rather than on every packet expiry.  I believe this should still achieve the adaptation effect without massively penalising the connection if there has been a momentary connection issue or similar.

        On 26/11/14 02:39, Mic Bowman wrote:

            As you mention... cutting the throttle by 50% was modeled on the TCP congestion control approach. It is very aggressive as a congestion control mechanism and certainly could be tuned.

            That being said... do you know why the packets were considered un-acked? If it's because the simulator is having problems (which, given your description that it happens under load, seems to be the case) then we can probably do something more intelligent about throttling overall simulator BW. That is... maybe the problem is the top end of the overall simulator bw, not the per connection throttles.

            Manual throttles & adaptive throttles are not exclusive. You can use both. Adaptive manages the top end, but the manual throttles set an absolute max.

            --mic


            On Tue, Nov 25, 2014 at 5:15 PM, Justin Clark-Casey <[email protected]> wrote:

                 Hi Mic (primarily),

                 Two years ago [1] we had a discussion about the enable_adaptive_throttles setting.  Just for background, this is a setting that adapts the amount of data sent to the viewer depending on whether reliable packets sent from the simulator are acked or not.  As such, it looks to make sure that a viewer which sets a downstream bandwidth higher than its network connection can cope with is not permanently hosed with too much data.  We enabled it on an experimental basis [2].

                 As you said at the time, this is modelled on the congestion control approach used in TCP.  I see that for TCP, the rate is halved on every unacked segment.  In OpenSimulator, it's halved on every unacked reliable packet.

                 However, under fairly modest load conditions in the conference grid, I saw a behaviour where sometimes a sequence of packets would expire for a connection in a very short time period (< 1 sec).  This would halve the throttle many times, in my observations right down to the absolute minimum.  This caused the behaviour from the user's point of view to degrade considerably for an extended period of time.  The throttle takes quite a long period to grow again.

                 I didn't get much further with the diagnostics since a lack of time forced us to switch back to manual throttling instead (with 1 mbit per viewer and 400 mbit total on the keynotes).  This seemed to work okay in testing and in the event itself.  However, this leaves one vulnerable to the problem adaptive_throttles looks to tackle in the first place.

                 I'm still reading up about this stuff, but it strikes me that halving the throttle on every missed packet is much harsher than the TCP approach, as with UDP a whole sequence can expire at once rather than a single segment that is subsequently retried before another segment can be missed.
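                 To put some illustrative numbers on that: if, say, 10 reliable packets from the same burst all expire within the same second, the throttle gets multiplied by (1/2)^10 - a cut by a factor of roughly 1000 - which in practice just drives it straight down to the minimum.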

                 One idea is to ignore all expiries in a certain period (e.g. next 2 seconds) if an expired packet has already caused the throttle to be halved.  Of course, this is a bit more complicated to do but hopefully not too much so.  What do you think?  Any other ideas?

                 [1] http://opensimulator.org/pipermail/opensim-dev/2011-October/023017.html
                 [2] http://opensimulator.org/pipermail/opensim-dev/2011-October/023063.html

                 Best Regards,

                 --
                 Justin Clark-Casey (justincc)
                 OSVW Consulting
                 http://justincc.org
                 http://twitter.com/justincc

        --
        Justin Clark-Casey (justincc)
        OSVW Consulting
        http://justincc.org
        http://twitter.com/justincc

--
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc
_______________________________________________
Opensim-dev mailing list
[email protected]
http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
