I agree with you about not stabbing in the dark. That is why I spent an enormous amount of treasure on improving pCampbot/libomv, creating other tools, and adding more stats and real-time change capabilities to allow us to reproduce issues before making changes.

So it's unfortunate that in this case I simply ran out of time to properly investigate the issue with adaptive throttles being decimated by a string of ack waits expiring in the same second.

My major concern is that this could well be the single remaining blocker to decent performance at higher avatar numbers than have previously been possible in core out-of-the-box OpenSimulator (except for using mysql instead of sqlite). It won't matter that every other issue has been addressed - just this one is potentially enough to still screw up performance. So I really want to address this before making the next release. I really don't want to tell people having problems that they should try switching off adaptive throttles - OpenSimulator should work out-of-the-box for as broad a selection of load profiles as possible.

The problem I face now is that in a perfect world one would definitely go back and do extensive testing on this particular issue. However, I am quickly running out of time and resources to do so, along with all the other issues that need to be addressed before a release can be made.

Hence, I favour making what I think is a single innocuous change to throttle back a tiny bit more slowly, where continual packet loss will always throttle back to the absolute minimum anyway - it may just take slightly longer. On the surface, a time adjustment appears simpler to me than byte counting, though I haven't thought about that much. There would be no further changes at this point. Of course, there would be testing, but maybe not the extensive and time-expensive load testing that took place during the conference buildup. Although if you have time to help with that, it would be very welcome. If so, I am happy to wait till next week.
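
To make the proposed change concrete, here is a minimal sketch in Python (not the actual OpenSimulator C# code) of halving at most once per time window. The class name, field names, the 2 second window and the additive increase on ack are all assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThrottle:
    """Hypothetical per-connection throttle; names and defaults are illustrative."""
    rate: float                      # current limit in bytes/sec
    min_rate: float                  # floor the throttle never drops below
    max_rate: float                  # ceiling requested by the viewer
    backoff_interval: float = 2.0    # assumed minimum seconds between halvings
    last_backoff: float = float("-inf")

    def on_expire(self, now: float) -> bool:
        """Handle an expired ack wait. Returns True if the rate was halved."""
        if now - self.last_backoff < self.backoff_interval:
            # Already backed off within the window: ignore this expiry.
            return False
        self.rate = max(self.min_rate, self.rate / 2)
        self.last_backoff = now
        return True

    def on_ack(self, increment: float) -> None:
        """Grow the rate on an ack (additive growth is an assumption)."""
        self.rate = min(self.max_rate, self.rate + increment)


if __name__ == "__main__":
    throttle = AdaptiveThrottle(rate=100_000, min_rate=1_000, max_rate=100_000)
    # A burst of four expiries inside the same second causes one halving only.
    for t in (0.0, 0.1, 0.2, 0.3):
        throttle.on_expire(t)
    print(throttle.rate)
```

With sustained loss the rate still reaches the floor, since one halving is allowed per window; it just takes a few windows rather than a single burst.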

I'm also not sure I regard this as a tried and tested algorithm in this context. It certainly is in the TCP world, but there things appear to be rather different - only one segment is going to expire at a time and halve the throttle (rather than many UDP acks expiring at once to decimate it). Also, the build up in TCP land looks rather quick - in our case we only increase the throttle on receipt of an ack. The number of reliable packets sent from server to client is not that high so the throttle takes a considerable period to build up from low levels.
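
As a rough illustration of that asymmetry, the sketch below counts how many expiries it takes to hit the floor and how long an ack-driven climb back would take. Every number is hypothetical (the floor, the per-ack step and the reliable packet rate are not taken from OpenSimulator), and additive growth per ack is an assumption.

```python
import math


def expiries_to_floor(rate: float, floor: float) -> int:
    """Count successive halvings needed to take `rate` down to `floor`."""
    count = 0
    while rate > floor:
        rate = max(floor, rate / 2)
        count += 1
    return count


def seconds_to_recover(floor: float, target: float,
                       step_per_ack: float, acks_per_second: float) -> float:
    """Time to climb from `floor` to `target`, assuming a fixed step per ack."""
    acks_needed = math.ceil((target - floor) / step_per_ack)
    return acks_needed / acks_per_second


if __name__ == "__main__":
    start = 125_000   # 1 mbit/sec expressed in bytes/sec
    floor = 4_000     # hypothetical minimum throttle
    print(expiries_to_floor(start, floor))
    print(seconds_to_recover(floor, start, step_per_ack=1_000, acks_per_second=5))
```

With these made-up figures a handful of expiries in one burst is enough to reach the floor, while the climb back is bounded by how often reliable packets are acked.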

On 01/12/14 16:42, Mic Bowman wrote:
one thing that i was concerned about when i put the throttles in place is the relationship between congestion control and packet sizes. if you're generating a large number of small, reliable packets that are being dropped, that could cause the congestion control to kick in more quickly. that would suggest an adjustment based on bytes sent rather than time (though both are probably appropriate).

my biggest concern is that we start fixing by "stabbing in the dark". congestion control is particularly nasty in how it interacts which is why i started with a well known & long battle tested algorithm. making random changes might fix one problem and introduce a half dozen others.

i'm not in a position to help on the diagnosis until next week if you can wait until then.

--mic


On Wed, Nov 26, 2014 at 4:04 PM, Justin Clark-Casey <[email protected]> wrote:

    This was actually happening at quite low loads (< 40 connections over all 4 keynotes).  Once adaptive throttles were disabled and other unrelated issues fixed the system had no obvious issues coping with higher loads in both testing and the conference itself (e.g. the 159 peak keynote avatars in the conference).  So I don't think it was a server bandwidth issue.

    That said, it was somewhat strange behaviour as it affected only maybe 10-20% of connections.  Once it did affect a connection (I saw this happening by logging downward adjustments which one can still do with the console command "debug lludp throttles log 1"), the connection would not recover - at some point a bunch of expires would reduce the throttle again.  Connections seemed to be affected randomly - I experienced the issue myself at one point and I have pretty solid fibre.

    You're right in that I don't know why this happened or why problematic connections stayed problematic instead of slowly recovering.  Because of time constraints we had to disable adaptive instead of investigating further.  But I don't advocate doing this by default at all because, as you say, it's an important mechanism for congestion control.

    I do plan to investigate further at some point but it's time consuming work and I'd really love to get a release out soon-ish.  So for the moment I would like to tune the adaption mechanism as you've mentioned, which I believe should probably be done anyway.  Because of the nature of the problem, my plan would be not to change the adaption divisor but rather to adapt downwards only every 2 seconds or so if packets are expiring rather than on every packet expire.  I believe this should still achieve the adaption effect without massively penalising the connection if there has been a momentary connection issue or similar.

    On 26/11/14 02:39, Mic Bowman wrote:

        As you mention... cutting the throttle by 50% was modeled on the TCP congestion control approach. It is very aggressive as a congestion control mechanism and certainly could be tuned.

        That being said... do you know why the packets were considered un-acked? If it's because the simulator is having problems (which given your description that it happens under load seems to be the case) then we can probably do something more intelligent about throttling over all simulator BW. That is... maybe the problem is the top end of the overall simulator bw, not the per connection throttles.

        Manual throttles & adaptive throttles are not exclusive. You can use both. Adaptive manages the top end, but the manual throttles set an absolute max.

        --mic


        On Tue, Nov 25, 2014 at 5:15 PM, Justin Clark-Casey <[email protected]> wrote:

             Hi Mic (primarily),

             Two years ago [1] we had a discussion about the enable_adaptive_throttles setting.  Just for background, this is a setting that adapts the amount of data sent to the viewer depending on whether reliable packets sent from the simulator are acked or not.  As such, it looks to make sure that a viewer which sets a downstream bandwidth higher than its network connection can cope with is not permanently hosed with too much data.  We enabled it on an experimental basis [2].

             As you said at the time, this is modelled on the congestion approach used in TCP.  I see that for TCP, the rate is halved on every unacked segment.  In OpenSimulator, it's halved on every unacked reliable packet.

             However, under fairly modest load conditions in the conference grid, I saw a behaviour where sometimes a sequence of packets would expire for some connections in a very short time period (< 1 sec).  This would halve the throttle many times, in my observations right down to the absolute minimum.  This caused the behaviour from the user's point of view to degrade considerably for an extended period of time.  The throttle takes quite a long period to grow again.

             I didn't get much further with the diagnostics since a lack of time forced us to switch back to manual throttling instead (with 1 mbit per viewer and 400 mbit total on the keynotes).  This seemed to work okay in testing and in the event itself.  However, this leaves one vulnerable to the problem adaptive_throttles looks to tackle in the first place.

             I'm still reading up about this stuff, but it strikes me that halving the throttle on every missed packet is much harsher than the TCP approach, as with UDP a whole sequence can expire at once rather than a single segment that is subsequently retried before another segment can be missed.

             One idea is to ignore all expiries in a certain period (e.g. next 2 seconds) if an expired packet has already caused the throttle to be halved.  Of course, this is a bit more complicated to do but hopefully not too much so.  What do you think?  Any other ideas?

             [1] http://opensimulator.org/pipermail/opensim-dev/2011-October/023017.html
             [2] http://opensimulator.org/pipermail/opensim-dev/2011-October/023063.html

             Best Regards,

             --
             Justin Clark-Casey (justincc)
             OSVW Consulting
             http://justincc.org
             http://twitter.com/justincc
             ___________________________________________________
             Opensim-dev mailing list
             [email protected]
             http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev














--
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc
