Re: [Tigervnc-devel] The deferred update timer

2011-12-09 Thread Pierre Ossman
On Wed, 07 Dec 2011 04:39:40 -0600
DRC dcomman...@users.sourceforge.net wrote:

 
 This is an Amdahl's Law thing.  The frame rate is capped to 1 /
 ({deferred update time} + {FBU processing time}).  Whether or not the
 FBU processing time is all CPU or some CPU and some I/O is irrelevant.
 As long as the server can't receive new X updates during that time, then
 the effect is the same.
 

That's my point. It is very relevant if it is CPU or I/O as we can deal
with those in very different ways. If I/O is the problem, then
increasing the buffer size should make the issue go away.
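
To make that concrete, a minimal sketch of the I/O-side fix (our
illustration, not actual TigerVNC code; the 4 MB figure is an arbitrary
assumption): if the kernel send buffer can hold an entire update, the
encoder never blocks waiting for the socket to drain.

  #include <sys/socket.h>

  /* Hypothetical sketch: enlarge the socket's send buffer so a whole
   * framebuffer update fits without the writer blocking on the wire. */
  void growSendBuffer(int fd)
  {
      int bytes = 4 * 1024 * 1024;  /* assumed size, room for one update */
      setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
  }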

  As to a solution, the only proper one is to reduce the time spent
  encoding stuff. We can't really do it in the background as we can't
 
 No, because, per above, if we speed up processing, then the DUT delay
 will have more of a relative effect, not less.  If it takes 10 ms on
 average to process every update, then the difference between a 1 ms and
 a 10 ms DUT is a 45% throughput loss.  If it takes 100 ms on average to
 process every update, then the difference between a 1 ms and a 10 ms DUT
 is only an 8% throughput loss.

So? We don't want throughput for throughput's sake. What we want is
sufficient throughput. Anything above that is wasting resources. As to
what is sufficient, that's certainly up for discussion.

  That said, perhaps we can consider treating this as a corner case and
  dial back the aggregation when we are hitting the CPU wall.
 
 Well, ultimately the purpose of
 aggregation/coalescence/whatever-you-want-to-call-it is increased
 performance, so I'm assuming there is data to show that 10 ms performs
 better than 1 ms under certain conditions?  I have never seen it to have
 any effect other than a negative one, either on a WAN or a LAN.  We
 really need to figure out a way to address these issues quantitatively,
 because I feel like I have a lot of data to show what does and doesn't
 work for 3D and video apps, but there is not the same data to show what
 does and doesn't work for Firefox or OpenOffice or whatever, nor a
 reproducible way to measure the effect of certain design decisions on
 such apps.

It's not just performance. The deferred updates have three purposes
(a rough sketch of the mechanism follows the list):

 - Aggregate updates in the hope that we'll get a more efficient
   transfer when overlapping or adjacent areas get modified.

 - Rate limit the updates to avoid spending CPU and bandwidth on
   something that will have little to no perceived effect.

 - Avoid partially updated applications by trying to get through the
   application's entire update routine before sending anything.
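
Roughly sketched in code (invented names throughout: DeferredUpdater,
onXDamage and armTimerMs are ours, not TigerVNC identifiers, and a real
implementation tracks a region list rather than one bounding box):

  #include <algorithm>
  #include <optional>

  struct Rect { int x1, y1, x2, y2; };

  class DeferredUpdater {
  public:
      explicit DeferredUpdater(int deferMs) : deferMs(deferMs) {}

      // Called for every X11 damage event: aggregate into one dirty
      // area; the timer is armed once and not reset by later damage.
      void onXDamage(const Rect& r) {
          if (dirty) {
              dirty->x1 = std::min(dirty->x1, r.x1);
              dirty->y1 = std::min(dirty->y1, r.y1);
              dirty->x2 = std::max(dirty->x2, r.x2);
              dirty->y2 = std::max(dirty->y2, r.y2);
          } else {
              dirty = r;
              armTimerMs(deferMs);
          }
      }

      // Timer expiry: one framebuffer update covers everything seen
      // since the timer was armed.
      void onTimerFired() {
          if (dirty)
              sendFramebufferUpdate(*dirty);
          dirty.reset();
      }

  private:
      void armTimerMs(int ms);                    // platform timer, elided
      void sendFramebufferUpdate(const Rect& r);  // encoder entry, elided
      std::optional<Rect> dirty;
      int deferMs;
  };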

There has not been any thorough investigation into how well it achieves
these goals, no. I messed around with it because it was getting in the
way of my other work. I had two options at that point, remove it or fix
it according to what it was supposed to do. I did not reevaluate its
basic premise.

As for the default setting, it's mostly pulled out of my ass. The old
default of 40 ms seemed like an excessive frame rate limit. And the new
default of 1 ms (which was more of a workaround than an actual setting
given how flaky the old code was) seemed insufficient for applications
that rendered something complex.

 I will also say that LAN usage is not a corner case, and if we treat it
 as such, it will become a self-fulfilling prophecy, because no one will
 want to use TigerVNC on a LAN if it's slow.

It's not a corner case. But it is also not the most important use case.
And LAN doesn't automatically mean infinite bandwidth. In many cases it
just means low latency.

We should balance the needs, make it easier to switch settings
depending on the use case and preferably work towards a system that can
automatically reconfigure itself.

Rgds
-- 
Pierre Ossman           OpenSource-based Thin Client Technology
System Developer        Telephone: +46-13-21 46 00
Cendio AB               Web: http://www.cendio.com

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Re: [Tigervnc-devel] The deferred update timer

2011-12-07 Thread Pierre Ossman
On Fri, 02 Dec 2011 04:14:14 -0600
DRC dcomman...@users.sourceforge.net wrote:

 On 12/2/11 3:19 AM, Pierre Ossman wrote:
  Annoying. I did some work at making the thing more asynchronous, but
  more might be needed. If the problem is getting the data on the wire
  rather than the actual encoding, then a quick fix is increasing the
  outgoing buffer size. As long as your entire update fits in there, then
  X won't be throttled (and the update timer should also be more precise).
 
 Not sure if I follow.  It's not the send delay that's the problem-- it's
 the time that the CPU takes to encode the framebuffer.

Just wanted to double check that you've looked at actual CPU time, and
not the wall time it spends in writeFramebufferUpdate(). The latter
includes both CPU time and the time needed to drain the socket buffer
(if it fills).
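
A quick way to separate the two (a sketch; the wrapper and the log line
are ours, only writeFramebufferUpdate() itself exists in the server):

  #include <stdio.h>
  #include <time.h>

  void writeFramebufferUpdate();  // stand-in for the real method

  static double secs(clockid_t id)
  {
      struct timespec ts;
      clock_gettime(id, &ts);
      return ts.tv_sec + ts.tv_nsec / 1e9;
  }

  void timedWriteFramebufferUpdate()
  {
      // CLOCK_PROCESS_CPUTIME_ID only advances while we burn CPU;
      // CLOCK_MONOTONIC also counts time spent blocked on the socket.
      double wall0 = secs(CLOCK_MONOTONIC);
      double cpu0  = secs(CLOCK_PROCESS_CPUTIME_ID);
      writeFramebufferUpdate();
      double wall = secs(CLOCK_MONOTONIC) - wall0;
      double cpu  = secs(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
      fprintf(stderr, "wall %.1f ms, cpu %.1f ms, blocked ~%.1f ms\n",
              wall * 1e3, cpu * 1e3, (wall - cpu) * 1e3);
  }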

 Now, with the latest TigerVNC code, my understanding of it is that
 FBUR's no longer result in an immediate FBU unless there is no deferred
 update currently in progress.  Thus, almost all updates will be deferred
 updates now.  That means that we're always going to incur the overhead
 of the deferred update timer on every frame.

Indeed. The purpose of the deferred updates is to aggregate X11 stuff,
and so should be fairly independent of what the VNC clients are up to.
Note that FBURs still have some influence here, as the server will
continue to aggregate stuff (and not reset the timer) if there is no
client ready.

As to a solution, the only proper one is to reduce the time spent
encoding stuff. We can't really do it in the background as we can't
allow the framebuffer to be modified whilst we're encoding. Double
buffering is one way to go, but I wouldn't be surprised if the copying
between the buffers would eat up any performance gain.
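
The double-buffering idea, sketched (our illustration; encodeAndSend is
a stand-in for the real encoder entry point):

  #include <cstdint>
  #include <cstring>
  #include <vector>

  void encodeAndSend(const uint8_t* pixels);  // stand-in, elided

  // Snapshot the live framebuffer with one big memcpy so X can keep
  // drawing while we encode; the copy itself is the cost in question.
  void encodeFromSnapshot(const uint8_t* fb, size_t bytes)
  {
      static std::vector<uint8_t> shadow;
      shadow.resize(bytes);
      std::memcpy(shadow.data(), fb, bytes);
      encodeAndSend(shadow.data());
  }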

That said, perhaps we can consider treating this as a corner case and
dial back the aggregation when we are hitting the CPU wall.
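
One conceivable shape of that dial-back, purely speculative on our part
(the function and the 2x threshold are invented):

  // If encoding already dominates the cycle, the defer window adds
  // little aggregation benefit but still caps the frame rate, so
  // collapse it toward the minimum.
  int adaptiveDeferMs(int configuredMs, double avgEncodeMs)
  {
      return (avgEncodeMs > 2.0 * configuredMs) ? 1 : configuredMs;
  }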

Rgds
-- 
Pierre Ossman           OpenSource-based Thin Client Technology
System Developer        Telephone: +46-13-21 46 00
Cendio AB               Web: http://www.cendio.com

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Re: [Tigervnc-devel] The deferred update timer

2011-12-07 Thread DRC
On 12/7/11 2:44 AM, Pierre Ossman wrote:
 Annoying. I did some work at making the thing more asynchronous, but
 more might be needed. If the problem is getting the data on the wire
 rather than the actual encoding, then a quick fix is increasing the
 outgoing buffer size. As long as your entire update fits in there, then
 X won't be throttled (and the update timer should also be more precise).

 Not sure if I follow.  It's not the send delay that's the problem-- it's
 the time that the CPU takes to encode the framebuffer.
 
 Just wanted to double check that you've looked at actual CPU time, and
 not the wall time it spends in writeFramebufferUpdate(). The latter
 includes both CPU time and the time needed to drain the socket buffer
 (if it fills).

This is an Amdahl's Law thing.  The frame rate is capped to 1 /
({deferred update time} + {FBU processing time}).  Whether or not the
FBU processing time is all CPU or some CPU and some I/O is irrelevant.
As long as the server can't receive new X updates during that time, then
the effect is the same.


 Now, with the latest TigerVNC code, my understanding of it is that
 FBURs no longer result in an immediate FBU unless there is no deferred
 update currently in progress.  Thus, almost all updates will be deferred
 updates now.  That means that we're always going to incur the overhead
 of the deferred update timer on every frame.
 
 Indeed. The purpose of the deferred updates is to aggregate X11 stuff,
 and so should be fairly independent of what the VNC clients are up to.
 Note that FBURs still have some influence here, as the server will
 continue to aggregate stuff (and not reset the timer) if there is no
 client ready.
 
 As to a solution, the only proper one is to reduce the time spent
 encoding stuff. We can't really do it in the background as we can't

No, because, per above, if we speed up processing, then the DUT delay
will have more of a relative effect, not less.  If it takes 10 ms on
average to process every update, then the difference between a 1 ms and
a 10 ms DUT is a 45% throughput loss.  If it takes 100 ms on average to
process every update, then the difference between a 1 ms and a 10 ms DUT
is only an 8% throughput loss.
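
For reference, the arithmetic behind those percentages:

  1/(10 ms + 1 ms)  = 90.9 Hz  vs  1/(10 ms + 10 ms)  = 50.0 Hz  ->  45% loss
  1/(100 ms + 1 ms) =  9.9 Hz  vs  1/(100 ms + 10 ms) =  9.1 Hz  ->   8% loss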


 allow the framebuffer to be modified whilst we're encoding. Double
 buffering is one way to go, but I wouldn't be surprised if the copying
 between the buffers would eat up any performance gain.

It wouldn't eat up any performance gain.  memcpy()ing large blocks is
typically very quick, and actually, the DUT is already double buffering.
However, double buffering does eat up a lot of memory.  I looked into
double buffering with TurboVNC, in an attempt to figure out how to
create a separate compress/send thread for each client and do flow
control that way (the way VirtualGL does it).  Ultimately, I figured out
that it was possible, but it would require maintaining an intermediary
buffer for each client, and you can imagine that with the 4-megapixel
sessions that TurboVNC users commonly use (16 MB per buffer at 4
bytes/pixel), if they try to collaborate with 5 people, suddenly they
have a 100 MB VNC process on their hands.


 That said, perhaps we can consider treating this as a corner case and
 dial back the aggregation when we are hitting the CPU wall.

Well, ultimately the purpose of
aggregation/coalescence/whatever-you-want-to-call-it is increased
performance, so I'm assuming there is data to show that 10 ms performs
better than 1 ms under certain conditions?  I have never seen it to have
any effect other than a negative one, either on a WAN or a LAN.  We
really need to figure out a way to address these issues quantitatively,
because I feel like I have a lot of data to show what does and doesn't
work for 3D and video apps, but there is not the same data to show what
does and doesn't work for Firefox or OpenOffice or whatever, nor a
reproducible way to measure the effect of certain design decisions on
such apps.

I will also say that LAN usage is not a corner case, and if we treat it
as such, it will become a self-fulfilling prophecy, because no one will
want to use TigerVNC on a LAN if it's slow.



Re: [Tigervnc-devel] The deferred update timer

2011-12-02 Thread Pierre Ossman
On Thu, 01 Dec 2011 18:01:32 -0600
DRC dcomman...@users.sourceforge.net wrote:

 When the deferred update timer behavior was recently overhauled such
 that it pushes out updates whenever the timer is triggered rather than
 waiting for an update request from the client, the default DUT value was
 also changed to 10 ms (from 1 ms).  Unfortunately, setting the DUT to 10
 ms results in a dramatic decrease in peak performance on high-speed
 networks.  The reason is that, when the timer is set, all X updates that
 arrive between that time and the time it is triggered are coalesced.
 As soon as the timer is triggered, a framebuffer update containing all
 of the coalesced X updates is sent immediately, then the server is tied
 up sending the update and cannot process any new X updates until the
 update is sent.  Once the update is sent, then the first new X update
 starts the deferred update timer again.  Effectively, what this means is
 that the frame rate is capped to 1 / (deferred update time + encoding
 time), and since the encoding time is typically about 20 ms for a
 1280x1024 screen, setting the DUT to 10 ms caps the frame rate at about
 30 Hz for such a screen, whereas previously it was near 50 Hz when the
 DUT was 1 ms.

Annoying. I did some work at making the thing more asynchronous, but
more might be needed. If the problem is getting the data on the wire
rather than the actual encoding, then a quick fix is increasing the
outgoing buffer size. As long as your entire update fits in there, then
X won't be throttled (and the update timer should also be more precise).

Rgds
-- 
Pierre Ossman           OpenSource-based Thin Client Technology
System Developer        Telephone: +46-13-21 46 00
Cendio AB               Web: http://www.cendio.com

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




[Tigervnc-devel] The deferred update timer

2011-12-01 Thread DRC
When the deferred update timer behavior was recently overhauled such
that it pushes out updates whenever the timer is triggered rather than
waiting for an update request from the client, the default DUT value was
also changed to 10 ms (from 1 ms).  Unfortunately, setting the DUT to 10
ms results in a dramatic decrease in peak performance on high-speed
networks.  The reason is that, when the timer is set, all X updates that
arrive between that time and the time it is triggered are coalesced.
As soon as the timer is triggered, a framebuffer update containing all
of the coalesced X updates is sent immediately, then the server is tied
up sending the update and cannot process any new X updates until the
update is sent.  Once the update is sent, then the first new X update
starts the deferred update timer again.  Effectively, what this means is
that the frame rate is capped to 1 / (deferred update time + encoding
time), and since the encoding time is typically about 20 ms for a
1280x1024 screen, setting the DUT to 10 ms caps the frame rate at about
30 Hz for such a screen, whereas previously it was near 50 Hz when the
DUT was 1 ms.
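
The cap in code form (a trivial sketch; the function name is ours):

  // Frame rate ceiling implied by the defer window plus encode time.
  double maxFrameRateHz(double deferMs, double encodeMs)
  {
      return 1000.0 / (deferMs + encodeMs);
  }
  // maxFrameRateHz(10, 20) = 33.3 (the ~30 Hz above)
  // maxFrameRateHz(1, 20)  = 47.6 (the ~50 Hz above)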

I'm not sure how best to address this, but it does represent a
performance regression, since LAN performance has now decreased by 40%
with default settings.

DRC
