Re: [Tigervnc-devel] The deferred update timer
On Wed, 07 Dec 2011 04:39:40 -0600 DRC dcomman...@users.sourceforge.net wrote:

> This is an Amdahl's Law thing. The frame rate is capped to
> 1 / ({deferred update time} + {FBU processing time}). Whether or not
> the FBU processing time is all CPU or some CPU and some I/O is
> irrelevant. As long as the server can't receive new X updates during
> that time, the effect is the same.

That's my point. It is very relevant whether it is CPU or I/O, as we can deal with those in very different ways. If I/O is the problem, then increasing the buffer size should make the issue go away.

> > As to a solution, the only proper one is to reduce the time spent
> > encoding stuff. We can't really do it in the background as we can't
>
> No, because, per above, if we speed up processing, then the DUT delay
> will have more of a relative effect, not less. If it takes 10 ms on
> average to process every update, then the difference between a 1 ms
> and a 10 ms DUT is a 45% throughput loss. If it takes 100 ms on
> average to process every update, then the difference between a 1 ms
> and a 10 ms DUT is only an 8% throughput loss.

So? We don't want throughput for throughput's sake. What we want is sufficient throughput; anything above that is wasting resources. As to what is sufficient, that's certainly up for discussion.

> > That said, perhaps we can consider treating this as a corner case
> > and dial back the aggregation when we are hitting the CPU wall.
>
> Well, ultimately the purpose of
> aggregation/coalescence/whatever-you-want-to-call-it is increased
> performance, so I'm assuming there is data to show that 10 ms performs
> better than 1 ms under certain conditions? I have never seen it to
> have any effect other than a negative one, either on a WAN or a LAN.
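[Editor's note: the 45%/8% figures quoted above follow directly from the 1 / (DUT + processing time) cap. A quick check of the arithmetic, in plain Python (not TigerVNC code):]

```python
# Frame rate is capped at 1 / (deferred-update time + FBU processing time).
def fps(dut_ms, proc_ms):
    return 1000.0 / (dut_ms + proc_ms)

def loss(proc_ms):
    # Relative throughput lost when the DUT grows from 1 ms to 10 ms.
    return 1.0 - fps(10, proc_ms) / fps(1, proc_ms)

print(round(loss(10) * 100))   # fast encoder (10 ms/update): ~45% loss
print(round(loss(100) * 100))  # slow encoder (100 ms/update): ~8% loss
```

This is why speeding up the encoder makes the DUT *more* significant in relative terms: the fixed timer delay becomes a larger share of each frame's total cycle.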
> We really need to figure out a way to address these issues
> quantitatively, because I feel like I have a lot of data to show what
> does and doesn't work for 3D and video apps, but there is not the same
> data to show what does and doesn't work for Firefox or OpenOffice or
> whatever, nor a reproducible way to measure the effect of certain
> design decisions on such apps.

It's not just performance. The deferred updates have three purposes:

- Aggregate updates in the hope that we'll get a more efficient transfer when overlapping or adjacent areas get modified.
- Rate-limit the updates to avoid spending CPU and bandwidth on something that will have little to no perceived effect.
- Avoid partially updated applications by trying to get through the application's entire update routine before sending anything.

There has not been any thorough investigation into how well it achieves these goals, no. I messed around with it because it was getting in the way of my other work. I had two options at that point: remove it, or fix it according to what it was supposed to do. I did not reevaluate its basic premise.

As for the default setting, it's mostly pulled out of my ass. The old default of 40 ms seemed like an excessive frame rate limit, and the new default of 1 ms (which was more of a workaround than an actual setting, given how flaky the old code was) seemed insufficient for applications that render something complex.

> I will also say that LAN usage is not a corner case, and if we treat
> it as such, it will become a self-fulfilling prophecy, because no one
> will want to use TigerVNC on a LAN if it's slow.

It's not a corner case. But it is also not the most important use case. And LAN doesn't automatically mean infinite bandwidth; in many cases it just means low latency. We should balance the needs, make it easier to switch settings depending on the use case, and preferably work towards a system that can automatically reconfigure itself.
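[Editor's note: the aggregation behavior described in the thread amounts to a dirty-region coalescer: X updates that arrive while the timer is pending are merged, and a single framebuffer update is flushed when the timer fires. A minimal illustrative sketch follows; the class and method names are hypothetical, not the actual TigerVNC implementation:]

```python
class DeferredUpdater:
    """Coalesce dirty rectangles until the deferred-update timer fires."""

    def __init__(self, defer_ms=10):
        self.defer_ms = defer_ms
        self.pending = []      # dirty rects accumulated while the timer runs
        self.deadline = None   # time (ms) at which the current timer expires

    def mark_dirty(self, rect, now_ms):
        # The first change after an idle period starts the timer; later
        # changes are merely aggregated and do NOT reset the deadline.
        if self.deadline is None:
            self.deadline = now_ms + self.defer_ms
        self.pending.append(rect)

    def poll(self, now_ms):
        # When the timer fires, flush everything accumulated as one update.
        if self.deadline is not None and now_ms >= self.deadline:
            update, self.pending, self.deadline = self.pending, [], None
            return update
        return None
```

Note how this captures the trade-off under discussion: a larger `defer_ms` aggregates more rectangles per flush (cheaper transfer) but adds fixed latency to every frame.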
Rgds
--
Pierre Ossman           OpenSource-based Thin Client Technology
System Developer        Telephone: +46-13-21 46 00
Cendio AB               Web: http://www.cendio.com
Re: [Tigervnc-devel] The deferred update timer
On Fri, 02 Dec 2011 04:14:14 -0600 DRC dcomman...@users.sourceforge.net wrote:

> On 12/2/11 3:19 AM, Pierre Ossman wrote:
> > Annoying. I did some work at making the thing more asynchronous, but
> > more might be needed.
> >
> > If the problem is getting the data on the wire rather than the
> > actual encoding, then a quick fix is increasing the outgoing buffer
> > size. As long as your entire update fits in there, then X won't be
> > throttled (and the update timer should also be more precise).
>
> Not sure if I follow. It's not the send delay that's the problem --
> it's the time that the CPU takes to encode the framebuffer.

Just wanted to double-check that you've looked at actual CPU time, and not the wall time it spends in writeFramebufferUpdate(). The latter includes both CPU time and the time needed to drain the socket buffer (if it fills).

> Now, with the latest TigerVNC code, my understanding of it is that
> FBURs no longer result in an immediate FBU unless there is no deferred
> update currently in progress. Thus, almost all updates will be
> deferred updates now. That means that we're always going to incur the
> overhead of the deferred update timer on every frame.

Indeed. The purpose of the deferred updates is to aggregate X11 stuff, and so they should be fairly independent of what the VNC clients are up to. Note that FBURs still have some influence here, as the server will continue to aggregate stuff (and not reset the timer) if there is no client ready.

As to a solution, the only proper one is to reduce the time spent encoding stuff. We can't really do it in the background, as we can't allow the framebuffer to be modified whilst we're encoding. Double buffering is one way to go, but I wouldn't be surprised if the copying between the buffers would eat up any performance gain.

That said, perhaps we can consider treating this as a corner case and dial back the aggregation when we are hitting the CPU wall.
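[Editor's note: the "increase the outgoing buffer size" suggestion refers to the server's internal output buffer, but the same idea can be illustrated at the socket level, where it is a one-line setsockopt. A sketch in plain Python; the 4 MB figure is an arbitrary example, and the kernel may cap or adjust the requested value, so the setting is read back:]

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the kernel for a larger send buffer so a whole framebuffer update
# can be queued without the writer blocking. Linux may double or clamp
# the requested size, so read the effective value back.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)
actual = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(actual)

sock.close()
```

If the whole update fits in the buffer, the write returns immediately and the server can go back to processing X requests while the kernel drains the socket, which is exactly the "X won't be throttled" effect described above.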
Re: [Tigervnc-devel] The deferred update timer
On 12/7/11 2:44 AM, Pierre Ossman wrote:

> > > Annoying. I did some work at making the thing more asynchronous,
> > > but more might be needed.
> > >
> > > If the problem is getting the data on the wire rather than the
> > > actual encoding, then a quick fix is increasing the outgoing
> > > buffer size. As long as your entire update fits in there, then X
> > > won't be throttled (and the update timer should also be more
> > > precise).
> >
> > Not sure if I follow. It's not the send delay that's the problem --
> > it's the time that the CPU takes to encode the framebuffer.
>
> Just wanted to double check that you've looked at actual CPU time, and
> not the wall time it spends in writeFramebufferUpdate(). The latter
> includes both CPU time and the time needed to drain the socket buffer
> (if it fills).

This is an Amdahl's Law thing. The frame rate is capped to 1 / ({deferred update time} + {FBU processing time}). Whether or not the FBU processing time is all CPU or some CPU and some I/O is irrelevant. As long as the server can't receive new X updates during that time, the effect is the same.

> > Now, with the latest TigerVNC code, my understanding of it is that
> > FBURs no longer result in an immediate FBU unless there is no
> > deferred update currently in progress. Thus, almost all updates will
> > be deferred updates now. That means that we're always going to incur
> > the overhead of the deferred update timer on every frame.
>
> Indeed. The purpose of the deferred updates is to aggregate X11 stuff,
> and so they should be fairly independent of what the VNC clients are
> up to. Note that FBURs still have some influence here, as the server
> will continue to aggregate stuff (and not reset the timer) if there is
> no client ready.
>
> As to a solution, the only proper one is to reduce the time spent
> encoding stuff. We can't really do it in the background as we can't

No, because, per above, if we speed up processing, then the DUT delay will have more of a relative effect, not less. If it takes 10 ms on average to process every update, then the difference between a 1 ms and a 10 ms DUT is a 45% throughput loss. If it takes 100 ms on average to process every update, then the difference between a 1 ms and a 10 ms DUT is only an 8% throughput loss.

> allow the framebuffer to be modified whilst we're encoding. Double
> buffering is one way to go, but I wouldn't be surprised if the copying
> between the buffers would eat up any performance gain.

It wouldn't eat up any performance gain. memcpy()ing large blocks is typically very quick, and actually, the CUT is already double buffering. However, double buffering does eat up a lot of memory. I looked into double buffering with TurboVNC, in an attempt to figure out how to create a separate compress/send thread for each client and do flow control that way (the way VirtualGL does it). Ultimately, I figured out that it was possible, but it would require maintaining an intermediary buffer for each client, and you can imagine that with the 4-megapixel sessions that TurboVNC users commonly use, if they try to collaborate with 5 people, suddenly they have a 100 MB VNC process on their hands.

> That said, perhaps we can consider treating this as a corner case and
> dial back the aggregation when we are hitting the CPU wall.

Well, ultimately the purpose of aggregation/coalescence/whatever-you-want-to-call-it is increased performance, so I'm assuming there is data to show that 10 ms performs better than 1 ms under certain conditions? I have never seen it to have any effect other than a negative one, either on a WAN or a LAN.

We really need to figure out a way to address these issues quantitatively, because I feel like I have a lot of data to show what does and doesn't work for 3D and video apps, but there is not the same data to show what does and doesn't work for Firefox or OpenOffice or whatever, nor a reproducible way to measure the effect of certain design decisions on such apps.
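[Editor's note: the "100 MB VNC process" figure is consistent with a back-of-the-envelope estimate, assuming 32-bit pixels (4 bytes per pixel) and one intermediary framebuffer copy per connected client:]

```python
BYTES_PER_PIXEL = 4            # 32-bit TrueColor
megapixels = 4                 # a typical large TurboVNC session
clients = 5                    # collaborators sharing the session

fb_bytes = megapixels * 1_000_000 * BYTES_PER_PIXEL  # 16 MB main framebuffer
per_client = fb_bytes                                 # one shadow copy each
total_mb = (fb_bytes + clients * per_client) / 1e6
print(total_mb)   # 96.0 -> on the order of 100 MB
```

So the memory cost, not the memcpy() time, is the real price of per-client double buffering at these resolutions.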
I will also say that LAN usage is not a corner case, and if we treat it as such, it will become a self-fulfilling prophecy, because no one will want to use TigerVNC on a LAN if it's slow.
Re: [Tigervnc-devel] The deferred update timer
On Thu, 01 Dec 2011 18:01:32 -0600 DRC dcomman...@users.sourceforge.net wrote:

> When the deferred update timer behavior was recently overhauled such
> that it pushes out updates whenever the timer is triggered, rather
> than waiting for an update request from the client, the default DUT
> value was also changed to 10 ms (from 1 ms). Unfortunately, setting
> the DUT to 10 ms results in a dramatic decrease in peak performance on
> high-speed networks.
>
> The reason is that, when the timer is set, all X updates that arrive
> between that time and the time it is triggered are coalesced. As soon
> as the timer is triggered, a framebuffer update containing all of the
> coalesced X updates is sent immediately; the server is then tied up
> sending the update and cannot process any new X updates until the
> update is sent. Once the update is sent, the first new X update starts
> the deferred update timer again.
>
> Effectively, what this means is that the frame rate is capped to
> 1 / (deferred update time + encoding time), and since the encoding
> time is typically about 20 ms for a 1280x1024 screen, setting the DUT
> to 10 ms caps the frame rate at about 30 Hz for such a screen, whereas
> previously it was near 50 Hz when the DUT was 1 ms.

Annoying. I did some work at making the thing more asynchronous, but more might be needed.

If the problem is getting the data on the wire rather than the actual encoding, then a quick fix is increasing the outgoing buffer size. As long as your entire update fits in there, then X won't be throttled (and the update timer should also be more precise).
[Tigervnc-devel] The deferred update timer
When the deferred update timer behavior was recently overhauled such that it pushes out updates whenever the timer is triggered, rather than waiting for an update request from the client, the default DUT value was also changed to 10 ms (from 1 ms). Unfortunately, setting the DUT to 10 ms results in a dramatic decrease in peak performance on high-speed networks.

The reason is that, when the timer is set, all X updates that arrive between that time and the time it is triggered are coalesced. As soon as the timer is triggered, a framebuffer update containing all of the coalesced X updates is sent immediately; the server is then tied up sending the update and cannot process any new X updates until the update is sent. Once the update is sent, the first new X update starts the deferred update timer again.

Effectively, what this means is that the frame rate is capped to 1 / (deferred update time + encoding time), and since the encoding time is typically about 20 ms for a 1280x1024 screen, setting the DUT to 10 ms caps the frame rate at about 30 Hz for such a screen, whereas previously it was near 50 Hz when the DUT was 1 ms.

I'm not sure how best to address this, but it does represent a performance regression, since LAN performance has now decreased by 40% with default settings.

DRC
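[Editor's note: the ~30 Hz and ~50 Hz figures in the post match the stated cap when plugged in directly, taking the quoted ~20 ms encoding time for a 1280x1024 screen:]

```python
def capped_fps(dut_ms, encode_ms):
    # frame rate cap = 1 / (deferred update time + encoding time)
    return 1000.0 / (dut_ms + encode_ms)

print(round(capped_fps(10, 20), 1))  # ~33.3 Hz with the new 10 ms default
print(round(capped_fps(1, 20), 1))   # ~47.6 Hz with the old 1 ms default
```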