Folks -
I've been implementing reverse-connection functionality in the Mac OS X server and viewer projects:
http://www.ozonehouse.com/mark/blog/code/Reverse_VNC.html
I've found a problem when using the OSXvnc server in this new reverse-connection mode with the VNC 4.0 viewers if "auto select" is enabled. After much hair pulling, I uncovered a race condition in the VNC protocol that occurs when the viewer changes the pixel format between two update request messages. See below for my full analysis.
Turns out this problem was discussed on the list long ago, in February of 1999:
http://www.realvnc.com/pipermail/vnc-list/1999-February/004950.html
Alas, the only solution from that discussion was "it doesn't happen that often" and some vague ideas for protocol enhancement that clearly didn't happen.
It also turns out to be the same problem as the often-reported, but never fully diagnosed, problem people have with ZRLE "failed assertion" errors when using "auto select" mode. See, for example:
http://www.realvnc.com/pipermail/vnc-list/2002-September/033671.html
http://www.realvnc.com/pipermail/vnc-list/2002-September/033685.html
http://www.realvnc.com/pipermail/vnc-list/2002-October/033930.html
http://www.realvnc.com/pipermail/vnc-list/2003-October/041260.html (probably)
http://www.realvnc.com/pipermail/vnc-list/2003-August/040507.html (probably)
What is the current wisdom for how to work around this problem in the protocol? Telling users to turn off auto select isn't a very attractive solution!
- Mark
Detailed Analysis:
------------------
There is a race condition in the VNC protocol that can leave some viewer/server pairs unable to work together:
In "The RFB Protocol" (version 3.8, http://www.realvnc.com/docs/rfbproto.pdf), the last paragraph of section 2 states:
"... an update is only sent from the server to the client
in response to an explicit request from the client"
Section 6.3.4 states:
"Note however that a single FramebufferUpdate may be sent in reply to several FramebufferUpdateRequests."
and then:
"Note that there may be an indefinite period between the FramebufferUpdateRequest and the FramebufferUpdate."
Finally, Section 6.4.1 states:
"[FramebufferUpdate] is sent in response to a FramebufferUpdateRequest from the client."
The VNC protocol is built on top of TCP, which provides no implicit synchronization between the sending and receiving streams. If a viewer sends two requests and then receives one update, the viewer cannot know whether the update covers both requests or only the first. [In the first case, the server sent the update after it received the second request. In the second case, the server sent the update after it received the first request but before it received the second, yet the viewer received it only after it had sent the second request. TCP exhibits special relativity!]
Given the nature of VNC, this seems like it shouldn't be an issue: On the face of it, neither the viewer nor the server care about the pairing of request and update messages. Indeed, examining the source to VNC 4.0 (from vnc-4.0-unixsrc), it is clear that the server and client are written this way: While the server generally only sends updates in response to requests, it also will send updates on its own from time to time. (See various callers to tryUpdate() and writeFramebufferUpdate()). Similarly, the viewer doesn't track which request an update is paired with, and simply processes any and all updates it gets.
This would work except that there is a piece of hidden state between viewer and server: The pixel format. The 4.0 viewer implements an "auto select" feature that changes the pixel format and encoding in response to network speeds. This means that the pixel format can change during a session between viewer and server. While the protocol has always allowed this, the 3.3 viewer never did it.
Consider now that the viewer first sends a request, and then some time later sends a pixel format change followed by another request. When the next update is received, the viewer has no way of knowing whether that update covers only the first request, and hence is in the old pixel format, or covers both requests, and therefore is in the new pixel format.
The current viewer assumes it is in the new pixel format. Alas, if the bytes per pixel changed, an update in the old format will cause decoding errors, which lead to the connection being shut down (at best -- or to buffer-overrun errors at worst, if the code isn't careful).
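Here is a minimal sketch (Raw encoding, made-up dimensions) of why a bytes-per-pixel mismatch desynchronizes the stream. A Raw rectangle's payload is exactly width * height * bytesPerPixel bytes, and that length is never carried on the wire -- the reader must compute it from the pixel format it *believes* is in effect:

```python
width, height = 8, 4

server_bpp = 2   # update was encoded in the OLD format (16 bits/pixel)
viewer_bpp = 4   # viewer assumes the NEW format (32 bits/pixel)

payload = width * height * server_bpp   # bytes actually on the wire: 64
expected = width * height * viewer_bpp  # bytes the viewer will read: 128

overread = expected - payload
print(overread)  # 64 bytes of the NEXT message get consumed as pixel data
```

Once the viewer has swallowed part of the next message as pixel data, every subsequent message is misparsed, which is exactly what the ZRLE assertion failures look like.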
Indeed, I can actually generate this very case reliably between the OSXvnc server and VNC 4.0 unix viewer. In this case the laxity of the server contributes: The server sends a non-solicited update just before it receives a pixel format change and new request. The update message and the format change and new request messages pass each other on the wire. When the viewer processes the update, it assumes it is in response to the request that followed the format change. Since this involves a bytes-per-pixel change, all the computed sizes in the message are off and decoding errors occur, and the connection shuts down. Of course, even if both sides followed the spec, there would still be no way for the viewer to know which format that update should be in.
The auto select feature kicks in very early in a session, so when this issue occurs, it does so almost as soon as the first whole-screen update finishes, making this combination of server and viewer unusable. Obviously timing plays a role here: this was only found while implementing the reverse-connect feature. Disabling that feature, or leaving it in and running under a debugger, changes the timing just enough to keep things working. But when it fails, it fails repeatedly.
Mark Lentczner
http://www.ozonehouse.com/mark/
[EMAIL PROTECTED]
