Bug Tracker item #3305357, was opened at 2011-05-20 18:00
Message generated for change (Comment added) made by dcommander
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126848&aid=3305357&group_id=254363

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: UN*X version
Group: 1.1.X
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Brian Hinz (bphinz)
Assigned to: Adam Tkac (atkac)
Summary: Enabling custom compression level on client crashes server

Initial Comment:
Enabling custom compression causes the server to crash with the following log message:

Fri May 20 18:39:07 2011
 VNCSConnST: Client pixel format depth 24 (32bpp) little-endian rgb888

Fri May 20 18:40:11 2011
 Connections: closed: 10.1.1.20::42053 (ZlibOutStream: deflate failed)
 SMsgWriter: framebuffer updates 148
 SMsgWriter: copyRect rects 43, bytes 688
 SMsgWriter: Tight rects 592, bytes 311570
 SMsgWriter: raw bytes equivalent 18822164, compression ratio 60.410707
Segmentation fault

Tried from both java and Windows exe. Tried DRC's latest nightly build as well as r4428 (1_1 branch) built on RHEL4.

----------------------------------------------------------------------

>Comment By: D. R. Commander (dcommander)
Date: 2011-08-09 15:44

Message:
This proved to be a *very* difficult problem to solve. The solution is somewhat inelegant, but it works and has been thoroughly tested at the low levels. There were several factors at play:

(1) The ZlibOutStream class doesn't work properly unless the underlying OutStream has enough space to hold the entire compressed buffer. That's because the underlying OutStream is not invoked via its writeBytes() method. Its pointers are manipulated directly. I am really leery of all of these wrapper classes, to be honest. They don't seem to be hurting performance, but it is really hard to follow what is going on and who's writing what where, etc. To fix it would require a lot of re-architecting, though.
I copped out and simply increased the size of the temporary MemOutStream created by compressData() to match the worst case size needed to encode the largest possible subrectangle (the formula was borrowed from the TightVNC encoder.)

(2) Zlib 1.2.4 and later behave differently from Zlib 1.2.3 and earlier. In both cases, the library will try to automatically call deflate() within the body of deflateParams() if the compression level has changed. Zlib 1.2.3 and earlier call deflate() with Z_PARTIAL_FLUSH, whereas Zlib 1.2.4 and later call deflate() with Z_BLOCK. The issue is that, after Zlib 1.2.4+ called deflate(..., Z_BLOCK), the subsequent call to deflate(..., Z_SYNC_FLUSH) in ZlibOutStream failed because there was nothing left in the buffer. So, I added checks to make sure that zs->avail_in was non-zero before calling deflate() in the body of the flush() and overrun() methods.

This put an end to the encoder errors, but now the decoder was barfing. Z_BLOCK doesn't fully flush the stream, so it was apparently leaving some stray bytes around, and I couldn't figure out how to do a Z_SYNC_FLUSH after Z_BLOCK without causing Zlib to throw an error. Thus, I added an explicit Z_SYNC_FLUSH prior to the deflateParams() call. So far so good, except that this modification made ZlibOutStream break with Zlib 1.2.3, which is why I ultimately had to check for the Zlib version and run the Z_SYNC_FLUSH only if the version is 1.2.4 or later.

<sigh> I'm sure there is a nicer way to go about this, but I really don't want to play anymore.

Will spin new builds for testing purposes.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-04 10:32

Message:
Please try the patch that I just uploaded. With it I have been able to go through all available compression levels using both the unix viewer from the 1_1 branch and the new java client with no crash. Xvnc was compiled and is running on RHEL4.
I don't have access to the FLTK viewer on our intranet and my src code is a few weeks old, but I don't think anything else has changed that's relevant to this bug.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-03 11:03

Message:
One thing I see that doesn't look right is that in common/rdr/tightEncode.h the variables "idxZlibLevel", "rawZlibLevel", and "monoZlibLevel" are used to pass the compression level to the ZlibOutStream, however I don't see where they are ever set or even initialized. They get declared in common/rdr/TightEncoder.h, but that seems to be it.

Another potential issue might be that ZlibOutStream::setCompressionLevel() is essentially asynchronous to the actual change of the compression level. The call to deflateParams() only occurs in checkCompressionLevel(), which itself is only called in ZlibOutStream::flush() and ZlibOutStream::overrun().

tightEncode::compressData() calls:
zos->setCompressionLevel
zos->writeBytes
zos->flush

Shouldn't the call to deflateParams (possibly preceded by a Z_FULL_FLUSH or Z_SYNC_FLUSH to set the stream state per the zlib docs) happen in ZlibOutStream::setCompressionLevel()?

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-03 08:18

Message:
One thing I see that doesn't look right is that in common/rdr/tightEncode.h the variables "idxZlibLevel", "rawZlibLevel", and "monoZlibLevel" are used to pass the compression level to the ZlibOutStream, however I don't see where they are ever set or even initialized. They get declared in common/rdr/TightEncoder.h, but that seems to be it.

Another potential issue might be that ZlibOutStream::setCompressionLevel() is essentially asynchronous to the actual change of the compression level. The call to deflateParams() only occurs in checkCompressionLevel(), which itself is only called in ZlibOutStream::flush() and ZlibOutStream::overrun().

tightEncode::compressData() calls:
zos->setCompressionLevel
zos->writeBytes
zos->flush

Shouldn't the call to deflateParams (possibly preceded by a Z_FULL_FLUSH or Z_SYNC_FLUSH to set the stream state per the zlib docs) happen in ZlibOutStream::setCompressionLevel()?

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-03 06:01

Message:
Can you test the unpatched version with the java client? I just found that when I use your 7/23 post-beta and my java client, I can't reproduce the crash. If the unpatched Xvnc works with both clients then I certainly have no objection to backing out the patch. In either case, I suspect that this is still unresolved though.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-08-03 00:49

Message:
I guess my main point is-- I think the original bug is somewhere other than in ZlibOutStream.cxx. The unpatched version of that class works fine in isolation.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-02 23:15

Message:
Looking at that patch, I think that the flush parameter in the "else" block of checkCompressionLevel should be Z_NO_FLUSH rather than Z_SYNC_FLUSH. It doesn't seem like that alone should cause the server to bail out, but it is probably degrading performance.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-02 23:04

Message:
Yes, I get essentially the same behavior. I'll keep poking around to see if I can make any headway with this.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-08-02 21:43

Message:
Let's focus on the 1.1 branch right now to avoid confusion. Do you still observe the crash using the 7/23 1.1 post-beta?
When I use that build, I definitely do observe a crash when setting compress level=1-4, and the error message in the server's log is identical to the one that the encoder gives me when running at the low level. Nothing has changed in the 1.1 branch between 6/14 and 7/23 that would account for this. I also observe the crash in 6/14, but oddly, it is harder to reproduce in that build. 7/23 fails almost instantly, whereas I had to play with the 6/14 build for a while to make it fail.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-02 21:16

Message:
OK, I was using one of your older (June 14) pre-release builds. With the latest pre-alpha it crashes even just going to 1.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-08-02 20:59

Message:
How are you building TigerVNC? It is definitely reproducible in my builds.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-08-02 20:58

Message:
It's a hidden option. 0 pipes the data through the Zlib compressor, which doesn't actually compress anything. However, 1 or any other number <= 4 also produces the error.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-02 20:57

Message:
Correction, "-1" is the default, 0 is "no compression". So is there any reason to enable 0? I can't reproduce it by going between 1 and 4, it seems like it's specific to 0.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-08-02 20:51

Message:
> The easiest way to repro is to set the custom level to 0, then back up to
> 1. You might also try disabling JPEG compression before doing that, as it
> seems to make it happen more readily.

Should "0" be an option?
I know it's actually the default, but the viewer dialog says "1= fast, 9=best". Perhaps like you say it's at a higher level and the server doesn't expect to receive anything outside the range 1-9? (I don't remember seeing anything like that).

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-08-02 20:24

Message:
More information on this. In the process of mocking up the TigerVNC encoder at the lowest levels using the compare-encodings benchmark (which is used to model low-level encoder performance using captured VNC sessions), I observed that I would get an error in deflateParams() whenever setting the compression level to 4 or lower. Backing out the patch we made to attempt to fix this bug seems to make everything work fine at the low level. In short, the original ZlibOutStream implementation seems to be correct. Perhaps the bug is at a higher level of the program.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-07-28 12:40

Message:
Something else I noticed, at least in trunk, is that there still seems to be a dependency on libz.so.1 even though USE_INCLUDED_ZLIB=1. I'm investigating that. It may be that this is a conflict between the static and shared lib versions.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-07-28 10:08

Message:
Are the in-tree zlib source files 1:1 copies of the upstream source? I see a note in r4026 that says "Remove unneeded parts of embedded zlib.", however this was prior to r4168 which upgraded the zlib version to 1.2.5 (but also says "Unneeded parts are removed"). Are you sure that there isn't a dependency being dropped? Maybe grasping at straws here...

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-07-23 15:35

Message:
Re-opening.
Unfortunately, I am still able to make it crash in the latest 1.1 pre-release build:

http://www.virtualgl.org/DeveloperInfo/TigerVNCPreReleases

It also crashes quite readily in the FLTK viewer, even though the same patch was applied to trunk.

The easiest way to repro is to set the custom level to 0, then back up to 1. You might also try disabling JPEG compression before doing that, as it seems to make it happen more readily.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-06-17 08:21

Message:
Seems good. No problems at all on RHEL4 for several days now, limited testing with RHEL5, but so far so good. I say go ahead and close it.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-06-14 22:25

Message:
Try the latest pre-release build at:

http://www.virtualgl.org/DeveloperInfo/TigerVNCPreReleases

Seems to be fixed as far as I can tell. If it works for you, I'll go ahead and close the issue.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-06-14 14:20

Message:
Can you try applying the patch that I uploaded (rev2) and see if it fixes the issue? I've been chugging along on RHEL4 (x86_64) for about 4 hours now, periodically changing the compression level, and have not been able to reproduce the error. I was not previously linking against the static libraries, but this time I added "GNUTLS_FLAGS='/usr/lib64/libgnutls.a /usr/lib64/libgcrypt.a /usr/lib64/libgpg-error.a /usr/lib64/libgnutls-extra.a' --with-included-zlib" to 'build-xorg build' (the --with-included-zlib should be redundant because of '-static', but I left it there for good measure). I won't be able to test this on RHEL5 until later tonight, but it seems to me that the error is more reproducible on RHEL5 than RHEL4(?).
FYI, the patch alone did not cure the issue for me, so if it does work, it seems to be due to some combination of the patch and the requirement to link against the static libraries...

Thanks,
-brian

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-06-14 13:16

Message:
I don't think it will. I link against static everything, and I still get the error.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-06-14 09:06

Message:
I'm still struggling to figure this out, but I wonder if it's related to which version of zlib we're linking against in the legacy build. I'm using the statically linked binaries produced by the build-xorg script, which links Xvnc against the system version of gnutls, and the in-tree version of zlib. However, the system version of gnutls already depends on the system version of zlib. The in-tree version of zlib appears to be 1.2.5, while the system version of zlib is 1.2.1 and 1.2.3 on RHEL4 and RHEL5 respectively. I'm going to try rebuilding and linking everything against the static versions of gnutls, libgcrypt, and libgpg-error along with the in-tree zlib and see if that helps.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-06-02 18:53

Message:
Don't commit it yet, there's still something wrong... Setting the compression level to 1 still crashes the server.

----------------------------------------------------------------------

Comment By: D. R. Commander (dcommander)
Date: 2011-06-02 13:00

Message:
Seems OK to me. I'd like to hear from Adam before committing it.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-05-28 09:15

Message:
Sorry, SYNC_FLUSH does seem to work. FULL_FLUSH causes a segfault when the client chooses compression level 1. Attaching new patch.

----------------------------------------------------------------------

Comment By: Brian Hinz (bphinz)
Date: 2011-05-22 14:47

Message:
Can someone review the attached patch? It seems to resolve the issue, but to be honest I don't know much about compression. The libz spec says the following:

<snip>
Applications should ensure that the stream is flushed, e.g. by a call to deflate(stream, Z_SYNC_FLUSH) before calling deflateParams(), or ensure that there is sufficient space in next_out (as identified by avail_out) to ensure that all pending output and all uncompressed input can be flushed in a single call to deflate().

Rationale: Although the deflateParams() function should flush pending output and compress all pending input, the result is unspecified if there is insufficient space in the output buffer. Applications should only call deflateParams() when the stream is effectively empty (flushed).
</snip>

So it seems like the Z_FULL_FLUSH is not necessary, however a Z_SYNC_FLUSH didn't work.

----------------------------------------------------------------------

_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel