Bug#423141: more questions about #423141
I'm CCing ratpoison's mailinglist, so people there can look at this, too. For those on the list: the history of this thread can be found at http://bugs.debian.org/423141 * H. S. Teoh [EMAIL PROTECTED] [070814 06:13]: Now that I know where the bug is, I've managed to reproduce it with a debug build of ratpoison. Using gdb backtrace, I've located the problematic code: the xvsprintf() function in main.c. Towards the end of this function, there is this bit of code: - nchars = vsnprintf (buffer, size, fmt, ap_copy); #if defined(va_copy) || defined(__va_copy) va_end (ap_copy); #endif if (nchars -1 nchars size) return buffer; else if (nchars -1) size = nchars + 1; else size *= 2; - The last else clause is blatantly wrong: if vsnprintf() returns a negative value (according to the manpage, this occurs if there is an output error), then this code will double the size and try to format it again. Since the problem is not caused by the buffer size, but by an output error (which I suppose is caused by the presence of characters in the output which for some reason is in the wrong encoding), doubling the buffer size solves nothing at all. So it just loops until size is larger than the amount of available RAM, and then the allocator calls abort() and dies. I'm guessing the reason for this is history (from vsprintf manpage): | NOTES | The glibc implementation of the functions snprintf() and vsnprintf() | conforms to the C99 standard, i.e., behaves as described above, since | glibc version 2.1. Until glibc 2.0.6 they would return -1 when the output | was truncated. I'd like to confirm my analysis with an actual code fix, but I'm not sure exactly what to do if vsnprintf() returns a negative value. The only clean way I know of is to use wide-character formatting functions, but it's non-trivial to do this with this code, and I'm not even sure if ratpoison is setting the locale properly in the first place. Maybe when I get more time I'll take another look at this. I guess there are three issues here: 1) I'd guess it is better to saveguard this and add some maximum number of characters to test for (at least with negative return values). This would help against other possible future problems causing -1 returns. 2) It might be some problem with giving invalid utf-8 characters into this function. While searching for the reason of this bug I found some problems when both ratpoison and the clients use a UTF-8 locale (though that causes only misdisplays for me). It's in cvs since July 19th. (I also though I uploaded a Debian package with a backport of that patch, but I seem to have forgotten it). 3) finally vsnprinf might indeed the wrong function for that. I've not yet looked into that. But looking at the manpage I'd guess that that should only generate displaying problems (as the number means bytes and not characters). I guess this needs someone knowing this issues to look into it. In the meantime, I hope this info may help you find a fix. Thanks a lot for tracking this down. Hochachtungsvoll, Bernhard R. Link -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#423141: more questions about #423141
package ratpoison tags 423141 = pending thanks * H. S. Teoh [EMAIL PROTECTED] [070814 06:13]: In the meantime, I hope this info may help you find a fix. Thanks again for following up, now I can reproduce it and the patch[1] to fix utf-8 window characters with ratpoison running in utf-8 locale does indeed removed the provocaton for this bug. (I'm still wondering why I was not able to reproduce it earlier. Perhaps when I tried the exact same config with the exact same version of the libraries and all possible locale settings, I must have somehow mixed up something up. (Perhaps some filename for the config was wrong when restarting in unicode locale, or I was just too fixated on the font and forget to repeat to set the format when testing in utf locale. Well, at least it is found...).) I'll upload a new Debian package with that fix and something against other causes to make sprintf fail soon. Hochachtungsvoll, Bernhard R. Link [1] http://lists.nongnu.org/archive/html/ratpoison-devel/2007-07/txtmZVayY63PB.txt -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#423141: more questions about #423141
Hi, sorry for taking so long to reply. Things just kept coming up. On Wed, Jul 11, 2007 at 11:06:16AM +0200, Bernhard R. Link wrote: [...] That may need some days before I can tell more. (Or have ideas for new testcases...) * H. S. Teoh [EMAIL PROTECTED] [070701 15:40]: Yep, it crashes. Although, the way the program is written is rather tricky to test, since it exits on keystrokes, so I have to hit C-t w before starting it. :-) Attached is a new version of the test program, that should be less sesnsitive to key events. Could you test if it also shows the behaviour of aborting when showing the window list after program started? (and not while the program is running). Ratpoison crashes both when I start the program then hit C-t w, and when I hit C-t w then run the program while the window list is still visible. There is a loop to 1000 near the end of the program. Does it also happen if that loop is removed? Yes, still happens. Does it also happen without the font or the winfmt set? (try starting ratpoison with -f /dev/null to make it omit your .ratpoisonrc). Both the test program and opera would be intresting. Actually, I think I found the real bug!! My .ratpoisonrc looks like: set winfmt %n%s%80t set wingravity center set font -misc-fixed-medium-r-normal-*-18-*-*-*-*-*-iso10646-1 After commenting out the winfmt line, ratpoison doesn't crash anymore with the test program, or with Opera. I see that the special symbols don't show up in the window list (e.g., on the Intel page, the registered trademark symbol is omitted from the window title). Looks like a pointer bug in format_string() (src/format.c) perhaps? What locale is your ratpoison running with? en_US.UTF-8 I'll try to run through the rest of your suggestions later, but I think we've found the culprit. I'll take a peek at format_string() to see if there's anything obviously wrong with it. T -- Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#423141: more questions about #423141
Hi, Now that I know where the bug is, I've managed to reproduce it with a debug build of ratpoison. Using gdb backtrace, I've located the problematic code: the xvsprintf() function in main.c. Towards the end of this function, there is this bit of code: - nchars = vsnprintf (buffer, size, fmt, ap_copy); #if defined(va_copy) || defined(__va_copy) va_end (ap_copy); #endif if (nchars -1 nchars size) return buffer; else if (nchars -1) size = nchars + 1; else size *= 2; - The last else clause is blatantly wrong: if vsnprintf() returns a negative value (according to the manpage, this occurs if there is an output error), then this code will double the size and try to format it again. Since the problem is not caused by the buffer size, but by an output error (which I suppose is caused by the presence of characters in the output which for some reason is in the wrong encoding), doubling the buffer size solves nothing at all. So it just loops until size is larger than the amount of available RAM, and then the allocator calls abort() and dies. I'd like to confirm my analysis with an actual code fix, but I'm not sure exactly what to do if vsnprintf() returns a negative value. The only clean way I know of is to use wide-character formatting functions, but it's non-trivial to do this with this code, and I'm not even sure if ratpoison is setting the locale properly in the first place. Maybe when I get more time I'll take another look at this. In the meantime, I hope this info may help you find a fix. T -- Why are you blatanly misspelling blatant? -- Branden Robinson -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#423141: more questions about #423141
* Bernhard R. Link [EMAIL PROTECTED] [070701 15:55]: Thanks for testing this. Now I 'only' have to find out why it crashes for you but not for me. (Though I have not yet tested with your fonts or with a real X server (opposed to Xephyr) or the same libs (except libx11)). After learning a lot about how Xlib caches its events, extending xtrace to tell me more about some obscure extensions xlib is using and finding (most likely totally) unrelated bugs in ratpoison while searching for this one, and trying to reproduce with all things the same version I'm still lost. So here some more tests and questions, though I have to admit they are mostly guessing into the dark... That may need some days before I can tell more. (Or have ideas for new testcases...) * H. S. Teoh [EMAIL PROTECTED] [070701 15:40]: Yep, it crashes. Although, the way the program is written is rather tricky to test, since it exits on keystrokes, so I have to hit C-t w before starting it. :-) Attached is a new version of the test program, that should be less sesnsitive to key events. Could you test if it also shows the behaviour of aborting when showing the window list after program started? (and not while the program is running). There is a loop to 1000 near the end of the program. Does it also happen if that loop is removed? Does it also happen without the font or the winfmt set? (try starting ratpoison with -f /dev/null to make it omit your .ratpoisonrc). Both the test program and opera would be intresting. What locale is your ratpoison running with? Does the bug still show up, if ratpoison is running in en_US (without .UTF-8) locale? (You might need to generate that first to make sure it can be used, if it is not in /etc/locale.gen yet). Is a locally build ratpoison still unable to reproduce the problem? (if it is now with more knowledge what triggers it, a backtrace with debug symbols would be intresting). Even without debug symbols for ratpoison, does the backtrace give any further hints where this happens? (well, there is a good chance that even if this place is located, it might just be something else wanting memory after the faulty part ate all of it, but at least there is a chance it is the buggy part). Installing libx11-dbg might give more hints if xlib calls are involved. Could you try running ratpoison in valgrind --tool=massif and send me the .txt files it generates? (Best one time for opera and one time for the xfakewindow program) Thanks in advance, Bernhard R. Link #include assert.h #include stdio.h #include stdbool.h #include stdlib.h #include unistd.h #include X11/X.h #include X11/Xatom.h #include X11/Xutil.h int main(int argc, char *argv[]) { Display *display; int screen; Visual *visual; Window w; XSetWindowAttributes attributes = { .background_pixel= 0, .event_mask = KeyPressMask, }; Status s; char *names[] = { Intel\302\256 Motherboards - Opera}; XTextProperty text,tt; XEvent event; struct { const char *name; XICCEncodingStyle style; char *list[2]; int nlist; Atom atom; } properties[] = { {WM_CLIENT_MACHINE, XStringStyle, {crystal}, 1}, {WM_LOCALE_NAME, XStringStyle, {en_US.UTF-8},1}, {WM_WINDOW_ROLE, XStringStyle, {opera-mainwindow#1},1}, {OPERA_VERSION, XStringStyle, {Opera 9.21},1}, {OPERA_USER, XStringStyle, {1000},1}, {OPERA_PREFERENCE, XStringStyle, {/home/hsteoh/.opera},1}, {_NET_WM_NAME, XUTF8StringStyle, {Intel\302\256 Motherboards - Opera}, 1} }; XClassHint classhint = {opera, Opera}; Atom _NET_WM_PID; long pid = getpid(); int i; display = XOpenDisplay(getenv(DISPLAY)); if( display == NULL ) return 1; screen = DefaultScreen(display); visual = DefaultVisual(display,screen); for( i = 0 ; i sizeof(properties)/sizeof(properties[0]) ; i++ ) { properties[i].atom = XInternAtom(display, properties[i].name, False); } _NET_WM_PID = XInternAtom(display, _NET_WM_PID, False); w = XCreateWindow(display, RootWindow(display,screen), 0, 0, 100, 100, 1, DefaultDepth(display,screen), InputOutput, visual, CWBackPixel|CWEventMask, attributes); s = Xutf8TextListToTextProperty(display, names, 1, XStringStyle, text); XSetClassHint(display, w, classhint); XSetWMName(display, w, text); for( i = 0 ; i sizeof(properties)/sizeof(properties[0]) ; i++ ) { properties[i].atom = XInternAtom(display, properties[i].name, False); s = Xutf8TextListToTextProperty(display, properties[i].list, properties[i].nlist, properties[i].style, tt); assert( s == 0 ); XSetTextProperty(display, w, tt, properties[i].atom); } XChangeProperty(display, w, _NET_WM_PID, XA_CARDINAL, 32, PropModeReplace, (unsigned char*)pid, 1); XMapWindow(display, w); XSync(display, False); for( i = 0 ; i 1000 ; i++ ) { XSetWMName(display, w, text); XSync(display, False); } while( 1 ) { XNextEvent(display,event); if( event.type == KeyPress event.xkey.keycode == 24) break; } XUnmapWindow(display, w); XDestroyWindow(display, w);