Bug#423141: more questions about #423141

2007-08-14 Thread Bernhard R. Link
I'm CCing ratpoison's mailinglist, so people there can look at this, too.
For those on the list: the history of this thread can be found at
http://bugs.debian.org/423141

* H. S. Teoh [EMAIL PROTECTED] [070814 06:13]:
 Now that I know where the bug is, I've managed to reproduce it with a
 debug build of ratpoison. Using gdb backtrace, I've located the
 problematic code: the xvsprintf() function in main.c. Towards the end of
 this function, there is this bit of code:
 
 -
   nchars = vsnprintf (buffer, size, fmt, ap_copy);
 #if defined(va_copy) || defined(__va_copy)
   va_end (ap_copy);
 #endif
 
   if (nchars  -1  nchars  size)
 return buffer;
   else if (nchars  -1)
 size = nchars + 1;
   else
 size *= 2;
 -
 
 The last else clause is blatantly wrong: if vsnprintf() returns a
 negative value (according to the manpage, this occurs if there is an
 output error), then this code will double the size and try to format it
 again.  Since the problem is not caused by the buffer size, but by an
 output error (which I suppose is caused by the presence of characters in
 the output which for some reason is in the wrong encoding), doubling the
 buffer size solves nothing at all. So it just loops until size is larger
 than the amount of available RAM, and then the allocator calls abort()
 and dies.

I'm guessing the reason for this is history (from vsprintf manpage):
| NOTES
|   The glibc implementation of the functions snprintf() and vsnprintf()
|   conforms to the C99 standard, i.e., behaves as described above, since
|   glibc version 2.1. Until glibc 2.0.6 they would return -1 when the output
|   was truncated.

 I'd like to confirm my analysis with an actual code fix, but I'm not
 sure exactly what to do if vsnprintf() returns a negative value. The
 only clean way I know of is to use wide-character formatting functions,
 but it's non-trivial to do this with this code, and I'm not even sure if
 ratpoison is setting the locale properly in the first place. Maybe when
 I get more time I'll take another look at this.

I guess there are three issues here:

1) I'd guess it is better to saveguard this and add some maximum number
   of characters to test for (at least with negative return values).
   This would help against other possible future problems causing -1
   returns.

2) It might be some problem with giving invalid utf-8 characters into
   this function. While searching for the reason of this bug I found
   some problems when both ratpoison and the clients use a UTF-8
   locale (though that causes only misdisplays for me).
   It's in cvs since July 19th. (I also though I uploaded a Debian
   package with a backport of that patch, but I seem to have forgotten
   it).

3) finally vsnprinf might indeed the wrong function for that. I've not
   yet looked into that. But looking at the manpage I'd guess that that
   should only generate displaying problems (as the number means bytes
   and not characters). I guess this needs someone knowing this issues
   to look into it.

 In the meantime, I hope this info may help you find a fix.

Thanks a lot for tracking this down.

Hochachtungsvoll,
Bernhard R. Link


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#423141: more questions about #423141

2007-08-14 Thread Bernhard R. Link
package ratpoison
tags 423141 = pending
thanks

* H. S. Teoh [EMAIL PROTECTED] [070814 06:13]:
 In the meantime, I hope this info may help you find a fix.

Thanks again for following up, now I can reproduce it and
the patch[1] to fix utf-8 window characters with ratpoison running
in utf-8 locale does indeed removed the provocaton for this bug.

(I'm still wondering why I was not able to reproduce it earlier.
Perhaps when I tried the exact same config with the exact same
version of the libraries and all possible locale settings, I must
have somehow mixed up something up. (Perhaps some filename for the
config was wrong when restarting in unicode locale, or I was just
too fixated on the font and forget to repeat to set the format when
testing in utf locale. Well, at least it is found...).)

I'll upload a new Debian package with that fix and something against
other causes to make sprintf fail soon.

Hochachtungsvoll,
Bernhard R. Link

[1] 
http://lists.nongnu.org/archive/html/ratpoison-devel/2007-07/txtmZVayY63PB.txt


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#423141: more questions about #423141

2007-08-13 Thread H. S. Teoh
Hi, sorry for taking so long to reply. Things just kept coming up.


On Wed, Jul 11, 2007 at 11:06:16AM +0200, Bernhard R. Link wrote:
[...]
  That may need some days before I can tell more. (Or have ideas for
  new testcases...)
  * H. S. Teoh [EMAIL PROTECTED] [070701 15:40]:
   Yep, it crashes. Although, the way the program is written is rather
   tricky to test, since it exits on keystrokes, so I have to hit C-t w
   before starting it. :-)
 
 Attached is a new version of the test program, that should be less
 sesnsitive to key events.
 
 Could you test if it also shows the behaviour of aborting when showing
 the window list after program started? (and not while the program is
 running).

Ratpoison crashes both when I start the program then hit C-t w, and when
I hit C-t w then run the program while the window list is still visible.


 There is a loop to 1000 near the end of the program. Does it also
 happen if that loop is removed?

Yes, still happens.


 Does it also happen without the font or the winfmt set?
 (try starting ratpoison with -f /dev/null to make it omit your
 .ratpoisonrc). Both the test program and opera would be intresting.

Actually, I think I found the real bug!! My .ratpoisonrc looks like:

set winfmt %n%s%80t
set wingravity center
set font -misc-fixed-medium-r-normal-*-18-*-*-*-*-*-iso10646-1

After commenting out the winfmt line, ratpoison doesn't crash anymore
with the test program, or with Opera. I see that the special symbols
don't show up in the window list (e.g., on the Intel page, the
registered trademark symbol is omitted from the window title). Looks
like a pointer bug in format_string() (src/format.c) perhaps?


 What locale is your ratpoison running with?

en_US.UTF-8


I'll try to run through the rest of your suggestions later, but I think
we've found the culprit. I'll take a peek at format_string() to see if
there's anything obviously wrong with it.


T

-- 
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#423141: more questions about #423141

2007-08-13 Thread H. S. Teoh
Hi,

Now that I know where the bug is, I've managed to reproduce it with a
debug build of ratpoison. Using gdb backtrace, I've located the
problematic code: the xvsprintf() function in main.c. Towards the end of
this function, there is this bit of code:

-
  nchars = vsnprintf (buffer, size, fmt, ap_copy);
#if defined(va_copy) || defined(__va_copy)
  va_end (ap_copy);
#endif

  if (nchars  -1  nchars  size)
return buffer;
  else if (nchars  -1)
size = nchars + 1;
  else
size *= 2;
-

The last else clause is blatantly wrong: if vsnprintf() returns a
negative value (according to the manpage, this occurs if there is an
output error), then this code will double the size and try to format it
again.  Since the problem is not caused by the buffer size, but by an
output error (which I suppose is caused by the presence of characters in
the output which for some reason is in the wrong encoding), doubling the
buffer size solves nothing at all. So it just loops until size is larger
than the amount of available RAM, and then the allocator calls abort()
and dies.

I'd like to confirm my analysis with an actual code fix, but I'm not
sure exactly what to do if vsnprintf() returns a negative value. The
only clean way I know of is to use wide-character formatting functions,
but it's non-trivial to do this with this code, and I'm not even sure if
ratpoison is setting the locale properly in the first place. Maybe when
I get more time I'll take another look at this.

In the meantime, I hope this info may help you find a fix.


T

-- 
Why are you blatanly misspelling blatant? -- Branden Robinson


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#423141: more questions about #423141

2007-07-11 Thread Bernhard R. Link
* Bernhard R. Link [EMAIL PROTECTED] [070701 15:55]:
 Thanks for testing this. Now I 'only' have to find out why it crashes for
 you but not for me. (Though I have not yet tested with your fonts or
 with a real X server (opposed to Xephyr) or the same libs (except
 libx11)).

After learning a lot about how Xlib caches its events, extending xtrace
to tell me more about some obscure extensions xlib is using and finding
(most likely totally) unrelated bugs in ratpoison while searching for
this one, and trying to reproduce with all things the same version I'm
still lost. So here some more tests and questions, though I have to
admit they are mostly guessing into the dark...

 That may need some days before I can tell more. (Or have ideas for new 
 testcases...)
 * H. S. Teoh [EMAIL PROTECTED] [070701 15:40]:
  Yep, it crashes. Although, the way the program is written is rather
  tricky to test, since it exits on keystrokes, so I have to hit C-t w
  before starting it. :-)

Attached is a new version of the test program, that should be less
sesnsitive to key events.

Could you test if it also shows the behaviour of aborting when showing
the window list after program started? (and not while the program is
running).

There is a loop to 1000 near the end of the program. Does it also happen
if that loop is removed?

Does it also happen without the font or the winfmt set?
(try starting ratpoison with -f /dev/null to make it omit your
.ratpoisonrc). Both the test program and opera would be intresting.

What locale is your ratpoison running with? Does the bug still show up,
if ratpoison is running in en_US (without .UTF-8) locale?
(You might need to generate that first to make sure it can be used, if
it is not in /etc/locale.gen yet).

Is a locally build ratpoison still unable to reproduce the problem?
(if it is now with more knowledge what triggers it, a backtrace with
 debug symbols would be intresting).

Even without debug symbols for ratpoison, does the backtrace give any
further hints where this happens? (well, there is a good chance that
even if this place is located, it might just be something else wanting
memory after the faulty part ate all of it, but at least there is a
chance it is the buggy part).
Installing libx11-dbg might give more hints if xlib calls are involved.

Could you try running ratpoison in valgrind --tool=massif and send me
the .txt files it generates? (Best one time for opera and one time for
the xfakewindow program)

Thanks in advance,
Bernhard R. Link
#include assert.h
#include stdio.h
#include stdbool.h
#include stdlib.h
#include unistd.h

#include X11/X.h
#include X11/Xatom.h
#include X11/Xutil.h

int main(int argc, char *argv[]) {
	Display *display;
	int screen;
	Visual *visual;
	Window w;
	XSetWindowAttributes attributes = {
		.background_pixel= 0,
		.event_mask = KeyPressMask,
	};
	Status s;
	char *names[] = { Intel\302\256 Motherboards - Opera};
	XTextProperty text,tt;
	XEvent event;
	struct {
		const char *name;
		XICCEncodingStyle style;
		char *list[2]; int nlist;
		Atom atom;
	} properties[] = {
		{WM_CLIENT_MACHINE, XStringStyle, {crystal}, 1},
		{WM_LOCALE_NAME, XStringStyle, {en_US.UTF-8},1},
		{WM_WINDOW_ROLE, XStringStyle, {opera-mainwindow#1},1},
		{OPERA_VERSION, XStringStyle, {Opera 9.21},1},
		{OPERA_USER, XStringStyle, {1000},1},
		{OPERA_PREFERENCE, XStringStyle, {/home/hsteoh/.opera},1},
		{_NET_WM_NAME, XUTF8StringStyle,
			{Intel\302\256 Motherboards - Opera},
		1}
	};
	XClassHint classhint = {opera, Opera};
	Atom _NET_WM_PID;
	long pid = getpid();
	int i;

	display = XOpenDisplay(getenv(DISPLAY));
	if( display == NULL )
		return 1;
	screen = DefaultScreen(display);
	visual = DefaultVisual(display,screen);

	for( i = 0 ; i  sizeof(properties)/sizeof(properties[0]) ; i++ ) {
		properties[i].atom = XInternAtom(display, properties[i].name, False);
	}
	_NET_WM_PID = XInternAtom(display, _NET_WM_PID, False);

	w = XCreateWindow(display, RootWindow(display,screen), 0, 0, 100, 100, 1,
			DefaultDepth(display,screen), InputOutput,
			visual, CWBackPixel|CWEventMask, attributes);
	s = Xutf8TextListToTextProperty(display, names, 1, XStringStyle, text);
	XSetClassHint(display, w, classhint);
	XSetWMName(display, w, text);
	for( i = 0 ; i  sizeof(properties)/sizeof(properties[0]) ; i++ ) {
		properties[i].atom = XInternAtom(display, properties[i].name, False);
		s = Xutf8TextListToTextProperty(display,
properties[i].list,
properties[i].nlist,
properties[i].style, tt);
		assert( s == 0 );
		XSetTextProperty(display, w, tt, properties[i].atom);
	}
	XChangeProperty(display, w, _NET_WM_PID, XA_CARDINAL, 32,
			PropModeReplace, (unsigned char*)pid, 1);
	XMapWindow(display, w);
	XSync(display, False);
	for( i = 0 ; i  1000 ; i++ ) {
		XSetWMName(display, w, text);
		XSync(display, False);
	}
	while( 1 ) {
		XNextEvent(display,event);
		if( event.type == KeyPress  event.xkey.keycode == 24)
			break;
	}
	XUnmapWindow(display, w);
	XDestroyWindow(display, w);