Re: ABI is broken??
-On [20001102 05:30], Garrett Wollman ([EMAIL PROTECTED]) wrote:

> On Wed, 1 Nov 2000 14:43:55 -0800, "David O'Brien" [EMAIL PROTECTED] said:
>> Any reason to not get [libc ABI changes] in -current now and make the bump?
> Mostly because they're too small to be worth the pain. I'm waiting for something more significant that I can piggy-back on.

Which of course has the implicit risk that if something big doesn't show up these fixes will be added only at the nearing of 5.0-RELEASE and thus with less shake-down time.

I also gather it has to do with the Austin project Garrett?

--
Jeroen Ruigrok van der Werven          Network- and systemadministrator
[EMAIL PROTECTED]                      VIA Net.Works The Netherlands
BSD: Technical excellence at its best  http://www.via-net-works.nl
In my mind nothing makes sense...

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: ABI is broken??
On Sun, 5 Nov 2000 13:51:09 +0100, Jeroen Ruigrok van der Werven [EMAIL PROTECTED] said:

> I also gather it has to do with the Austin project Garrett?

Yes and no. The errors have been there since the beginning of time, but I actually noticed them doing review of our implementation wrt the new standard. For example, the System V IPC implementation uses a data structure bogusly copied bitwise from SVR3, which only had 16-bit [ug]id_t's. The patch I sent out before BSDcon contains a number of things noted in this regard.

The other thing that we need to do is to hide the DB 1.85 library that's currently in libc, so that it doesn't prevent third-party applications from making full use of DB 3.x (when installed). This will involve renaming all of the public identifiers in the DB library, and converting the libc code to use the internal names.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]    | O Siem / The fires of freedom
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick
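To make the 16-bit [ug]id_t point concrete, here is a minimal sketch of why such a structure is both wrong and an ABI problem to fix. The struct and function names (demo_perm16, demo_perm32, demo_store_uid16, demo_store_uid32) are hypothetical and are not the real System V IPC definitions; the sketch only assumes that the old layout stores IDs in 16 bits while the system's IDs are 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SVR3-style layout: IDs squeezed into 16 bits. */
struct demo_perm16 {
    uint16_t uid;
    uint16_t gid;
};

/* Hypothetical corrected layout: IDs at their full 32-bit width. */
struct demo_perm32 {
    uint32_t uid;
    uint32_t gid;
};

/* Widening the fields changes the size and the field offsets, so old
 * binaries and a new library would disagree about the layout. */
static_assert(sizeof(struct demo_perm16) != sizeof(struct demo_perm32),
              "widening the ID fields changes the structure layout");

/* Store a 32-bit uid in the 16-bit structure and read it back. */
static uint32_t demo_store_uid16(uint32_t uid)
{
    struct demo_perm16 p = { 0, 0 };
    p.uid = (uint16_t)uid;   /* the high 16 bits are discarded here */
    return p.uid;
}

/* Same round trip through the widened structure. */
static uint32_t demo_store_uid32(uint32_t uid)
{
    struct demo_perm32 p = { 0, 0 };
    p.uid = uid;             /* stored without loss */
    return p.uid;
}
```

In this sketch a uid of 100000 comes back from the 16-bit structure as 34464 (100000 modulo 65536), while the widened structure preserves it. The changed size and offsets are the reason a fix of this kind would normally ride along with a library version bump.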
Re: ABI is broken??
John Polstra wrote:

> In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:
>> John Polstra wrote:
>>> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?
>> Huh, why can't we just bump the libc_r version number and put the older (buggy) version into lib/compat as usual? This would not require any ugly hacks at all.
> The bug wasn't in libc_r -- it was in libgcc_r. That's a static library, so it doesn't have a version number. And it is statically linked into old executables. Nothing we do to libgcc_r will help old executables, because they won't even use the new libgcc_r.

Nope, it should help, because the bug is triggered if someone tries to use old executables with the new libc_r.

-Maxim
Re: ABI is broken??
In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:

> John Polstra wrote:
>> The bug wasn't in libc_r -- it was in libgcc_r. That's a static library, so it doesn't have a version number. And it is statically linked into old executables. Nothing we do to libgcc_r will help old executables, because they won't even use the new libgcc_r.
> Nope, it should help, because the bug is triggered if someone tries to use old executables with the new libc_r.

Yes, I think you're right after all. But since I've already worked around the problem in libc_r, there's no need to do anything else at this point.

John
Re: ABI is broken??
In article [EMAIL PROTECTED], Max Khon [EMAIL PROTECTED] wrote:

> do we still need uthread_autoinit.cc?

It still might be needed by old executables. Anyway I don't see a good reason to get rid of it.

John
ABI is broken??
Hi,

I'm not sure what exactly caused this behaviour (I can guess two potential victims: O'Brien's changes in crt stuff and Polstra's recent changes in libgcc_r), but it seems that some programs built on the previous -current from 27 October immediately segfault when I'm trying to run them on a system installed from today's sources. The segfault disappeared when I recompiled the affected program. With this message I'm attaching a short backtrace.

-Maxim

root@notebook# galeon
GNU gdb 4.18
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
(gdb) r
Starting program: /usr/X11R6/bin/galeon-bin
(no debugging symbols found)...
[...]
(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
(gdb) bt
#0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
#1 0x806e782 in __register_frame_info ()
#2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
#3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
#4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
(gdb) q
Re: ABI is broken??
On Wed, Nov 01, 2000 at 19:17:26 +0200, Maxim Sobolev wrote:

> Hi,
> I'm not sure what exactly caused this behaviour (I can guess two potential victims: O'Brien's changes in crt stuff and Polstra's recent changes in libgcc_r), but it seems that some programs built on the previous -current from 27 October immediately segfault when I'm trying to run them on a system installed from today's sources. The segfault disappeared when I recompiled the affected program. With this message I'm attaching a short backtrace.
> -Maxim
> #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4

Same for me in -stable (4.2-BETA) and python-1.6. After rebuilding the port this disappeared. My gdb showed the same error message as the one quoted above.

Regards.

--
Udo Schweigert, Siemens AG  | Voice : +49 89 636 42170
ZT IK 3, Siemens CERT       | Fax: +49 89 636 41166
D-81730 Muenchen / Germany  | email : [EMAIL PROTECTED]
PGP-2/5 fingerprint         | D8 A5 DF 34 EC 87 E8 C6 E2 26 C4 D0 EE 80 36 B2
Re: ABI is broken??
On Wed, 1 Nov 2000, John Polstra wrote:

> In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:
>> I'm not sure what exactly caused this behaviour (I can guess two potential victims: O'Brien's changes in crt stuff and Polstra's recent changes in libgcc_r), but it seems that some programs built on the previous -current from 27 October immediately segfault when I'm trying to run them on a system installed from today's sources. The segfault disappeared when I recompiled the affected program. With this message I'm attaching a short backtrace.
>> [...]
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
>> (gdb) bt
>> #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
>> #1 0x806e782 in __register_frame_info ()
>> #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
>> #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
>> #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
> Here are all the random facts which, when put together, explain what is going on.
> Your old application was (like all -pthread programs) linked with "/usr/lib/libgcc_r.a". That library contains a function "__register_frame_info" which uses some of the facilities of the pthreads library "libc_r".
> The pthreads library has to be initialized before it can be used, by a call to _thread_init. If some functions such as pthread_mutex_lock are called before the library has been initialized, a segmentation violation results.
> _thread_init is called automatically from libc_r's _init function when the dynamic linker loads the library. Unfortunately, that isn't early enough. libgcc_r is the first thing to be initialized, and it calls pthread_mutex_lock before _thread_init has been called. Or rather I should say that OLD versions of libgcc_r did that -- because they were buggy.
> In other words, your old application was linked with a buggy version of libgcc_r, but it didn't become apparent until now. It didn't become apparent until now because our crtbegin.o and crtend.o were also buggy. They failed to call __register_frame_info. This was a problem for C++ programs using exceptions, especially when the gcc port was used and DWARF2 exception handling was selected.
> Now we have fixed crtbegin.o and crtend.o, and we have fixed libgcc_r.a. But it causes problems for your old application because the new crtbegin.o and crtend.o (linked into the new shared libraries such as libc_r) call __register_frame_info in your old, buggy, statically linked libgcc_r.a.
> Are you dizzy yet?

Yes ;-)

> To sum up, your old executable contains the bug but it wasn't triggered until the recent changes.
> Now, what can or should we do about this? Arguably we should simply say in the release notes, "Relink your old multithreaded applications. They had a bug which is now fixed." But if there are binary-only commercial apps which exhibit the problem, this solution is useless. I don't know whether there are any such apps, but I doubt it. N.B., Linux apps don't count because they were never linked with our libgcc_r in the first place.
> Or we can try to work around it, but there aren't any perfectly nice ways to do so. Here are some possibilities:
> - Put a hack in the threads library so that whenever pthread_mutex_lock is called it checks to make sure that the threads library has been initialized, and if not, it calls _thread_init. This is a poor solution because it adds overhead to a rather performance-critical function -- though admittedly the overhead is very small. Another potential problem is that there could be a race condition if several threads all called pthread_mutex_lock at once before the threads library had been initialized. I don't think the race condition would materialize, though, since the first call would come from libgcc_r, well before the application had gotten control.
> - Put a hack into the dynamic linker to call _thread_init very early if that symbol was defined. I like this solution even less, because it's too hackish. The dynamic linker isn't the place for special hooks like that.
> - Put a hack into crtbegin.o or crtend.o. But we are using the standard GNU versions of these, and I really really don't want to change that. In any case, it's the wrong place for the work-around.
> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?

If that's the lesser evil, then I guess it's OK with me.

--
Dan Eischen
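For readers following the proposal, the "check on entry" work-around that John Polstra leans toward can be sketched roughly as below. This is only an illustration under stated assumptions, not the code that went into libc_r: the names demo_initialized, demo_thread_init, demo_mutex_lock and struct demo_mutex are hypothetical stand-ins for the library's private state, _thread_init and pthread_mutex_lock, and the "mutex" is a plain flag with no blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the threads library's private state. */
static bool demo_initialized = false;
static int demo_init_calls = 0;

/* Toy mutex: just a flag, enough to show the control flow. */
struct demo_mutex {
    int locked;
};

/* Stand-in for _thread_init: sets up the state the lock code relies on. */
static void demo_thread_init(void)
{
    if (demo_initialized)
        return;                 /* calling it again is harmless */
    demo_init_calls++;
    demo_initialized = true;
}

/* Stand-in for pthread_mutex_lock with the proposed guard added. */
static int demo_mutex_lock(struct demo_mutex *m)
{
    /* The work-around: one cheap test on every call. An early caller
     * (such as code run before the library's own _init) triggers the
     * initialization instead of crashing on uninitialized state. */
    if (!demo_initialized)
        demo_thread_init();
    assert(demo_initialized);

    if (m->locked)
        return EBUSY;           /* sketch only: a real lock would block */
    m->locked = 1;
    return 0;
}

static int demo_mutex_unlock(struct demo_mutex *m)
{
    m->locked = 0;
    return 0;
}
```

The cost John mentions is the single flag test at the top of demo_mutex_lock. It is paid on every lock operation, even though it only matters for the very first call.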
Re: ABI is broken??
In article [EMAIL PROTECTED], Daniel Eischen [EMAIL PROTECTED] wrote:

>> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?
> If that's the lesser evil, then I guess it's OK with me.

Thanks for replying so quickly. I'll test this to make sure it really works, and then commit it.

John

--
John Polstra [EMAIL PROTECTED]
John D. Polstra Co., Inc. Seattle, Washington USA
"Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa
Re: ABI is broken??
John Polstra wrote:

> In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:
>> I'm not sure what exactly caused this behaviour (I can guess two potential victims: O'Brien's changes in crt stuff and Polstra's recent changes in libgcc_r), but it seems that some programs built on the previous -current from 27 October immediately segfault when I'm trying to run them on a system installed from today's sources. The segfault disappeared when I recompiled the affected program. With this message I'm attaching a short backtrace.
>> [...]
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
>> (gdb) bt
>> #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
>> #1 0x806e782 in __register_frame_info ()
>> #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
>> #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
>> #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
> Here are all the random facts which, when put together, explain what is going on.
> Your old application was (like all -pthread programs) linked with "/usr/lib/libgcc_r.a". That library contains a function "__register_frame_info" which uses some of the facilities of the pthreads library "libc_r".
> The pthreads library has to be initialized before it can be used, by a call to _thread_init. If some functions such as pthread_mutex_lock are called before the library has been initialized, a segmentation violation results.
> _thread_init is called automatically from libc_r's _init function when the dynamic linker loads the library. Unfortunately, that isn't early enough. libgcc_r is the first thing to be initialized, and it calls pthread_mutex_lock before _thread_init has been called. Or rather I should say that OLD versions of libgcc_r did that -- because they were buggy.
> In other words, your old application was linked with a buggy version of libgcc_r, but it didn't become apparent until now. It didn't become apparent until now because our crtbegin.o and crtend.o were also buggy. They failed to call __register_frame_info. This was a problem for C++ programs using exceptions, especially when the gcc port was used and DWARF2 exception handling was selected.
> Now we have fixed crtbegin.o and crtend.o, and we have fixed libgcc_r.a. But it causes problems for your old application because the new crtbegin.o and crtend.o (linked into the new shared libraries such as libc_r) call __register_frame_info in your old, buggy, statically linked libgcc_r.a.
> Are you dizzy yet?
> To sum up, your old executable contains the bug but it wasn't triggered until the recent changes.
> Now, what can or should we do about this? Arguably we should simply say in the release notes, "Relink your old multithreaded applications. They had a bug which is now fixed." But if there are binary-only commercial apps which exhibit the problem, this solution is useless. I don't know whether there are any such apps, but I doubt it. N.B., Linux apps don't count because they were never linked with our libgcc_r in the first place.
> Or we can try to work around it, but there aren't any perfectly nice ways to do so. Here are some possibilities:
> - Put a hack in the threads library so that whenever pthread_mutex_lock is called it checks to make sure that the threads library has been initialized, and if not, it calls _thread_init. This is a poor solution because it adds overhead to a rather performance-critical function -- though admittedly the overhead is very small. Another potential problem is that there could be a race condition if several threads all called pthread_mutex_lock at once before the threads library had been initialized. I don't think the race condition would materialize, though, since the first call would come from libgcc_r, well before the application had gotten control.
> - Put a hack into the dynamic linker to call _thread_init very early if that symbol was defined. I like this solution even less, because it's too hackish. The dynamic linker isn't the place for special hooks like that.
> - Put a hack into crtbegin.o or crtend.o. But we are using the standard GNU versions of these, and I really really don't want to change that. In any case, it's the wrong place for the work-around.
> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?

Huh, why can't we just bump the libc_r version number and put the older (buggy) version into lib/compat as usual? This would not require any ugly hacks at all.

-Maxim
Re: ABI is broken??
In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:

> John Polstra wrote:
>> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?
> Huh, why can't we just bump the libc_r version number and put the older (buggy) version into lib/compat as usual? This would not require any ugly hacks at all.

The bug wasn't in libc_r -- it was in libgcc_r. That's a static library, so it doesn't have a version number. And it is statically linked into old executables. Nothing we do to libgcc_r will help old executables, because they won't even use the new libgcc_r.

John

--
John Polstra [EMAIL PROTECTED]
John D. Polstra Co., Inc. Seattle, Washington USA
"Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa
Re: ABI is broken??
On Wed, 01 Nov 2000 21:09:12 +0200, Maxim Sobolev [EMAIL PROTECTED] said:

> Huh, why can't we just bump the libc_r version number and put the older (buggy) version into lib/compat as usual? This would not require any ugly hacks at all.

If you want to bump libc_r's version, we should do it to libc as well, and in that case there are a large number of ABI fixes that I have queued up which should be done at the same time.

-GAWollman
Re: ABI is broken??
On Wed, Nov 01, 2000 at 03:19:36PM -0500, Garrett Wollman wrote:

> If you want to bump libc_r's version, we should do it to libc as well, and in that case there are a large number of ABI fixes that I have queued up which should be done at the same time.

Any reason to not get them in -current now and make the bump?

--
-- David ([EMAIL PROTECTED])
GNU is Not Unix / Linux Is Not UniX
Re: ABI is broken??
On Wed, 1 Nov 2000 14:43:55 -0800, "David O'Brien" [EMAIL PROTECTED] said:

> Any reason to not get [libc ABI changes] in -current now and make the bump?

Mostly because they're too small to be worth the pain. I'm waiting for something more significant that I can piggy-back on.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]    | O Siem / The fires of freedom
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick
Re: ABI is broken??
hi, there!

On Wed, 1 Nov 2000, John Polstra wrote:

> Here are all the random facts which, when put together, explain what is going on.
> Your old application was (like all -pthread programs) linked with "/usr/lib/libgcc_r.a". That library contains a function "__register_frame_info" which uses some of the facilities of the pthreads library "libc_r".
> The pthreads library has to be initialized before it can be used, by a call to _thread_init. If some functions such as pthread_mutex_lock are called before the library has been initialized, a segmentation violation results.
> [...]
> Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?

do we still need uthread_autoinit.cc?

/fjoe
Re: ABI is broken??
In article [EMAIL PROTECTED], Maxim Sobolev [EMAIL PROTECTED] wrote:

> I'm not sure what exactly caused this behaviour (I can guess two potential victims: O'Brien's changes in crt stuff and Polstra's recent changes in libgcc_r), but it seems that some programs built on the previous -current from 27 October immediately segfault when I'm trying to run them on a system installed from today's sources. The segfault disappeared when I recompiled the affected program. With this message I'm attaching a short backtrace.
> [...]
> Program received signal SIGSEGV, Segmentation fault.
> 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> (gdb) bt
> #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> #1 0x806e782 in __register_frame_info ()
> #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
> #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
> #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1

Here are all the random facts which, when put together, explain what is going on.

Your old application was (like all -pthread programs) linked with "/usr/lib/libgcc_r.a". That library contains a function "__register_frame_info" which uses some of the facilities of the pthreads library "libc_r".

The pthreads library has to be initialized before it can be used, by a call to _thread_init. If some functions such as pthread_mutex_lock are called before the library has been initialized, a segmentation violation results.

_thread_init is called automatically from libc_r's _init function when the dynamic linker loads the library. Unfortunately, that isn't early enough. libgcc_r is the first thing to be initialized, and it calls pthread_mutex_lock before _thread_init has been called. Or rather I should say that OLD versions of libgcc_r did that -- because they were buggy.

In other words, your old application was linked with a buggy version of libgcc_r, but it didn't become apparent until now. It didn't become apparent until now because our crtbegin.o and crtend.o were also buggy. They failed to call __register_frame_info. This was a problem for C++ programs using exceptions, especially when the gcc port was used and DWARF2 exception handling was selected.

Now we have fixed crtbegin.o and crtend.o, and we have fixed libgcc_r.a. But it causes problems for your old application because the new crtbegin.o and crtend.o (linked into the new shared libraries such as libc_r) call __register_frame_info in your old, buggy, statically linked libgcc_r.a.

Are you dizzy yet?

To sum up, your old executable contains the bug but it wasn't triggered until the recent changes.

Now, what can or should we do about this? Arguably we should simply say in the release notes, "Relink your old multithreaded applications. They had a bug which is now fixed." But if there are binary-only commercial apps which exhibit the problem, this solution is useless. I don't know whether there are any such apps, but I doubt it. N.B., Linux apps don't count because they were never linked with our libgcc_r in the first place.

Or we can try to work around it, but there aren't any perfectly nice ways to do so. Here are some possibilities:

- Put a hack in the threads library so that whenever pthread_mutex_lock is called it checks to make sure that the threads library has been initialized, and if not, it calls _thread_init. This is a poor solution because it adds overhead to a rather performance-critical function -- though admittedly the overhead is very small. Another potential problem is that there could be a race condition if several threads all called pthread_mutex_lock at once before the threads library had been initialized. I don't think the race condition would materialize, though, since the first call would come from libgcc_r, well before the application had gotten control.

- Put a hack into the dynamic linker to call _thread_init very early if that symbol was defined. I like this solution even less, because it's too hackish. The dynamic linker isn't the place for special hooks like that.

- Put a hack into crtbegin.o or crtend.o. But we are using the standard GNU versions of these, and I really really don't want to change that. In any case, it's the wrong place for the work-around.

Overall I would lean toward putting the hack into pthread_mutex_lock. Comments?

John

--
John Polstra [EMAIL PROTECTED]
John D. Polstra Co., Inc. Seattle, Washington USA
"Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa
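John expects the race between several first callers not to materialize in practice. For comparison, here is a minimal sketch of what a race-tolerant version of the lazy initialization check could look like using C11 atomics. It is an assumption-laden illustration, not anything proposed in the thread: demo_state, demo_do_init and demo_ensure_init are hypothetical names, and the busy-wait for the losing callers is a simplification that a real threads library would likely replace with something gentler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical initialization states. */
enum { DEMO_UNINIT, DEMO_BUSY, DEMO_READY };

static atomic_int demo_state = DEMO_UNINIT;
static int demo_init_runs = 0;

/* Stand-in for the library's real setup work (the role of _thread_init). */
static void demo_do_init(void)
{
    demo_init_runs++;
}

/* Ensure the setup has run exactly once, even if several callers
 * arrive at the same time. */
static void demo_ensure_init(void)
{
    int expected = DEMO_UNINIT;

    /* Fast path: after initialization this is the only cost per call. */
    if (atomic_load_explicit(&demo_state, memory_order_acquire) == DEMO_READY)
        return;

    /* Exactly one caller wins the UNINIT -> BUSY transition and runs
     * the setup; the release store publishes its effects. */
    if (atomic_compare_exchange_strong(&demo_state, &expected, DEMO_BUSY)) {
        demo_do_init();
        atomic_store_explicit(&demo_state, DEMO_READY, memory_order_release);
        return;
    }

    /* Lost the race: 'expected' now holds the state we observed. */
    assert(expected == DEMO_BUSY || expected == DEMO_READY);

    /* Simplification: spin until the winner finishes. */
    while (atomic_load_explicit(&demo_state, memory_order_acquire) != DEMO_READY)
        ;
}
```

A lock entry point would call demo_ensure_init() first and then proceed as usual. Whether the extra machinery is worth it depends on whether concurrent first calls can really happen, which John argues they cannot here because the first call comes from libgcc_r before the application gets control.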