Bug#607368: Please decide how kernel ABI should be managed
So, since something drew my attention to this bug again...

We made a decision by default to not override the kernel team for squeeze already. Reviewing the thread, it seems to me that the kernel team has good reasons for their decisions, has a reasonable grasp of the issues, and is evaluating possible alternative solutions going forward. I don't see a need for the technical committee to override their decisions here.

Would anyone like to put forward any alternative proposed actions besides declining to override the kernel team? Should we have a vote?

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#607368: Please decide how kernel ABI should be managed
On Sun, Dec 19, 2010 at 19:30:58 +0100, Julien BLACHE wrote:

> I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.

We're getting close to the squeeze release. Is the technical committee going to reach a decision on this?

Cheers,
Julien
Bug#607368: Please decide how kernel ABI should be managed
Don Armstrong d...@debian.org wrote:

Hi,

> Ok. My main concern here is what exactly would happen if we were to ignore the ABI change for this particular issue, and then put in place some kind of a process where the kernel team could be informed of downstream users of the ABI.

The harm is done now; reverting or bumping the ABI at this point only makes things worse.

>> Full deployment involves over a thousand workstations.
>
> But presumably they're not running a testing version affected by this.

At this time I have no assurance that this issue, or a similar issue with another symbol, won't happen again during the Squeeze lifetime, so they are potentially affected until proven otherwise as far as I'm concerned.

To the thousand machines given above, you can add several hundred machines that are part of several HPC clusters; the nodes use external InfiniBand drivers from ofa-kernel 1.5.2 in the pkg-ofed repository. Having the cluster fail to come online after a kernel upgrade would be interesting.

We also have servers using the Brocade FC HBA/CNA drivers from Brocade, due to the 2.6.32 drivers being way out of date (2.6.32-2.6.37 is ca. 100 commits and needs new firmware files with new names, if anyone is interested).

>> [...] package is upgraded, we'd still have issues with on-disk modules not matching the running kernel ABI until the machine is rebooted. This can sometimes take two or three weeks if a long-running computation is running on the machine.
>
> Presumably this wouldn't be much of an issue, unless users are going to be newly loading these modules. [Which I would hope wouldn't be the case if you were running a long-running computation.]

Modules get loaded automatically pretty much all the time on a workstation: filesystem modules for a USB key or when upgrading grub, drivers for USB devices, you name it.

>> And I'll ask again: what's the point of the kernel ABI number if we have to use strict dependencies?
>
> Some modules may need strict dependencies if they are using symbols not covered by the ABI; this is one possible way that we can resolve this issue.

The issue I have with that, other than the fact that it is just plain wrong, is that all the module packaging tools were built on the premise that changes to the kernel ABI are reflected by the ABI number. None of the tools work if that premise doesn't hold true.

JB.

--
Julien BLACHE jbla...@debian.org | Debian, because code matters more
Debian GNU/Linux Developer | http://www.debian.org
Public key available on http://www.jblache.org - KeyID: F5D6 5169
GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
Don Armstrong d...@debian.org wrote:

Hi,

> Ok. For some reason, I hadn't originally noticed that this was concerning an OOT module which Debian itself didn't actually distribute. [Julien: I'm correct in that, right?] But that's probably fine.

You are correct.

> Julien: Are you currently shipping a kernel in production which would be affected by this change if we don't change the ABI number? Or does this only affect cases where you are testing squeeze? Could it be

I have 30 beta-testers that are affected by this issue on the workstations they have started using for their everyday work. Although it's still a beta phase, at this point, these workstations are to be considered in production given the users have basically made the switch now.

Full deployment involves over a thousand workstations.

> worked around by using DKMS or similar with prebuilt binaries and requiring exact kernel version dependencies?

DKMS is useless if the ABI number doesn't change, in its current form. If DKMS was changed to rebuild all modules when the kernel package is upgraded, we'd still have issues with on-disk modules not matching the running kernel ABI until the machine is rebooted. This can sometimes take two or three weeks if a long-running computation is running on the machine.

We switched to DKMS to reduce the maintenance cost associated with prebuilt binaries. We'd rather not come back to that if we can help it. It also adds a delay to kernel updates that we'd rather avoid.

As to using strict dependencies... it makes all of the above even worse.

And I'll ask again: what's the point of the kernel ABI number if we have to use strict dependencies? Seriously?

We need a kernel ABI numbering we can rely on.

JB.

--
Julien BLACHE jbla...@debian.org | Debian, because code matters more
Debian GNU/Linux Developer | http://www.debian.org
Public key available on http://www.jblache.org - KeyID: F5D6 5169
GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
On Tue, Jan 04, 2011 at 12:28:22PM +0100, Julien BLACHE wrote:
> Don Armstrong d...@debian.org wrote:
[...]
>> worked around by using DKMS or similar with prebuilt binaries and requiring exact kernel version dependencies?
>
> DKMS is useless if the ABI number doesn't change, in its current form. If DKMS was changed to rebuild all modules when the kernel package is upgraded, we'd still have issues with on-disk modules not matching the running kernel ABI until the machine is rebooted. This can sometimes take two or three weeks if a long-running computation is running on the machine.
>
> We switched to DKMS to reduce the maintenance cost associated with prebuilt binaries. We'd rather not come back to that if we can help it. It also adds a delay to kernel updates that we'd rather avoid.
>
> As to using strict dependencies... it makes all of the above even worse.
>
> And I'll ask again: what's the point of the kernel ABI number if we have to use strict dependencies? Seriously?
[...]

Do pay attention. We were discussing the implications of changing our current practice of trying to avoid ABI bumps during freeze and stable updates. We would then probably change the uname release (the ABI identifier) in each version of the package.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking. - Albert Camus
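As an illustration of the "uname release (the ABI identifier)" mentioned above, the following sketch splits a release string into its parts. It assumes the squeeze-era layout `<upstream version>-<ABI number>-<flavour>` (for example `2.6.32-5-amd64`) and that image packages are named `linux-image-<release>`. The function and class names are hypothetical and are not part of any Debian tool.

```python
import re
from typing import NamedTuple


class KernelRelease(NamedTuple):
    upstream: str  # upstream kernel version, e.g. "2.6.32"
    abi: str       # ABI number (or a placeholder such as "trunk")
    flavour: str   # build flavour, e.g. "amd64" or "686-bigmem"


# Assumed layout: <upstream>-<abi>-<flavour>; the flavour may itself contain "-".
_RELEASE_RE = re.compile(
    r"^(?P<upstream>\d+(?:\.\d+)*)-(?P<abi>[^-]+)-(?P<flavour>.+)$"
)


def parse_release(release: str) -> KernelRelease:
    """Split a release string such as '2.6.32-5-amd64' into its parts."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return KernelRelease(**match.groupdict())


def image_package(release: str) -> str:
    """Name of the binary package that would ship this kernel image."""
    return f"linux-image-{release}"


def same_abi(a: str, b: str) -> bool:
    """True if two release strings claim the same ABI (all parts identical)."""
    return parse_release(a) == parse_release(b)
```

With the current practice, successive uploads keep the same release string and therefore the same package name, so modules built for one upload are expected to load on the next. If the release string changed with every package version, each upload would yield a new package name and a rebuild of out-of-tree modules.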
Bug#607368: Please decide how kernel ABI should be managed
On Tue, 04 Jan 2011, Julien BLACHE wrote:
> Don Armstrong d...@debian.org wrote:
>> Julien: Are you currently shipping a kernel in production which would be affected by this change if we don't change the ABI number? Or does this only affect cases where you are testing squeeze? Could it be
>
> I have 30 beta-testers that are affected by this issue on the workstations they have started using for their everyday work. Although it's still a beta phase, at this point, these workstations are to be considered in production given the users have basically made the switch now.

Ok. My main concern here is what exactly would happen if we were to ignore the ABI change for this particular issue, and then put in place some kind of a process where the kernel team could be informed of downstream users of the ABI.

From my current understanding, the ABI number is only meant to cover some of the symbols which can be used externally, not all of them. [Specifically, those that the kernel team are aware of being used externally.]

> Full deployment involves over a thousand workstations.

But presumably they're not running a testing version affected by this.

>> worked around by using DKMS or similar with prebuilt binaries and requiring exact kernel version dependencies?
>
> DKMS is useless if the ABI number doesn't change, in its current form. If DKMS was changed to rebuild all modules when the kernel package is upgraded, we'd still have issues with on-disk modules not matching the running kernel ABI until the machine is rebooted. This can sometimes take two or three weeks if a long-running computation is running on the machine.

Presumably this wouldn't be much of an issue, unless users are going to be newly loading these modules. [Which I would hope wouldn't be the case if you were running a long-running computation.]

> As to using strict dependencies... it makes all of the above even worse.

Certainly; there's a cost to be borne on both sides. The most important thing to avoid from my perspective is a kernel which when booted has modules that cannot be loaded.

> And I'll ask again: what's the point of the kernel ABI number if we have to use strict dependencies?

Some modules may need strict dependencies if they are using symbols not covered by the ABI; this is one possible way that we can resolve this issue.

> Seriously?

Let's restrict ourselves to discussing the technical issues and possible solutions instead of rhetorical flourishes.

Don Armstrong

--
The computer allows you to make mistakes faster than any other invention, with the possible exception of handguns and tequila -- Mitch Ratcliffe
http://www.donarmstrong.com http://rzlab.ucr.edu
Bug#607368: Please decide how kernel ABI should be managed
Ben Hutchings b...@decadent.org.uk writes:

> Do pay attention. We were discussing the implications of changing our current practice of trying to avoid ABI bumps during freeze and stable updates. We would then probably change the uname release (the ABI identifier) in each version of the package.

This is certainly becoming more appealing with DKMS, but with my Stanford sysadmin hat on, I have to admit that we'd find it rather annoying if the ABI changed in stable. I think that may be a good way to go in unstable and testing up to the release, but it would be very nice to not do that after the release.

With hundreds of servers, we'd rather not install compilers and DKMS on every one of them, and with lots of machines, the loss of reproducibility from separately compiling the modules on every system is an increasingly large drawback.

We currently build internal packages (from the *-source packages provided by Debian) for those external modules that we use so that we can deploy the same thing everywhere, and having to rebuild modules for every kernel update and deploy those new builds with the kernel update would be fairly annoying. With that system, we know for sure that if the module mysteriously fails on one system but not on others, it's not because it's a weird build or has some other compilation issue.

In fact, we know almost exactly how annoying it would be, since Red Hat has this policy, and it's been a major pain. The handling of the kernel versioning in stable is currently one of the major selling points for Debian over Red Hat for us. The very few times an ABI change was forced in Debian stable due to some security issue, we had to put a fair bit of work into making sure that everything was upgraded properly everywhere to the new ABI. (So thank you very much for all the work that you put into maintaining the ABI!)

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#607368: Please decide how kernel ABI should be managed
On Tue, 2011-01-04 at 17:23 -0800, Russ Allbery wrote:
> Ben Hutchings b...@decadent.org.uk writes:
>> Do pay attention. We were discussing the implications of changing our current practice of trying to avoid ABI bumps during freeze and stable updates. We would then probably change the uname release (the ABI identifier) in each version of the package.
>
> This is certainly becoming more appealing with DKMS, but with my Stanford sysadmin hat on, I have to admit that we'd find it rather annoying if the ABI changed in stable. I think that may be a good way to go in unstable and testing up to the release, but it would be very nice to not do that after the release.

However, the upstream policy for stable updates does not support this.

> With hundreds of servers, we'd rather not install compilers and DKMS on every one of them, and with lots of machines, the loss of reproducibility from separately compiling the modules on every system is an increasingly large drawback.

This is why DKMS has the facility to build packages for installation elsewhere.

> We currently build internal packages (from the *-source packages provided by Debian) for those external modules that we use so that we can deploy the same thing everywhere, and having to rebuild modules for every kernel update and deploy those new builds with the kernel update would be fairly annoying. With that system, we know for sure that if the module mysteriously fails on one system but not on others, it's not because it's a weird build or has some other compilation issue.
>
> In fact, we know almost exactly how annoying it would be, since Red Hat has this policy, and it's been a major pain. The handling of the kernel versioning in stable is currently one of the major selling points for Debian over Red Hat for us.
[...]

Note that Red Hat does maintain the ABI for most functions, even though it changes the uname release. If you package OOT modules using the 'KMP' macros for RPM, binary modules will be symlinked into a 'weak-updates' subdirectory for a newer kernel if their symbol dependencies are still met. We could try to implement something like that in Debian.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
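The 'weak-updates' idea described above boils down to a symbol check: a prebuilt module can be reused with a newer kernel if every symbol it imports is still exported with the same version checksum. The sketch below models only that check. The dictionaries, release names, symbol names, and CRC values are made up for illustration, and this is not Red Hat's actual implementation.

```python
from typing import Dict, List

SymbolTable = Dict[str, str]  # symbol name -> version checksum (CRC)


def unmet_symbols(module_imports: SymbolTable,
                  kernel_exports: SymbolTable) -> List[str]:
    """Symbols the module needs that the kernel lacks or exports with another CRC."""
    return sorted(
        name
        for name, crc in module_imports.items()
        if kernel_exports.get(name) != crc
    )


def reusable_kernels(module_imports: SymbolTable,
                     kernels: Dict[str, SymbolTable]) -> List[str]:
    """Releases for which the existing binary module could simply be linked in."""
    return sorted(
        release
        for release, exports in kernels.items()
        if not unmet_symbols(module_imports, exports)
    )


# Hypothetical data: a module built against an older kernel, and two newer kernels.
module_imports = {
    "example_alloc": "0x1a2b3c4d",
    "example_register": "0x55667788",
}
kernels = {
    # Adds a symbol but keeps everything the module uses: still compatible.
    "newer-kernel-a": {
        "example_alloc": "0x1a2b3c4d",
        "example_register": "0x55667788",
        "example_new": "0x00000001",
    },
    # Changes the CRC of a symbol the module uses: needs a rebuild.
    "newer-kernel-b": {
        "example_alloc": "0x1a2b3c4d",
        "example_register": "0x99999999",
    },
}
```

A per-module check like this only forces a rebuild for modules that actually use a changed symbol, whereas a single ABI number forces one for every module.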
Bug#607368: Please decide how kernel ABI should be managed
Ben Hutchings b...@decadent.org.uk writes:
> On Tue, 2011-01-04 at 17:23 -0800, Russ Allbery wrote:
>> With hundreds of servers, we'd rather not install compilers and DKMS on every one of them, and with lots of machines, the loss of reproducibility from separately compiling the modules on every system is an increasingly large drawback.
>
> This is why DKMS has the facility to build packages for installation elsewhere.

But there would be no purpose served in using DKMS for this. The only place where DKMS has an advantage over building real Debian packages for the modules is if you're going to let every machine build its own modules. As soon as you are distributing modules built once to multiple machines, using DKMS to do that is vaguely absurd: you have to reinvent all the mechanisms of a repository and package upgrade system, when we already have a perfectly useful and reasonable one in apt repositories with package versioning and proper dependencies.

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#607368: Please decide how kernel ABI should be managed
On Tue, 2011-01-04 at 17:55 -0800, Russ Allbery wrote:
> Ben Hutchings b...@decadent.org.uk writes:
>> On Tue, 2011-01-04 at 17:23 -0800, Russ Allbery wrote:
>>> With hundreds of servers, we'd rather not install compilers and DKMS on every one of them, and with lots of machines, the loss of reproducibility from separately compiling the modules on every system is an increasingly large drawback.
>>
>> This is why DKMS has the facility to build packages for installation elsewhere.
>
> But there would be no purpose served in using DKMS for this. The only place where DKMS has an advantage over building real Debian packages for the modules is if you're going to let every machine build its own modules.
[...]

DKMS does build real Debian packages. And that means that OOT module sources do not need to be packaged differently depending on where the modules will be built.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Bug#607368: Please decide how kernel ABI should be managed
Ben Hutchings b...@decadent.org.uk writes:

> DKMS does build real Debian packages. And that means that OOT module sources do not need to be packaged differently depending on where the modules will be built.

Oh, huh, I hadn't noticed that. Thanks for the pointer! I'll have to play with that; I'd only previously seen the tarball distribution and installation mechanism.

The work of providing both the -dkms and the traditional -source package is fairly trivial and not much of a drain on the packager's time once the original -source rules have been written. I'm doing it right now for multiple packages. But writing the original -source package rules file is arcane and very under-documented, so this is potentially a long-term improvement.

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#607368: Please decide how kernel ABI should be managed
On Mon, 27 Dec 2010, Ben Hutchings wrote:
> On Sun, 2010-12-26 at 15:55 -0800, Don Armstrong wrote:
>> Ok. And am I correct in assuming that if the ABI change would break an OOT module, you would normally change the ABI number?
>
> In the time I've been involved in the kernel team, I haven't yet seen a case where a bug fix required an ABI change that I knew would break an OOT module.

So in this case, if it was clear that the change would have broken an OOT module, the kernel team would normally either postpone the change, or change the ABI number.

> Anything distributed by Debian should meet those qualifications, but users such as Julien also care about modules from other sources. I normally use Google Code Search to check for OOT modules using symbols that have changed ABI and which I think might be ignorable.

Ok. For some reason, I hadn't originally noticed that this was concerning an OOT module which Debian itself didn't actually distribute. [Julien: I'm correct in that, right?] But that's probably fine.

>> How are the symbols that those OOT modules use communicated to the kernel team?
>
> They aren't.

Would putting the onus on OOT maintainers to maintain such a list be of benefit to the kernel maintainer team?

>> What does the kernel maintainer team feel should be done by the maintainer in this case to ensure continuity of upgrades and rebuilds of the OOT modules?
> [...]
> We recommend that OOT module packages make use of DKMS. DKMS includes hook scripts to trigger rebuilding OOT modules automatically for each new kernel ABI version, if the end user or administrator installs the module source and the appropriate linux-headers package.
>
> In a more tightly controlled environment where such packages should not be installed on production servers, the administrator must rebuild modules elsewhere and deploy them along with the kernel upgrade. DKMS provides various means for this.

Makes sense. What about this case? What should Julien do?

Julien: Are you currently shipping a kernel in production which would be affected by this change if we don't change the ABI number? Or does this only affect cases where you are testing squeeze? Could it be worked around by using DKMS or similar with prebuilt binaries and requiring exact kernel version dependencies?

Don Armstrong

--
I don't care how poor and inefficient a little country is; they like to run their own business. I know men that would make my wife a better husband than I am; but, darn it, I'm not going to give her to 'em. -- The Best of Will Rogers
http://www.donarmstrong.com http://rzlab.ucr.edu
Bug#607368: Please decide how kernel ABI should be managed
On Thu, 2010-12-23 at 12:08 -0800, Don Armstrong wrote:
> On Sun, 19 Dec 2010, Julien BLACHE wrote:
>> I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.
>
> I have a couple of (possibly naïve) questions that would help me understand the space of solutions here.
>
> 1) What is the kernel ABI currently used to indicate?

The ABI *number* indicates a range of versions within which newer versions are likely to remain compatible with modules built for an older version.

> Where do we specify what it guarantees?

We don't.

> 2) What are all of the options for handling this situation? Specifically, how should a package maintainer who is maintaining an out-of-tree module which uses symbols from the kernel handle them through an upgrade which changes the symbols? If the symbols need to be covered by the ABI, how can the maintainer get them covered by ABI? What should they do in cases when they are not covered by the ABI?
>
> My main concern is that there seems to be no way for oot modules like the vmware modules to sanely keep in step with the kernel ABI. While this may not be a concern for kernel upstream, it's something that we would ideally deal with to avoid issues for our users on upgrades.

I think I should explain at this point the trade-off we're trying to make.

As you know, the kernel-space ABI is volatile and upstream has no intention of maintaining it, even within a stable/long-term series. Build configuration changes may also change the ABI in unexpected ways. Therefore it is generally not practical to maintain ABI within a single upstream version.

Changing the ABI number requires (1) changing the package names and (2) rebuilding out-of-tree modules. (1) means linux-2.6 must go through the NEW queue and also disrupts d-i development (the latter problem may be reduced within the wheezy release cycle). It also requires end users and administrators to explicitly remove old kernel image packages. (2) should not be a huge burden so long as the modules are packaged using dkms, but auto-rebuilding relies on having a toolchain installed. Therefore we do not like to change the ABI number during a stable release or the preceding freeze.

The result of these competing pressures is that we have to fudge ABI changes. Where possible, we adjust upstream fixes to remain backward-compatible. In other cases we revert fixes or ignore the ABI changes, based on our judgement of the costs and benefits.

---

If people don't like this compromise, then I think the only reasonable alternative is to do what most other distributions do: set the kernel version (as shown by uname -r) to the package version. This means that each new upload will have new package names (and will require an upload of linux-latest-2.6). APT should also be fixed to allow auto-removal of old kernel images.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
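To make the trade-off above concrete, here is a minimal sketch of how an ABI change between two builds could be detected and then either acted on or deliberately ignored. It assumes symbol tables in the whitespace-separated layout of the kernel's Module.symvers file (CRC, symbol, module, export type). The function names, symbol names, and the ignore list are hypothetical and do not describe the kernel team's actual tooling.

```python
from typing import Dict, Iterable, Set


def parse_symvers(text: str) -> Dict[str, str]:
    """Map symbol name -> CRC from Module.symvers-style text."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            crc, symbol = fields[0], fields[1]
            table[symbol] = crc
    return table


def abi_changes(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Symbols that were removed or whose CRC changed.

    Newly added symbols are not reported: they cannot break existing modules.
    """
    return {symbol for symbol, crc in old.items() if new.get(symbol) != crc}


def needs_abi_bump(old: Dict[str, str], new: Dict[str, str],
                   ignored: Iterable[str] = ()) -> bool:
    """True if any changed symbol is not on the (judgement-based) ignore list."""
    return bool(abi_changes(old, new) - set(ignored))


# Hypothetical tables for two successive builds of the same upstream version.
OLD = parse_symvers(
    "0x11111111\texample_alloc\tvmlinux\tEXPORT_SYMBOL\n"
    "0x22222222\texample_lookup\tvmlinux\tEXPORT_SYMBOL_GPL\n"
)
NEW = parse_symvers(
    "0x11111111\texample_alloc\tvmlinux\tEXPORT_SYMBOL\n"
    "0x33333333\texample_lookup\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    "0x44444444\texample_added\tvmlinux\tEXPORT_SYMBOL\n"
)
```

In this model the ignore list is where the "fudge" lives: any symbol placed there is a bet that no out-of-tree module depends on it.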
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 26 Dec 2010, Ben Hutchings wrote:
> On Thu, 2010-12-23 at 12:08 -0800, Don Armstrong wrote:
>> On Sun, 19 Dec 2010, Julien BLACHE wrote:
>>> I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.
>>
>> I have a couple of (possibly naïve) questions that would help me understand the space of solutions here.
>>
>> 1) What is the kernel ABI currently used to indicate?
>
> The ABI *number* indicates a range of versions within which newer versions are likely to remain compatible with modules built for an older version.

So currently there is no guarantee that a specific ABI maintains any kind of compatibility for out of tree modules; it is a best effort based on the kernel maintainer's understanding of what symbols have changed and what out of tree (or even in-tree) modules are affected.

Do the kernel maintainers currently track compatibility of in-tree modules for modules which may reasonably be loaded during the lifetime of the install? [I'm thinking of removable device drivers, things like KVM, etc.]

> I think I should explain at this point the trade-off we're trying to make.
>
> As you know, the kernel-space ABI is volatile and upstream has no intention of maintaining it, even within a stable/long-term series. Build configuration changes may also change the ABI in unexpected ways. Therefore it is generally not practical to maintain ABI within a single upstream version.

Right.

> Changing the ABI number requires (1) changing the package names and (2) rebuilding out-of-tree modules. (1) means linux-2.6 must go through the NEW queue and also disrupts d-i development (the latter problem may be reduced within the wheezy release cycle). It also requires end users and administrators to explicitly remove old kernel image packages. (2) should not be a huge burden so long as the modules are packaged using dkms, but auto-rebuilding relies on having a toolchain installed. Therefore we do not like to change the ABI number during a stable release or the preceding freeze.

So from what I can see, the ideal situation is to not change the kernel ABI number unless we absolutely have to.

What I think is missing now, is a discussion of which cases where changing the ABI number is necessary for proper functioning, and which cases of malfunction we feel are acceptable, and which are not.

For in tree modules, all of the problems that would occur from upgrading a kernel where the ABI had changed (but not the number) can be resolved by rebooting. I'm personally a bit concerned that these errors may be a bit disconcerting to our users, but that may be something we decide to live with and document.

For out of tree modules, these problems can either be resolved by changing the ABI number, or possibly by using Breaks: for all of the affected out-of-tree modules where the change wasn't wide-spread enough to bump the ABI number.

A slightly wilder alternative, is to Provides: linux-kernel-abi-2.6.32-vmware-5 or something for out-of-tree modules which aren't going to be covered by the main ABI, but are important enough to require compatibility.

Alternatively, we can ignore them, and require that end-users of these out of tree modules know that they must upgrade their out-of-tree modules in lockstep with the kernel.

Which in-tree modules should we change the ABI number for? Which out-of-tree modules? How does an out-of-tree module writer know? How can they promote their module to get a Breaks or Provides or whatever?

Don Armstrong

--
It has always been Debian's philosophy in the past to stick to what makes sense, regardless of what crack the rest of the universe is smoking. -- Andrew Suffield in 20030403211305.gd29...@doc.ic.ac.uk
http://www.donarmstrong.com http://rzlab.ucr.edu
Bug#607368: Please decide how kernel ABI should be managed
Don Armstrong d...@debian.org writes:

> So currently there is no guarantee that a specific ABI maintains any kind of compatibility for out of tree modules; it is a best effort based on the kernel maintainer's understanding of what symbols have changed and what out of tree (or even in-tree) modules are affected.

I feel like I should note here that I've been maintaining a complex out-of-tree kernel module for Debian for many years now (openafs) and am also involved in maintaining the non-free NVIDIA modules, and I can't remember ever having the kernel ABI break for those modules without the ABI number changing. It's probably happened and I just don't remember it, but certainly not enough to be memorable.

*Upstream* has caused us all sorts of problems from time to time because of taking public symbols and making them GPL-only (OpenAFS predates Linux and the core of the source is licensed under a free but GPL-incompatible license, which also affects the kernel module), but the Debian kernel maintainers have always done a great job at maintaining ABI guarantees, insofar as my packages are affected.

The only problem that I recall with the ABI numbering was the unfortunate use of -trunk as an ABI version during the squeeze development cycle, and there mostly because -trunk sorted inappropriately after regular ABI numbers were introduced, not because of an inherent problem with the use of that technique in unstable.

So while I do recognize that there was a problem with an out-of-tree module that brought this particular bug to the technical committee, I have to say that with my out-of-tree module maintainer hat on the kernel team seems to, by and large, be doing a good job of maintaining the kernel ABI already. That inclines me against supporting any major change in how this is handled.

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#607368: Please decide how kernel ABI should be managed
Don Armstrong d...@debian.org wrote:

Hi,

> For out of tree modules, these problems can either be resolved by changing the ABI number, or possibly by using Breaks: for all of the affected out-of-tree modules where the change wasn't wide-spread enough to bump the ABI number.
>
> A slightly wilder alternative, is to Provides: linux-kernel-abi-2.6.32-vmware-5 or something for out-of-tree modules which aren't going to be covered by the main ABI, but are important enough to require compatibility.
>
> Alternatively, we

This doesn't work for modules packaged/installed with DKMS, which is slowly replacing module-assistant (and is not Debian-specific; this is important to keep in mind here).

It would only work if DKMS in Debian switched to building modules at boot time, which it currently doesn't do, and even that would not solve the issue for modules needed in the initrd. Not to mention that it would lengthen the boot time and could break the boot for any number of reasons [1].

As you noted, silently breaking the ABI opens up a window during which modules on-disk are potentially incompatible with the running kernel. Not ideal and not easy to diagnose if you don't have some kernel knowledge.

JB.

[1] Like running into an endless loop while attempting to build a module, as happened to me with blcr, which would be pretty inconvenient at boot time.

--
Julien BLACHE jbla...@debian.org | Debian, because code matters more
Debian GNU/Linux Developer | http://www.debian.org
Public key available on http://www.jblache.org - KeyID: F5D6 5169
GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 2010-12-26 at 12:23 -0800, Don Armstrong wrote:
> On Sun, 26 Dec 2010, Ben Hutchings wrote:
> > On Thu, 2010-12-23 at 12:08 -0800, Don Armstrong wrote:
> > > On Sun, 19 Dec 2010, Julien BLACHE wrote:
> > > > I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.
> > >
> > > I have a couple of (possibly naïve) questions that would help me understand the space of solutions here.
> > >
> > > 1) What is the kernel ABI currently used to indicate?
> >
> > The ABI *number* indicates a range of versions within which newer versions are likely to remain compatible with modules built for an older version.
>
> So currently there is no guarantee that a specific ABI maintains any kind of compatibility for out of tree modules; it is a best effort based on the kernel maintainer's understanding of what symbols have changed and what out of tree (or even in-tree) modules are affected.
>
> Do the kernel maintainers currently track compatibility of in-tree modules for modules which may reasonably be loaded during the lifetime of the install? [I'm thinking of removable device drivers, things like KVM, etc.]

Not specifically. *Most* modules will remain compatible, but we expect users to reboot shortly after a kernel upgrade.

[...]
> What I think is missing now, is a discussion of which cases where changing the ABI number is necessary for proper functioning, and which cases of malfunction we feel are acceptable, and which are not.
>
> For in tree modules, all of the problems that would occur from upgrading a kernel where the ABI had changed (but not the number) can be resolved by rebooting. I'm personally a bit concerned that these errors may be a bit disconcerting to our users, but that may be something we decide to live with and document.
>
> For out of tree modules, these problems can either be resolved by changing the ABI number,

Yes.

> or possibly by using Breaks: for all of the affected out-of-tree modules where the change wasn't wide-spread enough to bump the ABI number.

No. Firstly, if we know that an ABI change would break an OOT module then we try to avoid making that change. Therefore, if an ABI change does break an OOT module then we would not know that we should add the Breaks relation.

Also, we now recommend that OOT module sources are packaged using dkms, which means the module binaries are *not* packaged and no such relation can be declared.

> A slightly wilder alternative, is to Provides: linux-kernel-abi-2.6.32-vmware-5 or something for out-of-tree modules which aren't going to be covered by the main ABI, but are important enough to require compatibility.
[...]

I refuse to support any specific OOT module in this way unless paid to do so. I expect that other kernel team members will tell you the same.

Ben.

-- Ben Hutchings Once a job is fouled up, anything done to improve it makes it worse.
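For readers following the thread, the "ABI change" being argued about comes down to a change in the per-symbol checksums that the kernel build records for exported symbols. The following is a minimal sketch, assuming Module.symvers-style input with tab-separated fields (checksum, symbol name, defining module, export type); the function names and the sample data are hypothetical, and this is not the kernel team's actual tooling.

```python
def parse_symvers(text):
    """Map symbol name -> checksum from Module.symvers-style text."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        crc, name = fields[0], fields[1]
        symbols[name] = crc
    return symbols


def abi_changes(old, new):
    """Return (removed, changed) symbol names between two symbol tables."""
    removed = sorted(name for name in old if name not in new)
    changed = sorted(
        name for name in old if name in new and old[name] != new[name]
    )
    return removed, changed


# Made-up sample data: same symbol names, one checksum differs.
OLD = (
    "0x11111111\tsmp_ops\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    "0x22222222\tprintk\tvmlinux\tEXPORT_SYMBOL\n"
)
NEW = (
    "0x33333333\tsmp_ops\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    "0x22222222\tprintk\tvmlinux\tEXPORT_SYMBOL\n"
)

removed, changed = abi_changes(parse_symvers(OLD), parse_symvers(NEW))
print("removed:", removed)
print("changed:", changed)
```

A module built against the old table that uses only `printk` would be unaffected here; one that uses `smp_ops` would not load against the new kernel if symbol versions are checked. Whether such a difference should force a new ABI number is exactly the policy question in this bug.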
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 26 Dec 2010, Ben Hutchings wrote:
> On Sun, 2010-12-26 at 12:23 -0800, Don Armstrong wrote:
> > or possibly by using Breaks: for all of the affected out-of-tree modules where the change wasn't wide-spread enough to bump the ABI number.
>
> No. Firstly, if we know that an ABI change would break an OOT module then we try to avoid making that change.

Ok. And am I correct in assuming that if the ABI change would break an OOT module, you would normally change the ABI number?

Which OOT modules are important enough to result in ABI number changes?

How are the symbols that those OOT modules use communicated to the kernel team?

What does the kernel maintainer team feel should be done by the maintainer in this case to ensure continuity of upgrades and rebuilds of the OOT modules?

> > A slightly wilder alternative, is to Provides: linux-kernel-abi-2.6.32-vmware-5 or something for out-of-tree modules which aren't going to be covered by the main ABI, but are important enough to require compatibility.
>
> I refuse to support any specific OOT module in this way unless paid to do so. I expect that other kernel team members will tell you the same.

I personally don't think a Provides: solution is going to be feasible, for both technical and coordination reasons.

Let's restrict ourselves to discussing the technical reasons why a solution is infeasible, rather than the possible monetary impetus required to implement it.

Don Armstrong

-- No matter how many instances of white swans we may have observed, this does not justify the conclusion that all swans are white. -- Sir Karl Popper _Logic of Scientific Discovery_ http://www.donarmstrong.com http://rzlab.ucr.edu
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 2010-12-26 at 15:55 -0800, Don Armstrong wrote:
> On Sun, 26 Dec 2010, Ben Hutchings wrote:
> > On Sun, 2010-12-26 at 12:23 -0800, Don Armstrong wrote:
> > > or possibly by using Breaks: for all of the affected out-of-tree modules where the change wasn't wide-spread enough to bump the ABI number.
> >
> > No. Firstly, if we know that an ABI change would break an OOT module then we try to avoid making that change.
>
> Ok. And am I correct in assuming that if the ABI change would break an OOT module, you would normally change the ABI number?

In the time I've been involved in the kernel team, I haven't yet seen a case where a bug fix required an ABI change that I knew would break an OOT module. I understand that in the past the kernel team has deferred such bug fixes and eventually applied such deferred changes as a batch while changing the ABI number, after coordinating with affected people (such as the d-i and CD teams).

> Which OOT modules are important enough to result in ABI number changes?

We don't have a formal policy but I think we consider OOT modules that (1) appear to be used in production and (2) have published source code for at least the part that directly uses kernel symbols. Anything distributed by Debian should meet those qualifications, but users such as Julien also care about modules from other sources. I normally use Google Code Search to check for OOT modules using symbols that have changed ABI and which I think might be ignorable.

> How are the symbols that those OOT modules use communicated to the kernel team?

They aren't.

> What does the kernel maintainer team feel should be done by the maintainer in this case to ensure continuity of upgrades and rebuilds of the OOT modules?
[...]

We recommend that OOT module packages make use of DKMS. DKMS includes hook scripts to trigger rebuilding OOT modules automatically for each new kernel ABI version, if the end user or administrator installs the module source and the appropriate linux-headers package.

In a more tightly controlled environment where such packages should not be installed on production servers, the administrator must rebuild modules elsewhere and deploy them along with the kernel upgrade. DKMS provides various means for this.

Ben.

-- Ben Hutchings Once a job is fouled up, anything done to improve it makes it worse.
Bug#607368: Please decide how kernel ABI should be managed
Don Armstrong d...@debian.org wrote:

Hi Don,

You should bounce your mail to the kernel team as they were not Cc:ed and the questions are directed to them.

> My main concern is that there seems to be no way for oot modules like the vmware modules to sanely keep in step with the kernel ABI. While

Correct.

> this may not be a concern for kernel upstream, it's something that we would ideally deal with to avoid issues for our users on upgrades.

Upstream doesn't have a notion of kernel ABI; this is left for the distributors to handle. This can only work if changes to the ABI don't get ignored for convenience or any other equally bad reason.

JB.

-- Julien BLACHE jbla...@debian.org | Debian, because code matters more Debian GNU/Linux Developer| http://www.debian.org Public key available on http://www.jblache.org - KeyID: F5D6 5169 GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 19 Dec 2010, Julien BLACHE wrote:
> I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.

I have a couple of (possibly naïve) questions that would help me understand the space of solutions here.

1) What is the kernel ABI currently used to indicate? Where do we specify what it guarantees?

2) What are all of the options for handling this situation? Specifically, how should a package maintainer who is maintaining an out-of-tree module which uses symbols from the kernel handle them through an upgrade which changes the symbols? If the symbols need to be covered by the ABI, how can the maintainer get them covered by the ABI? What should they do in cases when they are not covered by the ABI?

My main concern is that there seems to be no way for oot modules like the vmware modules to sanely keep in step with the kernel ABI. While this may not be a concern for kernel upstream, it's something that we would ideally deal with to avoid issues for our users on upgrades.

Don Armstrong

-- He no longer wished to be dead. At the same time, it cannot be said that he was glad to be alive. But at least he did not resent it. He was alive, and the stubbornness of this fact had little by little begun to fascinate him -- as if he had managed to outlive himself, as if he were somehow living a posthumous life. -- Paul Auster _City of Glass_ http://www.donarmstrong.com http://rzlab.ucr.edu
Bug#607368: Please decide how kernel ABI should be managed
Julien BLACHE jbla...@debian.org wrote:

Hi,

> > Furthermore it is indeed quite unclear if said company is not effectively violating GPL and several core dev do indeed think so.
>
> Uh? [citation needed] please, especially given VMware modules ship as source although I can't remember their licensing terms right now.

I've done that now and all the modules are GPL. There goes your claim.

JB.

-- Julien BLACHE - Debian GNU/Linux Developer - jbla...@debian.org Public key available on http://www.jblache.org - KeyID: F5D6 5169 GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
maximilian attems m...@stro.at wrote:

Hi,

> The submitter shows a clear confusion between the requirements of a shared lib userspace and the linux-2.6 kernel.

Be assured there is no confusion on my end on this topic.

> Furthermore it is indeed quite unclear if said company is not effectively violating GPL and several core dev do indeed think so.

Uh? [citation needed] please, especially given VMware modules ship as source although I can't remember their licensing terms right now.

JB.

-- Julien BLACHE jbla...@debian.org | Debian, because code matters more Debian GNU/Linux Developer| http://www.debian.org Public key available on http://www.jblache.org - KeyID: F5D6 5169 GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
Bug#607368: Please decide how kernel ABI should be managed
reopen 607368
tags 607368 - wontfix
reassign 607368 tech-ctte
retitle 607368 Please decide how kernel ABI should be managed
thanks

Hi,

I am hereby asking the tech-ctte to decide how the kernel ABI should be managed.

Case in point: the kernel team decided to ignore changes to the smp_ops symbol in 2.6.32-28 which broke external modules (vmware) without any prior warning.

I am worried that this is going to happen again during the lifetime of Squeeze, silently breaking working setups upon reboot after a kernel update, even though the new kernel carries the same ABI number as the previous one.

I do agree that it is fine to ignore changes to symbols that are only exported and used inside a self-contained group of modules to which no additional modules will ever need to be added.

I disagree with the kernel team's take that it is OK for them to ignore symbol changes in all other cases, especially for symbols exported by the core kernel (like smp_ops).

This kind of silent breakage is a nightmare from an ops standpoint and it does have a cost for our users. The ABI number should guarantee that upgrading from a revision of linux-image to another carrying the same ABI number will not cause any breakage with external modules built for this ABI.

As the kernel team made it clear that they make their decision partly based on symbol usage, I'd like to highlight once again, for the specific case of smp_ops, that VMware modules aren't exactly pet modules that only a few of our users care about. There is ample proof of this on several web forums and mailing-lists dedicated to either VMware or Debian.

I am seeking a generic ruling by the tech-ctte to ensure that the kernel ABI number remains meaningful and dependable.

I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.

Thanks,

JB.
-- Julien BLACHE jbla...@debian.org | Debian, because code matters more Debian GNU/Linux Developer| http://www.debian.org Public key available on http://www.jblache.org - KeyID: F5D6 5169 GPG Fingerprint : 935A 79F1 C8B3 3521 FD62 7CC7 CD61 4FD7 F5D6 5169
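As an aside for readers less familiar with the naming scheme: in squeeze the ABI number is part of the binary package name (for example the 5 in linux-image-2.6.32-5-amd64), while the package version (such as 2.6.32-28) changes with every upload. The request above amounts to saying that two uploads with the same package name should accept the same externally built modules. A small sketch of that reading, assuming the `linux-image-<upstream>-<abi>-<flavour>` pattern; the helper names are hypothetical.

```python
import re

# linux-image-<upstream version>-<ABI name>-<flavour>, e.g.
# linux-image-2.6.32-5-amd64 (the flavour may itself contain hyphens).
_NAME = re.compile(r"^linux-(?:image|headers)-(\d+(?:\.\d+)*)-([^-]+)-(.+)$")


def split_abi_name(package):
    """Split a kernel package name into (upstream, abi, flavour)."""
    match = _NAME.match(package)
    if match is None:
        raise ValueError("not a kernel image/headers package: %r" % package)
    return match.group(1), match.group(2), match.group(3)


def needs_module_rebuild(old_package, new_package):
    """Under the requested policy, modules need rebuilding only when the
    upstream version, ABI name or flavour in the package name differs."""
    return split_abi_name(old_package) != split_abi_name(new_package)


print(split_abi_name("linux-image-2.6.32-5-amd64"))
print(needs_module_rebuild("linux-image-2.6.32-5-amd64",
                           "linux-image-2.6.32-5-amd64"))
print(needs_module_rebuild("linux-image-2.6.32-5-amd64",
                           "linux-image-2.6.32-6-amd64"))
```

The smp_ops case is one where this check would report that no rebuild is needed, yet the module stopped working, which is the breakage the submitter wants ruled out.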
Bug#607368: Please decide how kernel ABI should be managed
On Sun, Dec 19, 2010 at 07:30:58PM +0100, Julien BLACHE wrote:
> reopen 607368
> tags 607368 - wontfix
> reassign 607368 tech-ctte
> retitle 607368 Please decide how kernel ABI should be managed
> thanks
>
> Hi,
>
> I am hereby asking the tech-ctte to decide how the kernel ABI should be managed.
>
> Case in point: the kernel team decided to ignore changes to the smp_ops symbol in 2.6.32-28 which broke external modules (vmware) without any prior warning.

FWIW, the ABI handling has been fairly strict during the lifetime of a stable release. I'm not aware that the same situation has occurred during the Etch or Lenny lifetime.

Cheers,
Moritz
Bug#607368: Please decide how kernel ABI should be managed
On Sun, Dec 19, 2010 at 07:30:58PM +0100, Julien BLACHE wrote:
> I am hereby asking the tech-ctte to decide how the kernel ABI should be managed.

Hi Julien, from the bug log it's pretty clear that there was no possibility of agreement between you and the kernel team, so thanks for bringing this issue to tech-ctte.

I've a question for the kernel team, which might help some investigation of the tech-ctte. There seem to be two intertwined issues here:

1) the general policy of kernel ABI maintenance
2) the specific smp_ops issue

You asked for a ruling on (1), on which there is a clear divergence of opinions between you (as bug reporter / user) and the kernel team (as package maintainers). Of course ruling about (1) will also address (2), one way or the other. Still, (2) is more urgent, as (I agree on that) it will impact the upgrade experience of Debian users like Julien, who are forced to use VMware. No matter who is at fault, the choice about (2) will have an impact on a specific class of users.

My question to the kernel team is if, no matter (2), there are *technical* reasons for not reverting the removal of the smp_send_stop symbol. I understand there are political reasons for *not* reverting the change, like reinforcing the position that people should not rely on symbols not exported for out-of-tree modules. I believe it would help the discussion to know whether there are technical blockers to the revert.

> I think it would be best if this matter would be decided upon before the release of Squeeze, or not too long after it, so as to avoid further breakages in early kernel updates for Squeeze.

+1

Just my 0.02€,
Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Quando anche i santi ti voltano le spalle, | . |. I've fans everywhere
ti resta John Fante -- V. Capossela ...| ..: |.. -- C. Adams
Bug#607368: Please decide how kernel ABI should be managed
On Sun, 2010-12-19 at 20:19 +0100, Stefano Zacchiroli wrote:
> On Sun, Dec 19, 2010 at 07:30:58PM +0100, Julien BLACHE wrote:
> > I am hereby asking the tech-ctte to decide how the kernel ABI should be managed.
>
> Hi Julien, from the bug log it's pretty clear that there was no possibility of agreement between you and the kernel team, so thanks for bringing this issue to tech-ctte.
>
> I've a question for the kernel team, which might help some investigation of the tech-ctte. There seem to be two intertwined issues here:
>
> 1) the general policy of kernel ABI maintenance
> 2) the specific smp_ops issue
>
> You asked for a ruling on (1), on which there is a clear divergence of opinions between you (as bug reporter / user) and the kernel team (as package maintainers). Of course ruling about (1) will also address (2), one way or the other. Still, (2) is more urgent, as (I agree on that) it will impact the upgrade experience of Debian users like Julien, who are forced to use VMware. No matter who is at fault, the choice about (2) will have an impact on a specific class of users.
>
> My question to the kernel team is if, no matter (2), there are *technical* reasons for not reverting the removal of the smp_send_stop symbol. I understand there are political reasons for *not* reverting the change, like reinforcing the position that people should not rely on symbols not exported for out-of-tree modules. I believe it would help the discussion to know whether there are technical blockers to the revert.
[...]

smp_send_stop was never exported in its own right. The change to smp_ops was made as part of this bug fix:

commit ae832c21a08514fd11d2d1d6e217c8a537764bb0
Author: Alok Kataria akata...@vmware.com
Date: Mon Oct 11 14:37:08 2010 -0700

    x86, kexec: Make sure to stop all CPUs before exiting the kernel

    commit 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce upstream.

    x86 smp_ops now has a new op, stop_other_cpus which takes a parameter wait this allows the caller to specify if it wants to stop until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway.

    This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75.

    Signed-off-by: Alok N Kataria akata...@vmware.com
    LKML-Reference: 1286833028.1372.20.ca...@ank32.eng.vmware.com
    Cc: Eric W. Biederman ebied...@xmission.com
    Cc: Jeremy Fitzhardinge jer...@xensource.com
    Signed-off-by: H. Peter Anvin h...@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman gre...@suse.de

(ooh, irony).

Ben.

-- Ben Hutchings Once a job is fouled up, anything done to improve it makes it worse.
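To see why a commit like this counts as an ABI change even though the exported name smp_ops is untouched: with symbol versioning, the build derives a checksum for each exported symbol from its type information, so changing a member's type changes the checksum. The sketch below is only an illustration of that idea, assuming a plain CRC-32 over a whitespace-normalised declaration; the kernel's real tool (genksyms) differs in detail, and the struct text is abbreviated, with the member signatures assumed from the names mentioned in this thread and in the commit message.

```python
import zlib


def signature_crc(declaration):
    """Checksum a C declaration after collapsing all whitespace runs."""
    normalised = " ".join(declaration.split())
    return zlib.crc32(normalised.encode("utf-8")) & 0xFFFFFFFF


# Abbreviated, assumed declarations: only the member that changed is shown.
BEFORE = """
struct smp_ops {
    void (*smp_send_stop)(void);
};
"""

AFTER = """
struct smp_ops {
    void (*stop_other_cpus)(int wait);
};
"""

print(hex(signature_crc(BEFORE)))
print(hex(signature_crc(AFTER)))
print("ABI-visible change:", signature_crc(BEFORE) != signature_crc(AFTER))
```

A module compiled against the old declaration still calls through the member as if it took no argument, which is why a rebuild is required rather than a mere relink.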
Bug#607368: Please decide how kernel ABI should be managed
On Sun, Dec 19, 2010 at 08:19:22PM +0100, Stefano Zacchiroli wrote:
> On Sun, Dec 19, 2010 at 07:30:58PM +0100, Julien BLACHE wrote:
> > I am hereby asking the tech-ctte to decide how the kernel ABI should be managed.
>
> Hi Julien, from the bug log it's pretty clear that there was no possibility of agreement between you and the kernel team, so thanks for bringing this issue to tech-ctte.
>
> I've a question for the kernel team, which might help some investigation of the tech-ctte. There seem to be two intertwined issues here:
>
> 1) the general policy of kernel ABI maintenance

We do our best to avoid ABI bumps. Especially around release time the ABI is more or less frozen due to d-i requirements. There is no way we would bump the ABI so shortly before the release.

Upstream has no ABI rule (best read in Documentation/stable_api_nonsense.txt), thus stable updates do indeed change the ABI.

> 2) the specific smp_ops issue
>
> You asked for a ruling on (1), on which there is a clear divergence of opinions between you (as bug reporter / user) and the kernel team (as package maintainers). Of course ruling about (1) will also address (2), one way or the other. Still, (2) is more urgent, as (I agree on that) it will impact the upgrade experience of Debian users like Julien, who are forced to use VMware. No matter who is at fault, the choice about (2) will have an impact on a specific class of users.

The submitter shows a clear confusion between the requirements of a shared lib userspace and the linux-2.6 kernel.

Furthermore it is indeed quite unclear if said company is not effectively violating GPL and several core dev do indeed think so.

-- maks