Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 20, 2016 at 09:35:30AM +0100, Russell King - ARM Linux wrote: > On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote: > > > IOW, if your kernel forced signature verification, you should not be > > > able to do sig_enforce=0. If you kernel did not have > > > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway > > > and you are not making it worse using command line. > > > > OK.. I checked and you are right, but that is an example and there are > > other things like security=, thermal.*, nosmep, nosmap that need auditing > > for safety and might hurt the system security if used. I still think > > think that assuming you can pass any command line without breaking security > > is a broken argument. > > Quite, and you don't need to run code in a privileged environment to do > any of that. > > It's also not trivial to protect against: new kernels gain new arguments > which older kernels may not know about. No matter how much protection > is built into older kernels, newer kernels can become vulnerable through > the addition of further arguments. If a new kernel command line option becomes an issue, new kernel can block that in secureboot environment. That way it helps kexec boot as well as regular boot. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote: > > > > Command line options are not signed. I thought idea behind secureboot > > was to execute only trusted code and command line options don't enforce > > you to execute unsigned code. > > > >> > >> You can set module.sig_enforce=0 and open up the system a bit assuming > >> that you can get a module to load with another attack > > > > IIUC, sig_enforce bool_enable_only so it can only be enabled. Default > > value of it is 0 if CONFIG_MODULE_SIG_FORCE=n. > > > > IOW, if your kernel forced signature verification, you should not be > > able to do sig_enforce=0. If you kernel did not have > > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway > > and you are not making it worse using command line. > > > > OK.. I checked and you are right, but that is an example and there are > other things like security=, thermal.*, nosmep, nosmap that need auditing > for safety and might hurt the system security if used. I still think > think that assuming you can pass any command line without breaking security > is a broken argument. I agree that if some command line option allows running unsigned code at ring 0, then we probably should disable that on secureboot enabled boot. In fact, there were bunch of patches which made things tighter on secureboot enabled machines from matthew garrett. AFAIK, these patches never went upstream. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 3/3] kexec: extend kexec_file_load system call
On Tue, Jul 19, 2016 at 01:47:28PM +0100, Mark Rutland wrote: > On Tue, Jul 19, 2016 at 08:24:06AM -0400, Vivek Goyal wrote: > > On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote: > > > Regardless, this extended syscall changes some underlying assumptions > > > made with the development of kexec_file_load, and I think treating this > > > as an extension is not a great idea. From a user's perspective there is > > > little difference between passing an additional flag or using a > > > different syscall number, so I don't think that we gain much by altering > > > the existing prototype relative to allocating a new syscall number. > > > > If we are providing/opening up additional flags, I can't think what will > > it break. Same flag was invalid in old kernel but new kernel supports > > it and will accept it. So it sounds reasonable to me to add new flags. > > > > If existing users are not broken, then I think it might be a good idea > > to extend existing syscall. Otherwise userspace will have to be modified > > to understand a 3rd syscall also and an additional option will show up > > which asks users to specify which syscall to use. So extending existing > > syscall might keep it little simple for users. > > I don't follow. > > To use the new feature, you have to modify userspace anyway, as you > require userspace to pass information which it did not previously pass > (in the new arguments added to the syscall). > > The presence of a new syscall does not imply the absence of the old > syscall, so you can always use that be default unless the user asks for > asomething only the new syscall provides. Regardless of the > syscall/flags difference, you still have to detect whether the new > functionality is present somehow. > Hmm., so current idea is that we have two syscalls() which are *ideally* supposed to work for all arches. Difference between two is that first one does not support kernel signature verification while second one does. By default old syscall is used and user can force using new syscall using option --kexec-file-load. If a user DTB is present, I was hoping that it will continue to work the same way. Both the sycalls can be used and can handle DTB. If we introduce a 3rd syscall, that means only first and 3rd syscall can handle DTB and we need to introduce one more option which tells whether to use kexec_load() or use the 3rd new syscall. And that's what I am trying to avoid. Vivek > > BTW, does kexec_load() needs to be modified too to handle DT? > > No, at least for arm64. In the kexec_load case userspace provides the > DTB as a raw segment, and the user-provided purgatory sets up registers > to pass that to the new kernel. > > Thanks, > Mark. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 3/3] kexec: extend kexec_file_load system call
On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote: > On Tue, Jul 19, 2016 at 08:55:56AM +0800, Dave Young wrote: > > On 07/18/16 at 11:07am, Mark Rutland wrote: > > > On Mon, Jul 18, 2016 at 10:30:24AM +0800, Dave Young wrote: > > > > I do not think it is worth to add another syscall for extra fds. > > > > We have open(2) as an example for different numbers of arguments > > > > already. > > > > > > Did we change the syscall interface for that? > > > > > > I was under the impression that there was always one underlying syscall, > > > and the C library did the right thing to pass the expected information > > > to the underlying syscall. > > > > I'm not sure kexec_load and kexec_file_load were included in glibc, we use > > syscall directly in kexec-tools. > > > > kexec_load man pages says there are no wrappers for both kexec_load and > > kexec_file_load in glibc. > > For the above, I was talking about how open() was handled. > > If there are no userspace wrappers, then the two cases aren't comparable > in the first place... > > > > That's rather different to changing the underlying syscall. > > > > > > Regardless of how this is wrapped in userspace, I do not think modifying > > > the existing prototype is a good idea, and I think this kind of > > > extension needs to be a new syscall. > > > > Hmm, as I replied to Vivek, there is one case about the flags, previously > > the new flag will be regarded as invalid, but not we extend it it will be > > valid, this maybe the only potential bad case. > > It's true that adding suport for new flags will change the behaviour of > what used to be error cases. We generally expect real users to not be > making pointless calls for which they rely on an error being returned in > all cases. > > Regardless, this extended syscall changes some underlying assumptions > made with the development of kexec_file_load, and I think treating this > as an extension is not a great idea. From a user's perspective there is > little difference between passing an additional flag or using a > different syscall number, so I don't think that we gain much by altering > the existing prototype relative to allocating a new syscall number. If we are providing/opening up additional flags, I can't think what will it break. Same flag was invalid in old kernel but new kernel supports it and will accept it. So it sounds reasonable to me to add new flags. If existing users are not broken, then I think it might be a good idea to extend existing syscall. Otherwise userspace will have to be modified to understand a 3rd syscall also and an additional option will show up which asks users to specify which syscall to use. So extending existing syscall might keep it little simple for users. This is only if conclusion in the end is that DT needs to be passed in from user space. BTW, does kexec_load() needs to be modified too to handle DT? Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Mon, Jul 18, 2016 at 09:26:29AM -0400, Vivek Goyal wrote: > On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote: > > On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote: > > > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote: > > > > > > > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote: > > > > > > > > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux > > > > > wrote: > > > > > > > > > > > > Indeed - maybe Eric knows better, but I can't see any situation > > > > > > where > > > > > > the dtb we load via kexec should ever affect "the bootloader", > > > > > > unless > > > > > > the "kernel" that's being loaded into kexec is "the bootloader". > > > > > > > > > > > > Now, going back to the more fundamental issue raised in my first > > > > > > reply, > > > > > > about the kernel command line. > > > > > > > > > > > > On x86, I can see that it _is_ possible for userspace to specify a > > > > > > command line, and the kernel loading the image provides the command > > > > > > line to the to-be-kexeced kernel with very little checking. So, if > > > > > > your kernel is signed, what stops the "insecure userspace" loading > > > > > > a signed kernel but giving it an insecure rootfs and/or console? > > > > > It is not kexec specific. I could do this for regular boot too, right? > > > > > > > > > > Command line options are not signed. I thought idea behind secureboot > > > > > was to execute only trusted code and command line options don't > > > > > enforce > > > > > you to execute unsigned code. > > > > > > > > > You can set module.sig_enforce=0 and open up the system a bit assuming > > that you can get a module to load with another attack > > IIUC, sig_enforce bool_enable_only so it can only be enabled. Default > value of it is 0 if CONFIG_MODULE_SIG_FORCE=n. > > IOW, if your kernel forced signature verification, you should not be > able to do sig_enforce=0. If you kernel did not have > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway > and you are not making it worse using command line. [ CC Matthew Garrett ] I think on top of this there were patches by Matthew Garrett, which disallowed loading of unsigned modules if booted with secureboot on. I think those patches never made upstream though. Vivek > > > > > > > > So it sounds like different class of security problems which you are > > > > > referring to and not necessarily covered by secureboot or signed > > > > > kernel. > > > > Let me give you an example. > > > > > > > > You have a secure boot setup, where the firmware/ROM validates the boot > > > > loader. Good, the boot loader hasn't been tampered with. > > > > > > > > You interrupt the boot loader and are able to modify the command line > > > > for the booted kernel. > > > > > > > > The boot loader loads the kernel and verifies the kernel's signature. > > > > Good, the kernel hasn't been tampered with. The kernel starts running. > > > > > > > > You've plugged in a USB drive to the device, and specified a partition > > > > containing a root filesystem that you control to the kernel. The > > > > validated kernel finds the USB drive, and mounts it, and executes > > > > your own binaries on the USB drive. > > > You will require physical access to the machine to be able to > > > insert your usb drive. And IIRC, argument was that if attacker has > > > physical access to machine, all bets are off anyway. > > > > > > > You don't need physical access -- your machine controller BMC can > > do the magic for you. So its not always physical access, is it? > > Well, idea was that if you have physical access to machine, then all > bets are off. If BMC can do something which allows running unsigned > code at ring level 0, its a problem I think from secureboot model of > security. > > > > > > > > > > > > > > > You run a shell on the console. You now have control of the system, > > > > and can mount the real rootfs, inspect it, and work out what it does, > > > > etc. > > > > > > > > At this point, what use was all the validation that the secure boot > > > > has done? Absolutely useless. > > > > > > > > If you can change the command line arguments given to the kernel, you > > > > have no security, no matter how much you verify signatures. It's > > > > the illusion of security, nothing more, nothing less. > > > > > > > > I agree, if you can change command line arguments, all bets are of lesser > > value > > If changing command line allows execution of unsigned code at ring level > 0, then it is a problem. Otherwise we are talking of security issues which > are not covered by secureboot model. > > Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote: > On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote: > > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote: > > > > > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote: > > > > > > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux > > > > wrote: > > > > > > > > > > Indeed - maybe Eric knows better, but I can't see any situation where > > > > > the dtb we load via kexec should ever affect "the bootloader", unless > > > > > the "kernel" that's being loaded into kexec is "the bootloader". > > > > > > > > > > Now, going back to the more fundamental issue raised in my first > > > > > reply, > > > > > about the kernel command line. > > > > > > > > > > On x86, I can see that it _is_ possible for userspace to specify a > > > > > command line, and the kernel loading the image provides the command > > > > > line to the to-be-kexeced kernel with very little checking. So, if > > > > > your kernel is signed, what stops the "insecure userspace" loading > > > > > a signed kernel but giving it an insecure rootfs and/or console? > > > > It is not kexec specific. I could do this for regular boot too, right? > > > > > > > > Command line options are not signed. I thought idea behind secureboot > > > > was to execute only trusted code and command line options don't enforce > > > > you to execute unsigned code. > > > > > > You can set module.sig_enforce=0 and open up the system a bit assuming > that you can get a module to load with another attack IIUC, sig_enforce bool_enable_only so it can only be enabled. Default value of it is 0 if CONFIG_MODULE_SIG_FORCE=n. IOW, if your kernel forced signature verification, you should not be able to do sig_enforce=0. If you kernel did not have CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway and you are not making it worse using command line. > > > > > So it sounds like different class of security problems which you are > > > > referring to and not necessarily covered by secureboot or signed > > > > kernel. > > > Let me give you an example. > > > > > > You have a secure boot setup, where the firmware/ROM validates the boot > > > loader. Good, the boot loader hasn't been tampered with. > > > > > > You interrupt the boot loader and are able to modify the command line > > > for the booted kernel. > > > > > > The boot loader loads the kernel and verifies the kernel's signature. > > > Good, the kernel hasn't been tampered with. The kernel starts running. > > > > > > You've plugged in a USB drive to the device, and specified a partition > > > containing a root filesystem that you control to the kernel. The > > > validated kernel finds the USB drive, and mounts it, and executes > > > your own binaries on the USB drive. > > You will require physical access to the machine to be able to > > insert your usb drive. And IIRC, argument was that if attacker has > > physical access to machine, all bets are off anyway. > > > > You don't need physical access -- your machine controller BMC can > do the magic for you. So its not always physical access, is it? Well, idea was that if you have physical access to machine, then all bets are off. If BMC can do something which allows running unsigned code at ring level 0, its a problem I think from secureboot model of security. > > > > > > > > > > You run a shell on the console. You now have control of the system, > > > and can mount the real rootfs, inspect it, and work out what it does, > > > etc. > > > > > > At this point, what use was all the validation that the secure boot > > > has done? Absolutely useless. > > > > > > If you can change the command line arguments given to the kernel, you > > > have no security, no matter how much you verify signatures. It's > > > the illusion of security, nothing more, nothing less. > > > > > I agree, if you can change command line arguments, all bets are of lesser > value If changing command line allows execution of unsigned code at ring level 0, then it is a problem. Otherwise we are talking of security issues which are not covered by secureboot model. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote: > On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote: > > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann: > > > > > > > Right, but the question remains whether this helps while you allow the > > > boot loader to modify the dtb. If an attacker gets in and cannot modify > > > the kernel or initid but can modify the DT, a successful attack would > > > be a bit harder than having a modified kernel, but you may still need > > > to treat the system as compromised. > > > > Yes, and the same question also remains regarding the kernel command line. > > > > We can have the kernel perform sanity checks on the device tree, just as > > the > > kernel needs to sanity check the command line. > > > > There's the point that was raised about not wanting to increase the attack > > surface, and that's a valid point. But at least in the way Petitboot works > > today, it needs to modify the device tree and pass it to the kernel. > > > > One thing that is unavoidable to come from userspace is > > /chosen/linux,stdout-path, because it's Petitboot that knows from which > > console the user is interacting with. The other modification to set > > properties in vga@0 can be done in the kernel. > > > > Given that on DTB-based systems /chosen is an important and established way > > to pass information to the operating system being booted, I'd like to > > suggest the following, then: > > > > Extend the syscall as shown in this RFC from Takahiro AKASHI, but instead > > of > > accepting a complete DTB from userspace, the syscall would accept a DTB > > containing only a /chosen node. If the DTB contains any other node, the > > syscall fails with EINVAL. The kernel can then add the properties in > > /chosen > > to the device tree that it will pass to the next kernel. > > > > What do you think? > > I think that helps, as it makes the problem space correspond to that > of modifying the command line, but I can still come up with countless > attacks based on modifications of the /chosen node and/or the command > line, in fact it's probably easier than any other node. I don't know anything about DTB. So here comes a very basic question. Does DTB allow passing an executable blob to kernel or pass the location of some unsigned executable code at kernel level. I think from secureboot point of view that would be a concern. Being able to trick kernel to execute an unsigned code at privileged level. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 3/3] kexec: extend kexec_file_load system call
On Tue, Jul 12, 2016 at 10:42:01AM +0900, AKASHI Takahiro wrote: [..] > -SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, > +SYSCALL_DEFINE6(kexec_file_load, int, kernel_fd, int, initrd_fd, > unsigned long, cmdline_len, const char __user *, cmdline_ptr, > - unsigned long, flags) > + unsigned long, flags, const struct kexec_fdset __user *, ufdset) Can one add more parameters to existing syscall. Can it break existing programs with new kernel? I was of the impression that one can't do that. But may be I am missing something. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Fri, Jul 15, 2016 at 09:49:25AM +0100, Russell King - ARM Linux wrote: > On Wed, Jul 13, 2016 at 03:13:42PM +0200, Arnd Bergmann wrote: > > On Wednesday, July 13, 2016 10:41:28 AM CEST Mark Rutland wrote: > > > The big question is whether this is a realistic case on a secure boot > > > system. > > > > What does x86 do here? I assume changes to the command line are also > > limited. > > They aren't. You can specify /anything/ even with a fully-signed kernel > and initrd, which was one of the things I pointed out in my previous > set of responses. Yes, kernel command line is not signed. For that matter even initird is not signed. Just kernel is signed and its signatures are verified. Idea is an unsigned code should not be able to execute in kernel space. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote: > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote: > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote: > > > Indeed - maybe Eric knows better, but I can't see any situation where > > > the dtb we load via kexec should ever affect "the bootloader", unless > > > the "kernel" that's being loaded into kexec is "the bootloader". > > > > > > Now, going back to the more fundamental issue raised in my first reply, > > > about the kernel command line. > > > > > > On x86, I can see that it _is_ possible for userspace to specify a > > > command line, and the kernel loading the image provides the command > > > line to the to-be-kexeced kernel with very little checking. So, if > > > your kernel is signed, what stops the "insecure userspace" loading > > > a signed kernel but giving it an insecure rootfs and/or console? > > > > It is not kexec specific. I could do this for regular boot too, right? > > > > Command line options are not signed. I thought idea behind secureboot > > was to execute only trusted code and command line options don't enforce > > you to execute unsigned code. > > > > So it sounds like different class of security problems which you are > > referring to and not necessarily covered by secureboot or signed > > kernel. > > Let me give you an example. > > You have a secure boot setup, where the firmware/ROM validates the boot > loader. Good, the boot loader hasn't been tampered with. > > You interrupt the boot loader and are able to modify the command line > for the booted kernel. > > The boot loader loads the kernel and verifies the kernel's signature. > Good, the kernel hasn't been tampered with. The kernel starts running. > > You've plugged in a USB drive to the device, and specified a partition > containing a root filesystem that you control to the kernel. The > validated kernel finds the USB drive, and mounts it, and executes > your own binaries on the USB drive. You will require physical access to the machine to be able to insert your usb drive. And IIRC, argument was that if attacker has physical access to machine, all bets are off anyway. > > You run a shell on the console. You now have control of the system, > and can mount the real rootfs, inspect it, and work out what it does, > etc. > > At this point, what use was all the validation that the secure boot > has done? Absolutely useless. > > If you can change the command line arguments given to the kernel, you > have no security, no matter how much you verify signatures. It's > the illusion of security, nothing more, nothing less. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 13, 2016 at 09:45:22AM +1000, Stewart Smith wrote: > Vivek Goyal <vgo...@redhat.com> writes: > > On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote: > >> Hello Eric, > >> > >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman: > >> > AKASHI Takahiro <takahiro.aka...@linaro.org> writes: > >> > > Device tree blob must be passed to a second kernel on DTB-capable > >> > > archs, like powerpc and arm64, but the current kernel interface > >> > > lacks this support. > >> > > > >> > > This patch extends kexec_file_load system call by adding an extra > >> > > argument to this syscall so that an arbitrary number of file > >> > > descriptors > >> > > can be handed out from user space to the kernel. > >> > > > >> > > See the background [1]. > >> > > > >> > > Please note that the new interface looks quite similar to the current > >> > > system call, but that it won't always mean that it provides the "binary > >> > > compatibility." > >> > > > >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html > >> > > >> > So this design is wrong. The kernel already has the device tree blob, > >> > you should not be extracting it from the kernel munging it, and then > >> > reinserting it in the kernel if you want signatures and everything to > >> > pass. > >> > > >> > What x86 does is pass it's equivalent of the device tree blob from one > >> > kernel to another directly and behind the scenes. It does not go > >> > through userspace for this. > >> > > >> > Until a persuasive case can be made for going around the kernel and > >> > probably adding a feature (like code execution) that can be used to > >> > defeat the signature scheme I am going to nack this. > >> > >> There are situations where userspace needs to change things in the device > >> tree to be used by the next kernel. > >> > >> For example, Petitboot (the boot loader used in OpenPOWER machines) is a > >> userspace application running in an intermediary Linux instance and uses > >> kexec to load the target OS. It has to modify the device tree that will be > >> used by the next kernel so that the next kernel uses the same console that > >> petitboot was configured to use (i.e., set the /chosen/linux,stdout-path > >> property). It also modifies the device tree to allow the kernel to inherit > >> Petitboot's Openfirmware framebuffer. > > > > Can some of this be done with the help of kernel command line options for > > second kernel? > > how would this be any more secure? > > Passing in an address for a framebuffer via command line option means > you could scribble over any bit of memory, which is the same kind of > damage you could do by modifying the device tree. It is not necessarily safer but works with given framework and we don't have to modify existing system call. Also it will allow you to pass in only one thing at a time instead of allowing passing in new unsigned DTB, which can potentially do lot more. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 13, 2016 at 09:41:39AM +1000, Stewart Smith wrote: > Petr Tesarikwrites: > > On Tue, 12 Jul 2016 13:25:11 -0300 > > Thiago Jung Bauermann wrote: > > > >> Hi Eric, > >> > >> I'm trying to understand your concerns leading to your nack. I hope you > >> don't mind expanding your thoughts on them a bit. > >> > >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman: > >> > AKASHI Takahiro writes: > >> > > Device tree blob must be passed to a second kernel on DTB-capable > >> > > archs, like powerpc and arm64, but the current kernel interface > >> > > lacks this support. > >> > > > >> > > This patch extends kexec_file_load system call by adding an extra > >> > > argument to this syscall so that an arbitrary number of file > >> > > descriptors > >> > > can be handed out from user space to the kernel. > >> > > > >> > > See the background [1]. > >> > > > >> > > Please note that the new interface looks quite similar to the current > >> > > system call, but that it won't always mean that it provides the "binary > >> > > compatibility." > >> > > > >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html > >> > > >> > So this design is wrong. The kernel already has the device tree blob, > >> > you should not be extracting it from the kernel munging it, and then > >> > reinserting it in the kernel if you want signatures and everything to > >> > pass. > >> > >> I don't understand how the kernel signature will be invalidated. > >> > >> There are some types of boot images that can embed a device tree blob in > >> them, but the kernel can also be handed a separate device tree blob from > >> firmware, the boot loader, or kexec. This latter case is what we are > >> discussing, so we are not talking about modifying an embedded blob in the > >> kernel image. > >> > >> > What x86 does is pass it's equivalent of the device tree blob from one > >> > kernel to another directly and behind the scenes. It does not go > >> > through userspace for this. > >> > > >> > Until a persuasive case can be made for going around the kernel and > >> > probably adding a feature (like code execution) that can be used to > >> > defeat the signature scheme I am going to nack this. > >> > >> I also don't understand what you mean by code execution. How does passing > >> a > >> device tree blob via kexec enables code execution? How can the signature > >> scheme be defeated? > > > > I'm not an expert on DTB, so I can't provide an example of code > > execution, but you have already mentioned the /chosen/linux,stdout-path > > property. If an attacker redirects the bootloader to an insecure > > console, they may get access to the system that would otherwise be > > impossible. > > In this case, the user is sitting at the (or one of the) console(s) of > the machine. There could be petitboot UIs running on the VGA display, > IPMI serial over lan, local serial port. The logic behind setting > /chosen/linux,stdout-path is (currently) mostly to set it for the kernel > to what the user is interacting with. i.e. if you select an OS installer > to boot from the VGA console, you get a graphical installer running and > if you selected it from a text console, you get a text installer running > (on the appropriate console). > > So the bootloader (petitboot) needs to work out which console is being > interacted with in order to set up /chosen/linux,stdout-path correctly. > > This specific option could be passed as a kernel command line to the > next kernel, yes. However, isn't the kernel command line also an attack > vector? Is *every* command line option safe? I don't think kernel command line is signed. And we will have to define what is considered *unsafe*. I am working on the assumption that a user should not be able to force execution of unsigned code at provileged level. And passing console on kernel command line should be safe in that respect? Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote: > On Wed, Jul 13, 2016 at 05:55:33PM +1000, Stewart Smith wrote: > > Russell King - ARM Linuxwrites: > > > On Wed, Jul 13, 2016 at 02:59:51PM +1000, Stewart Smith wrote: > > >> Russell King - ARM Linux writes: > > >> > On Tue, Jul 12, 2016 at 10:58:05PM +0200, Petr Tesarik wrote: > > >> >> I'm not an expert on DTB, so I can't provide an example of code > > >> >> execution, but you have already mentioned the > > >> >> /chosen/linux,stdout-path > > >> >> property. If an attacker redirects the bootloader to an insecure > > >> >> console, they may get access to the system that would otherwise be > > >> >> impossible. > > >> > > > >> > I fail to see how kexec connects with the boot loader - the DTB image > > >> > that's being talked about is one which is passed from the currently > > >> > running kernel to the to-be-kexec'd kernel. For ARM (and I suspect > > >> > also ARM64) that's a direct call chain which doesn't involve any > > >> > boot loader or firmware, and certainly none that would involve the > > >> > passed DTB image. > > >> > > >> For OpenPOWER machines, kexec is the bootloader. Our bootloader is a > > >> linux kernel and initramfs with a UI (petitboot) - this means we never > > >> have to write a device driver twice: write a kernel one and you're done > > >> (for booting from the device and using it in your OS). > > > > > > I think you misunderstood my point. > > > > > > On ARM, we do not go: > > > > > > kernel (kexec'd from) -> boot loader -> kernel (kexec'd to) > > > > > > but we go: > > > > > > kernel (kexec'd from) -> kernel (kexec'd to) > > > > > > There's no intermediate step involving any bootloader. > > > > > > Hence, my point is that the dtb loaded by kexec is _only_ used by the > > > kernel which is being kexec'd to, not by the bootloader, nor indeed > > > the kernel which it is loaded into. > > > > > > Moreover, if you read the bit that I quoted (which is what I was > > > replying to), you'll notice that it is talking about the DTB loaded > > > by kexec somehow causing the _bootloader_ to be redirected to an > > > alternative console. This point is wholely false on ARM. > > > > Ahh.. I missed the bootloader bit there. > > > > In which case, we're the same on OpenPOWER, there is no intermediate > > bootloader - in our case we have linux (with kexec) taking on what uboot > > or grub is typically used for on other platforms. > > Indeed - maybe Eric knows better, but I can't see any situation where > the dtb we load via kexec should ever affect "the bootloader", unless > the "kernel" that's being loaded into kexec is "the bootloader". > > Now, going back to the more fundamental issue raised in my first reply, > about the kernel command line. > > On x86, I can see that it _is_ possible for userspace to specify a > command line, and the kernel loading the image provides the command > line to the to-be-kexeced kernel with very little checking. So, if > your kernel is signed, what stops the "insecure userspace" loading > a signed kernel but giving it an insecure rootfs and/or console? It is not kexec specific. I could do this for regular boot too, right? Command line options are not signed. I thought idea behind secureboot was to execute only trusted code and command line options don't enforce you to execute unsigned code. So it sounds like different class of security problems which you are referring to and not necessarily covered by secureboot or signed kernel. Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Tue, Jul 12, 2016 at 04:02:46PM +0200, Arnd Bergmann wrote: > On Tuesday, July 12, 2016 8:25:48 AM CEST Eric W. Biederman wrote: > > AKASHI Takahirowrites: > > > > > Device tree blob must be passed to a second kernel on DTB-capable > > > archs, like powerpc and arm64, but the current kernel interface > > > lacks this support. > > > > > > This patch extends kexec_file_load system call by adding an extra > > > argument to this syscall so that an arbitrary number of file descriptors > > > can be handed out from user space to the kernel. > > > > > > See the background [1]. > > > > > > Please note that the new interface looks quite similar to the current > > > system call, but that it won't always mean that it provides the "binary > > > compatibility." > > > > > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html > > > > So this design is wrong. The kernel already has the device tree blob, > > you should not be extracting it from the kernel munging it, and then > > reinserting it in the kernel if you want signatures and everything to > > pass. > > > > What x86 does is pass it's equivalent of the device tree blob from one > > kernel to another directly and behind the scenes. It does not go > > through userspace for this. > > > > Until a persuasive case can be made for going around the kernel and > > probably adding a feature (like code execution) that can be used to > > defeat the signature scheme I am going to nack this. > > > > Nacked-by: "Eric W. Biederman" > > > > I am happy to see support for other architectures, but for the sake of > > not moving some code in the kernel let's not build an attackable > > infrastructure. > > > > For historic context, the flattened devicetree format that we now use > to pass data about the system from boot loader to kernel was initially > introduced specifically for the purpose of enabling kexec: > > On Open Firmware, the DT is extracted from running firmware and copied > into dynamically allocated data structures. After a kexec, the runtime > interface to the firmware is not available, so the flattened DT format > was created as a way to pass the same data in a binary blob to the new > kernel in a format that can be read from the kernel by walking the > directories in /proc/device-tree/*. So this DT is available inside kernel and running kernel can still retrieve it and pass it to second kernel? > > There are a couple of reasons for modifying the devicetree: > > - For kboot/petitboot, you can have a kernel that is not booted through > DT at all but hardwired to a particular machine, and that passes > a DT for the entire hardware to the kernel that you actually want to > run. > > - for kdump, you need to tell the new kernel about the modified location > of the memory, so the dump kernel doesn't overwrite the contents > it wants to dump In x86 we do this with the help of kernel command line options. > > - we typically ship devicetree sources for embedded machines with the > kernel sources. As more hardware of the system gets enabled, the > devicetree gains extra nodes and properties that describe the hardware > more completely, so we need to use the latest DT blob to use all > the drivers > > - in some cases, kernels will fail to boot at all with an older version > of the DT, or fail to use the devices that were working on the > earlier kernel. This is usually considered a bug, but it's not rare > > - In some cases, the kernel can update its DT at runtime, and the new > settings are expected to be available in the new kernel too, though > there are cases where you actually don't want the modified contents. I am assuming that modified DT and unmodifed one both are accessible to kernel. And if user space can make decisions which modfied fields to use for new kernels and which ones not, then same can be done in kernel too? Vivek > > Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 0/3] extend kexec_file_load system call
On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote: > Hello Eric, > > Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman: > > AKASHI Takahirowrites: > > > Device tree blob must be passed to a second kernel on DTB-capable > > > archs, like powerpc and arm64, but the current kernel interface > > > lacks this support. > > > > > > This patch extends kexec_file_load system call by adding an extra > > > argument to this syscall so that an arbitrary number of file descriptors > > > can be handed out from user space to the kernel. > > > > > > See the background [1]. > > > > > > Please note that the new interface looks quite similar to the current > > > system call, but that it won't always mean that it provides the "binary > > > compatibility." > > > > > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html > > > > So this design is wrong. The kernel already has the device tree blob, > > you should not be extracting it from the kernel munging it, and then > > reinserting it in the kernel if you want signatures and everything to > > pass. > > > > What x86 does is pass it's equivalent of the device tree blob from one > > kernel to another directly and behind the scenes. It does not go > > through userspace for this. > > > > Until a persuasive case can be made for going around the kernel and > > probably adding a feature (like code execution) that can be used to > > defeat the signature scheme I am going to nack this. > > There are situations where userspace needs to change things in the device > tree to be used by the next kernel. > > For example, Petitboot (the boot loader used in OpenPOWER machines) is a > userspace application running in an intermediary Linux instance and uses > kexec to load the target OS. It has to modify the device tree that will be > used by the next kernel so that the next kernel uses the same console that > petitboot was configured to use (i.e., set the /chosen/linux,stdout-path > property). It also modifies the device tree to allow the kernel to inherit > Petitboot's Openfirmware framebuffer. Can some of this be done with the help of kernel command line options for second kernel? Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote: On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote: dwal...@fifo99.com writes: On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote: Hidehiro Kawai hidehiro.kawai...@hitachi.com writes: You can call panic notifiers and kmsg dumpers before kdump by specifying crash_kexec_post_notifiers as a boot parameter. However, it doesn't make sense if kdump is not available. In that case, disable crash_kexec_post_notifiers boot parameter so that you can't change the value of the parameter. Nacked-by: Eric W. Biederman ebied...@xmission.com I think it would make sense if he just replaced kdump with kexec. It would be less insane, however it still makes no sense as without kexec on panic support crash_kexec is a noop. So the value of the seeting makes no difference. Can you explain more, I don't really understand what you mean. Are you suggesting the whole crash_kexec_post_notifiers feature has no value ? Daniel, BTW, why are you using crash_kexec_post_notifiers commandline? Why not without it? Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote: On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote: dwal...@fifo99.com writes: On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote: Hidehiro Kawai hidehiro.kawai...@hitachi.com writes: You can call panic notifiers and kmsg dumpers before kdump by specifying crash_kexec_post_notifiers as a boot parameter. However, it doesn't make sense if kdump is not available. In that case, disable crash_kexec_post_notifiers boot parameter so that you can't change the value of the parameter. Nacked-by: Eric W. Biederman ebied...@xmission.com I think it would make sense if he just replaced kdump with kexec. It would be less insane, however it still makes no sense as without kexec on panic support crash_kexec is a noop. So the value of the seeting makes no difference. Can you explain more, I don't really understand what you mean. Are you suggesting the whole crash_kexec_post_notifiers feature has no value ? If CONFIG_KEXEC=n, then crash_kexec() is a nop. So it does not matter whether crash_kexec() is called before panic notifiers or after. IOW, what do you gain by disabling crash_kexec_post_notifiers, in case of CONFIG_KEXEC=n? Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote: On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote: On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote: On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote: dwal...@fifo99.com writes: On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote: Hidehiro Kawai hidehiro.kawai...@hitachi.com writes: You can call panic notifiers and kmsg dumpers before kdump by specifying crash_kexec_post_notifiers as a boot parameter. However, it doesn't make sense if kdump is not available. In that case, disable crash_kexec_post_notifiers boot parameter so that you can't change the value of the parameter. Nacked-by: Eric W. Biederman ebied...@xmission.com I think it would make sense if he just replaced kdump with kexec. It would be less insane, however it still makes no sense as without kexec on panic support crash_kexec is a noop. So the value of the seeting makes no difference. Can you explain more, I don't really understand what you mean. Are you suggesting the whole crash_kexec_post_notifiers feature has no value ? Daniel, BTW, why are you using crash_kexec_post_notifiers commandline? Why not without it? It was explained in the prior thread but to rehash, the notifiers are used to do a switch over from the crashed machine to another redundant machine. So why not detect failure using polling or issue notifications from second kernel. IOW, expecting that a crashed machine will be able to deliver notification reliably is falwed to begin with, IMHO. If a machine is failing, there are high chance it can't deliver you the notification. Detecting that failure suing some kind of polling mechanism might be more reliable. And it will make even kdump mechanism more reliable so that it does not have to run panic notifiers after the crash. Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 03:48:33PM +, dwal...@fifo99.com wrote: On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote: On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote: On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote: On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote: On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote: dwal...@fifo99.com writes: On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote: Hidehiro Kawai hidehiro.kawai...@hitachi.com writes: You can call panic notifiers and kmsg dumpers before kdump by specifying crash_kexec_post_notifiers as a boot parameter. However, it doesn't make sense if kdump is not available. In that case, disable crash_kexec_post_notifiers boot parameter so that you can't change the value of the parameter. Nacked-by: Eric W. Biederman ebied...@xmission.com I think it would make sense if he just replaced kdump with kexec. It would be less insane, however it still makes no sense as without kexec on panic support crash_kexec is a noop. So the value of the seeting makes no difference. Can you explain more, I don't really understand what you mean. Are you suggesting the whole crash_kexec_post_notifiers feature has no value ? Daniel, BTW, why are you using crash_kexec_post_notifiers commandline? Why not without it? It was explained in the prior thread but to rehash, the notifiers are used to do a switch over from the crashed machine to another redundant machine. So why not detect failure using polling or issue notifications from second kernel. IOW, expecting that a crashed machine will be able to deliver notification reliably is falwed to begin with, IMHO. It's flawed to think you can kexec, but you still do it right ? I've not gotten into the deep details of this switching process, but that's how this interface is used. Sure. But the deal here is that users of interface know that sometimes it can be unreliable. And in the absence of more reliable mechanism, somewhat less reliable mechanism is fine. If a machine is failing, there are high chance it can't deliver you the notification. Detecting that failure suing some kind of polling mechanism might be more reliable. And it will make even kdump mechanism more reliable so that it does not have to run panic notifiers after the crash. I think what your suggesting is that my company should change how it's hardware works and that's not really an option for me. This isn't a simple thing like checking over the network if the machine is down or not, this is way more complex hardware design. That means you are ready to live with an unreliable design. There might be cases where notifier does not get run properly and you will not do switch despite the fact that OS has failed. I was just trying to nudge you in a direction which could be more reliable mechanism. Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/3] kexec: Change the timing of callbacks related to crash_kexec_post_notifiers boot option
On Fri, Jul 10, 2015 at 08:33:31PM +0900, Hidehiro Kawai wrote: This patch fixes problems reported by Daniel Walker (https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix commits 5375b70 and f45d85f. If crash_kexec_post_notifiers boot option is specified, other cpus are stopped by smp_send_stop() before entering crash_kexec(), while usually machine_crash_shutdown() called by crash_kexec() does that. This behavior change leads two problems. Problem 1: Some function in the crash_kexec() path depend on other cpus being still online. If other cpus have been offlined already, they doesn't work properly. Example: panic() crash_kexec() machine_crash_shutdown() octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus machine_kexec() Problem 2: Most of architectures stop other cpus in the machine_crash_shutdown() path and save register information at the same time. However, if smp_send_stop() is called before that, we can't save the register information. To solve these problems, this patch changes the timing of calling the callbacks instead of changing the timing of crash_kexec() if crash_kexec_post_notifiers boot option is specified. Before: if (!crash_kexec_post_notifiers) crash_kexec() smp_send_stop() atomic_notifier_call_chain() kmsg_dump() if (crash_kexec_post_notifiers) crash_kexec() After: crash_kexec() machine_crash_shutdown() if (crash_kexec_post_notifiers) { atomic_notifier_call_chain() kmsg_dump() } machine_kexec() smp_send_stop() if (!crash_kexec_post_notifiers) { atomic_notifier_call_chain() kmsg_dump() } I think this new code flow looks bad. Now we are calling kmsg_dump() and atomic_notifier_call_chain() from inside the crash_kexec() as well as from inside panic(). This is bad. So basic problem seems to be that cpus need to be stopped once (with or without panic notifiers. So why don't we look into desiginig a function which stops cpus, saves register states first and then does rest of the processing. Something like. stop_cpus_save_register_state; if (!crash_kexec_post_notifiers) crash_kexec() atomic_notifier_call_chain() kmsg_dump() Here crash_kexec() will have to be modified and it will assume that cpus have already been stopped and register states have already been saved. IOW, is there a reason that we can't get rid of smp_send_stop() and use the mechanism crash_kexec() is using to stop cpus after panic()? Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote: Vivek Goyal vgo...@redhat.com writes: On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote: [..] If a machine is failing, there are high chance it can't deliver you the notification. Detecting that failure suing some kind of polling mechanism might be more reliable. And it will make even kdump mechanism more reliable so that it does not have to run panic notifiers after the crash. I think what your suggesting is that my company should change how it's hardware works and that's not really an option for me. This isn't a simple thing like checking over the network if the machine is down or not, this is way more complex hardware design. That means you are ready to live with an unreliable design. There might be cases where notifier does not get run properly and you will not do switch despite the fact that OS has failed. I was just trying to nudge you in a direction which could be more reliable mechanism. Sigh I see some deep confusion going on here. The panic notifiers are just that panic notifiers. They have not been nor should they be tied to kexec. If those notifiers force a switch over of between machines I fail to see why you would care if it was kexec or another panic situation that is forcing that switchover. Hidehiro isn't fixing the failover situation on my side, he's fixing register information collection when crash_kexec_post_notifiers is used. Sure. Given that we have created this new parameter, let us fix it so that we can capture the other cpu register state in crash dump. I am little disappointed that it was not tested well when this parameter was introuced. We should have atleast tested it to the extent to see if there is proper cpu state present for all cpus in the crash dump. At that point of time it looked like a simple modification to allow panic notifiers before crash_kexec(). Either that or we say no one cares enough, and it known broken so let's just revert the fool thing. Masami, you introduced this option. Are you fine with the revert? Is it really being used and tested? I honestly can't see how to support panic notifiers, before kexec. There is no way to tell what is being done and all of the pieces including smp_send_stop are known to be buggy. we should be able to replace smp_send_stop() with what crash_kexec() is doing to stop the machine? If yes, then it should be fine I guess. This parameter description clearly says that specify it at your own risk. So we are not issuing a big support statement for successful kdump after panic notifiers. If it is something fixable, otherwise user needs to deal with it. Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote: [..] If a machine is failing, there are high chance it can't deliver you the notification. Detecting that failure suing some kind of polling mechanism might be more reliable. And it will make even kdump mechanism more reliable so that it does not have to run panic notifiers after the crash. I think what your suggesting is that my company should change how it's hardware works and that's not really an option for me. This isn't a simple thing like checking over the network if the machine is down or not, this is way more complex hardware design. That means you are ready to live with an unreliable design. There might be cases where notifier does not get run properly and you will not do switch despite the fact that OS has failed. I was just trying to nudge you in a direction which could be more reliable mechanism. Sigh I see some deep confusion going on here. The panic notifiers are just that panic notifiers. They have not been nor should they be tied to kexec. If those notifiers force a switch over of between machines I fail to see why you would care if it was kexec or another panic situation that is forcing that switchover. Hidehiro isn't fixing the failover situation on my side, he's fixing register information collection when crash_kexec_post_notifiers is used. Sure. Given that we have created this new parameter, let us fix it so that we can capture the other cpu register state in crash dump. I am little disappointed that it was not tested well when this parameter was introuced. We should have atleast tested it to the extent to see if there is proper cpu state present for all cpus in the crash dump. At that point of time it looked like a simple modification to allow panic notifiers before crash_kexec(). Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V25/5] powerpc/kexec: Use global IND_FLAGS macro
On Tue, Oct 07, 2014 at 12:21:30AM +, Geoff Levand wrote: linux/kexec.h now defines an IND_FLAGS macro. Remove the local powerpc definition and use the generic one. Signed-off-by: Geoff Levand ge...@infradead.org I think this patch should be merged in previous patch. I guess after applying patch4, series might not be compilable as there are two definitions of IND_FLAGS now. Thanks Vivek --- arch/powerpc/kernel/machine_kexec_64.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 879b3aa..75652a32 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -96,8 +96,6 @@ int default_machine_kexec_prepare(struct kimage *image) return 0; } -#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE) - static void copy_segments(unsigned long ind) { unsigned long entry; -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5] kexec: minor fixups and enhancements
On Tue, Aug 26, 2014 at 05:33:28PM -0700, Geoff Levand wrote: Hi Vivek, On Mon, 2014-08-25 at 12:59 -0400, Vivek Goyal wrote: Does arm64 has secureboot? If yes, then it might make sense to enable the new syscall kexec_file_load() on arm64 instead of trying to make old syscall work first. Yes, arm64 should support secureboot, but I have not looked into it. When do you expect syscall kexec_file_load to get merged? I won't wait until then to push my current kexec work. It got merged in 3.17-rc1 Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5] kexec: minor fixups and enhancements
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote: Hi, Here are a few minor fixups and enhancements for kexec support. Patch 3 and 4 that add preprocessor macros for the kimage list flags are ones that I use in the arm64 kexec support I am working on, so it would be nice for those to go in. Please consider. Hi Geoff, Does arm64 has secureboot? If yes, then it might make sense to enable the new syscall kexec_file_load() on arm64 instead of trying to make old syscall work first. Thanks Vivek -Geoff The following changes since commit 7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9: Linux 3.17-rc1 (2014-08-16 10:40:26 -0600) are available in the git repository at: git://git.linaro.org/people/geoff.levand/linux-kexec.git for-kexec for you to fetch changes up to dc3c63e69815e00dce5454bd50e2991e9ea33f64: powerpc/kexec: Use global IND_FLAGS macro (2014-08-22 11:15:18 -0700) Geoff Levand (5): kexec: Fix make headers_check kexec: Simplify conditional kexec: Add bit definitions for kimage entry flags kexec: Add IND_FLAGS macro powerpc/kexec: Use global IND_FLAGS macro arch/powerpc/kernel/machine_kexec_64.c | 2 -- include/linux/kexec.h | 20 include/uapi/linux/kexec.h | 6 -- kernel/kexec.c | 17 ++--- 4 files changed, 26 insertions(+), 19 deletions(-) -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/5] kexec: Fix make headers_check
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote: Remove the unneded declaration for a kexec_load() routine. Fixes errors like these when running 'make headers_check': include/uapi/linux/kexec.h: userspace cannot reference function or variable defined in the kernel Signed-off-by: Geoff Levand ge...@infradead.org I think Paul Bolle tried to remove this in the past and maximilian had objections. http://lists.infradead.org/pipermail/kexec/2014-January/010902.html I can't see that how exporting kernel prototype helps here. kexec-tools seems to be using syscall(__NR_kexec_load) directly for non-xen case. So I would be fine with removing this definition. Just trying to make sure that it does not break any other library or users of this declaration. Thanks Vivek --- include/uapi/linux/kexec.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 6925f5b..99048e5 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -55,12 +55,6 @@ struct kexec_segment { size_t memsz; }; -/* Load a new kernel image as described by the kexec_segment array - * consisting of passed number of segments at the entry-point address. - * The flags allow different useage types. - */ -extern int kexec_load(void *, size_t, struct kexec_segment *, - unsigned long int); #endif /* __KERNEL__ */ #endif /* _UAPILINUX_KEXEC_H */ -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/5] kexec: Simplify conditional
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote: Simplify the code around one of the conditionals in the kexec_load syscall routine. The original code was confusing with a redundant check on KEXEC_ON_CRASH and comments outside of the conditional block. This change switches the order of the conditional check, and cleans up the comments for the conditional. There is no functional change to the code. Signed-off-by: Geoff Levand ge...@infradead.org This is simple reorganization. Acked-by: Vivek Goyal vgo...@redhat.com Vivek --- kernel/kexec.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 0b49a0a..d04b56e 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1282,19 +1282,22 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, if (nr_segments 0) { unsigned long i; - /* Loading another kernel to reboot into */ - if ((flags KEXEC_ON_CRASH) == 0) - result = kimage_alloc_init(image, entry, nr_segments, -segments, flags); - /* Loading another kernel to switch to if this one crashes */ - else if (flags KEXEC_ON_CRASH) { - /* Free any current crash dump kernel before + if (flags KEXEC_ON_CRASH) { + /* + * Loading another kernel to switch to if this one + * crashes. Free any current crash dump kernel before * we corrupt it. */ + kimage_free(xchg(kexec_crash_image, NULL)); result = kimage_alloc_init(image, entry, nr_segments, segments, flags); crash_map_reserved_pages(); + } else { + /* Loading another kernel to reboot into. */ + + result = kimage_alloc_init(image, entry, nr_segments, +segments, flags); } if (result) goto out; -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/5] kexec: Add bit definitions for kimage entry flags
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote: Define new kexec preprocessor macros IND_*_BIT that define the bit position of the kimage entry flags. Change the existing IND_* flag macros to be defined as bit shifts of the corresponding IND_*_BIT macros. Also wrap all C language code in kexec.h with #if !defined(__ASSEMBLY__) so assembly files can include kexec.h to get the IND_* and IND_*_BIT macros. Some CPU instruction sets have tests for bit position which are convenient in implementing routines that operate on the kimage entry list. The addition of these bit position macros in a common location will avoid duplicate definitions and the chance that changes to the IND_* flags will not be propagated to assembly files. Signed-off-by: Geoff Levand ge...@infradead.org Looks good to me. Acked-by: Vivek Goyal vgo...@redhat.com Vivek --- include/linux/kexec.h | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 4b2a0e1..8c628ca 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -1,6 +1,18 @@ #ifndef LINUX_KEXEC_H #define LINUX_KEXEC_H +#define IND_DESTINATION_BIT 0 +#define IND_INDIRECTION_BIT 1 +#define IND_DONE_BIT2 +#define IND_SOURCE_BIT 3 + +#define IND_DESTINATION (1 IND_DESTINATION_BIT) +#define IND_INDIRECTION (1 IND_INDIRECTION_BIT) +#define IND_DONE (1 IND_DONE_BIT) +#define IND_SOURCE (1 IND_SOURCE_BIT) + +#if !defined(__ASSEMBLY__) + #include uapi/linux/kexec.h #ifdef CONFIG_KEXEC @@ -64,10 +76,6 @@ */ typedef unsigned long kimage_entry_t; -#define IND_DESTINATION 0x1 -#define IND_INDIRECTION 0x2 -#define IND_DONE 0x4 -#define IND_SOURCE 0x8 struct kexec_segment { /* @@ -312,4 +320,7 @@ struct task_struct; static inline void crash_kexec(struct pt_regs *regs) { } static inline int kexec_should_crash(struct task_struct *p) { return 0; } #endif /* CONFIG_KEXEC */ + +#endif /* !defined(__ASSEBMLY__) */ + #endif /* LINUX_KEXEC_H */ -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 4/5] kexec: Add IND_FLAGS macro
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote: Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of all the possible kexec IND_ kimage_entry indirection flags. Having this macro allows for simplified code in the prosessing of the kexec kimage_entry items. Signed-off-by: Geoff Levand ge...@infradead.org Acked-by: Vivek Goyal vgo...@redhat.com Vivek --- include/linux/kexec.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 8c628ca..a4758f9 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -10,6 +10,7 @@ #define IND_INDIRECTION (1 IND_INDIRECTION_BIT) #define IND_DONE (1 IND_DONE_BIT) #define IND_SOURCE (1 IND_SOURCE_BIT) +#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE) #if !defined(__ASSEMBLY__) -- 1.9.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode
On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote: On 06/04/2014 07:16 PM, Vivek Goyal wrote: On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote: On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote: Yep, that makes sense. But unfortunately I don't have enough insight into why exactly powerpc has to online the CPUs before doing a kexec. I just know from the commit log and the comment mentioned above (and from my own experiments) that the CPUs will get stuck if they were offline. Perhaps somebody more knowledgeable can explain this in detail and suggest a proper long-term solution. Matt, Ben, any thoughts on this? The problem is with our soft offline which we do on some platforms. When we offline we don't actually send the CPUs back to firmware or anything like that. We put them into a very low low power loop inside Linux. The new kernel has no way to extract them from that loop. So we must re-online them before we kexec so they can be passed to the new kernel normally (or returned to firmware like we do on powernv). Srivatsa, Looks like your patch has been merged. I don't like the following change in arch independent code. /* * migrate_to_reboot_cpu() disables CPU hotplug assuming that * no further code needs to use CPU hotplug (which is true in * the reboot case). However, the kexec path depends on using * CPU hotplug again; so re-enable it here. */ cpu_hotplug_enable(); As it is very powerpc specific requirement, can you enable hotplug in powerpc arch dependent code as a short term solution. I didn't do that because that would mean that the _disable() would be performed inside kernel/kexec.c and the corresponding _enable() would be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent connection between them, which would have made them hard to relate. Which we are doing anyway. The difference is that now we are doing it for all arches. If this is powerpc specific requirement, then we should limit this to powerpc only and not let spill over in generic code. And putting a big fat comment should take care of being able to figure out why arch code is overwriting the generic code's decision. By putting it in generic code and enforcing this on all arches does not buy us anything, IMHO. Ideally one needs to fix the requirement of online all cpus in powerpc as a long term solution and then get rid of hotplug enable call. Yes, I agree. I'm trying out a solution at the moment (see the 4 preliminary patches I sent in my reply to Ben). If that works, we won't need the enable call on powerpc. Thanks. This will help. Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode
On Wed, Jun 04, 2014 at 01:58:40AM +0530, Srivatsa S. Bhat wrote: On 05/28/2014 07:01 PM, Vivek Goyal wrote: On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote: If we try to perform a kexec when the machine is in ST (Single-Threaded) mode (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we get the following messages during boot: [0.089866] POWER8 performance monitor hardware support registered [0.089985] power8-pmu: PMAO restore workaround active. [5.095419] Processor 1 is stuck. [ 10.097933] Processor 2 is stuck. [ 15.100480] Processor 3 is stuck. [ 20.102982] Processor 4 is stuck. [ 25.105489] Processor 5 is stuck. [ 30.108005] Processor 6 is stuck. [ 35.110518] Processor 7 is stuck. [ 40.113369] Processor 9 is stuck. [ 45.115879] Processor 10 is stuck. [ 50.118389] Processor 11 is stuck. [ 55.120904] Processor 12 is stuck. [ 60.123425] Processor 13 is stuck. [ 65.125970] Processor 14 is stuck. [ 70.128495] Processor 15 is stuck. [ 75.131316] Processor 17 is stuck. Note that only the sibling threads are stuck, while the primary threads (0, 8, 16 etc) boot just fine. Looking closer at the previous step of kexec, we observe that kexec tries to wakeup (bring online) the sibling threads of all the cores, before performing kexec: [ 9464.131231] Starting new kernel [ 9464.148507] kexec: Waking offline cpu 1. [ 9464.148552] kexec: Waking offline cpu 2. [ 9464.148600] kexec: Waking offline cpu 3. [ 9464.148636] kexec: Waking offline cpu 4. [ 9464.148671] kexec: Waking offline cpu 5. [ 9464.148708] kexec: Waking offline cpu 6. [ 9464.148743] kexec: Waking offline cpu 7. [ 9464.148779] kexec: Waking offline cpu 9. [ 9464.148815] kexec: Waking offline cpu 10. [ 9464.148851] kexec: Waking offline cpu 11. [ 9464.148887] kexec: Waking offline cpu 12. [ 9464.148922] kexec: Waking offline cpu 13. [ 9464.148958] kexec: Waking offline cpu 14. [ 9464.148994] kexec: Waking offline cpu 15. [ 9464.149030] kexec: Waking offline cpu 17. Instrumenting this piece of code revealed that the cpu_up() operation actually fails with -EBUSY. Thus, only the primary threads of all the cores are online during kexec, and hence this is a sure-shot receipe for disaster, as explained in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec), as well as in the comment above wake_offline_cpus(). It turns out that cpu_up() was returning -EBUSY because the variable 'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done by migrate_to_reboot_cpu() inside kernel_kexec(). Now, migrate_to_reboot_cpu() was originally written with the assumption that any further code will not need to perform CPU hotplug, since we are anyway in the reboot path. However, kexec is clearly not such a case, since we depend on onlining CPUs, atleast on powerpc. So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the kexec path, to fix this regression in kexec on powerpc. Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we can catch such issues more easily in the future. Fixes: c97102ba963 (kexec: migrate to reboot cpu) Cc: sta...@vger.kernel.org Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com --- arch/powerpc/kernel/machine_kexec_64.c |2 +- kernel/kexec.c |8 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 59d229a..879b3aa 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -237,7 +237,7 @@ static void wake_offline_cpus(void) if (!cpu_online(cpu)) { printk(KERN_INFO kexec: Waking offline cpu %d.\n, cpu); - cpu_up(cpu); + WARN_ON(cpu_up(cpu)); } } } diff --git a/kernel/kexec.c b/kernel/kexec.c index c8380ad..28c5706 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1683,6 +1683,14 @@ int kernel_kexec(void) kexec_in_progress = true; kernel_restart_prepare(NULL); migrate_to_reboot_cpu(); + + /* + * migrate_to_reboot_cpu() disables CPU hotplug assuming that + * no further code needs to use CPU hotplug (which is true in + * the reboot case). However, the kexec path depends on using + * CPU hotplug again; so re-enable it here. + */ + cpu_hotplug_enable(); printk(KERN_EMERG Starting new kernel\n); machine_shutdown(); After migrate_to_reboot_cpu(), we are calling machine_shutdown() which calls disable_nonboot_cpus() and which
Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode
On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote: On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote: Yep, that makes sense. But unfortunately I don't have enough insight into why exactly powerpc has to online the CPUs before doing a kexec. I just know from the commit log and the comment mentioned above (and from my own experiments) that the CPUs will get stuck if they were offline. Perhaps somebody more knowledgeable can explain this in detail and suggest a proper long-term solution. Matt, Ben, any thoughts on this? The problem is with our soft offline which we do on some platforms. When we offline we don't actually send the CPUs back to firmware or anything like that. We put them into a very low low power loop inside Linux. The new kernel has no way to extract them from that loop. So we must re-online them before we kexec so they can be passed to the new kernel normally (or returned to firmware like we do on powernv). Srivatsa, Looks like your patch has been merged. I don't like the following change in arch independent code. /* * migrate_to_reboot_cpu() disables CPU hotplug assuming that * no further code needs to use CPU hotplug (which is true in * the reboot case). However, the kexec path depends on using * CPU hotplug again; so re-enable it here. */ cpu_hotplug_enable(); As it is very powerpc specific requirement, can you enable hotplug in powerpc arch dependent code as a short term solution. Ideally one needs to fix the requirement of online all cpus in powerpc as a long term solution and then get rid of hotplug enable call. Thanks Vivek ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode
On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote: If we try to perform a kexec when the machine is in ST (Single-Threaded) mode (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we get the following messages during boot: [0.089866] POWER8 performance monitor hardware support registered [0.089985] power8-pmu: PMAO restore workaround active. [5.095419] Processor 1 is stuck. [ 10.097933] Processor 2 is stuck. [ 15.100480] Processor 3 is stuck. [ 20.102982] Processor 4 is stuck. [ 25.105489] Processor 5 is stuck. [ 30.108005] Processor 6 is stuck. [ 35.110518] Processor 7 is stuck. [ 40.113369] Processor 9 is stuck. [ 45.115879] Processor 10 is stuck. [ 50.118389] Processor 11 is stuck. [ 55.120904] Processor 12 is stuck. [ 60.123425] Processor 13 is stuck. [ 65.125970] Processor 14 is stuck. [ 70.128495] Processor 15 is stuck. [ 75.131316] Processor 17 is stuck. Note that only the sibling threads are stuck, while the primary threads (0, 8, 16 etc) boot just fine. Looking closer at the previous step of kexec, we observe that kexec tries to wakeup (bring online) the sibling threads of all the cores, before performing kexec: [ 9464.131231] Starting new kernel [ 9464.148507] kexec: Waking offline cpu 1. [ 9464.148552] kexec: Waking offline cpu 2. [ 9464.148600] kexec: Waking offline cpu 3. [ 9464.148636] kexec: Waking offline cpu 4. [ 9464.148671] kexec: Waking offline cpu 5. [ 9464.148708] kexec: Waking offline cpu 6. [ 9464.148743] kexec: Waking offline cpu 7. [ 9464.148779] kexec: Waking offline cpu 9. [ 9464.148815] kexec: Waking offline cpu 10. [ 9464.148851] kexec: Waking offline cpu 11. [ 9464.148887] kexec: Waking offline cpu 12. [ 9464.148922] kexec: Waking offline cpu 13. [ 9464.148958] kexec: Waking offline cpu 14. [ 9464.148994] kexec: Waking offline cpu 15. [ 9464.149030] kexec: Waking offline cpu 17. Instrumenting this piece of code revealed that the cpu_up() operation actually fails with -EBUSY. Thus, only the primary threads of all the cores are online during kexec, and hence this is a sure-shot receipe for disaster, as explained in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec), as well as in the comment above wake_offline_cpus(). It turns out that cpu_up() was returning -EBUSY because the variable 'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done by migrate_to_reboot_cpu() inside kernel_kexec(). Now, migrate_to_reboot_cpu() was originally written with the assumption that any further code will not need to perform CPU hotplug, since we are anyway in the reboot path. However, kexec is clearly not such a case, since we depend on onlining CPUs, atleast on powerpc. So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the kexec path, to fix this regression in kexec on powerpc. Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we can catch such issues more easily in the future. Fixes: c97102ba963 (kexec: migrate to reboot cpu) Cc: sta...@vger.kernel.org Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com --- arch/powerpc/kernel/machine_kexec_64.c |2 +- kernel/kexec.c |8 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 59d229a..879b3aa 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -237,7 +237,7 @@ static void wake_offline_cpus(void) if (!cpu_online(cpu)) { printk(KERN_INFO kexec: Waking offline cpu %d.\n, cpu); - cpu_up(cpu); + WARN_ON(cpu_up(cpu)); } } } diff --git a/kernel/kexec.c b/kernel/kexec.c index c8380ad..28c5706 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1683,6 +1683,14 @@ int kernel_kexec(void) kexec_in_progress = true; kernel_restart_prepare(NULL); migrate_to_reboot_cpu(); + + /* + * migrate_to_reboot_cpu() disables CPU hotplug assuming that + * no further code needs to use CPU hotplug (which is true in + * the reboot case). However, the kexec path depends on using + * CPU hotplug again; so re-enable it here. + */ + cpu_hotplug_enable(); printk(KERN_EMERG Starting new kernel\n); machine_shutdown(); After migrate_to_reboot_cpu(), we are calling machine_shutdown() which calls disable_nonboot_cpus() and which in turn calls _cpu_down(). So it is kind of odd that we first migrate to boot cpu, and then disable all non-boot cpus and after that powerpc goes ahead and onlines all cpus. I think this is not
[PATCH 5/5] sh: Define elfcorehdr_addr in arch dependent section
o Move elfcorehdr_addr definition in arch dependent crash dump file. This is equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or not. o I don't see sh setup code parsing the command line for elfcorehdr_addr. I am wondering how does vmcore interface work on sh. Anyway, I am atleast defining elfcoredhr_addr so that compilation is not broken on sh. Signed-off-by: Vivek Goyal [EMAIL PROTECTED] --- arch/sh/kernel/crash_dump.c |3 +++ 1 file changed, 3 insertions(+) diff -puN arch/sh/kernel/crash_dump.c~fix-elfcorehdr_addr-sh arch/sh/kernel/crash_dump.c --- linux-2.6.27-pre-rc1/arch/sh/kernel/crash_dump.c~fix-elfcorehdr_addr-sh 2008-07-28 12:17:12.0 -0400 +++ linux-2.6.27-pre-rc1-root/arch/sh/kernel/crash_dump.c 2008-07-28 12:17:12.0 -0400 @@ -10,6 +10,9 @@ #include linux/io.h #include asm/uaccess.h +/* Stores the physical address of elf header of crash image. */ +unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX; + /** * copy_oldmem_page - copy one page from oldmem * @pfn: page frame number to be copied _ ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 3/5] ia64: Define elfcorehdr_addr in arch dependent section
o Move elfcorehdr_addr definition in arch dependent crash dump file. This is equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or not. Signed-off-by: Vivek Goyal [EMAIL PROTECTED] --- arch/ia64/kernel/setup.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff -puN arch/ia64/kernel/setup.c~fix-elfcorehdr_addr-parsing-ia64 arch/ia64/kernel/setup.c --- linux-2.6.27-pre-rc1/arch/ia64/kernel/setup.c~fix-elfcorehdr_addr-parsing-ia64 2008-07-28 12:16:40.0 -0400 +++ linux-2.6.27-pre-rc1-root/arch/ia64/kernel/setup.c 2008-07-28 12:16:40.0 -0400 @@ -478,7 +478,12 @@ static __init int setup_nomca(char *s) } early_param(nomca, setup_nomca); -#ifdef CONFIG_PROC_VMCORE +/* + * Note: elfcorehdr_addr is not just limited to vmcore. It is also used by + * is_kdump_kernel() to determine if we are booting after a panic. Hence + * ifdef it under CONFIG_CRASH_DUMP and not CONFIG_PROC_VMCORE. + */ +#ifdef CONFIG_CRASH_DUMP /* elfcorehdr= specifies the location of elf core header * stored by the crashed kernel. */ @@ -491,7 +496,9 @@ static int __init parse_elfcorehdr(char return 0; } early_param(elfcorehdr, parse_elfcorehdr); +#endif +#ifdef CONFIG_PROC_VMCORE int __init reserve_elfcorehdr(unsigned long *start, unsigned long *end) { unsigned long length; _ ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH 4/5] powerpc: Define elfcorehdr_addr in arch dependent section
o Move elfcorehdr_addr definition in arch dependent crash dump file. This is equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or not. Signed-off-by: Vivek Goyal [EMAIL PROTECTED] --- arch/powerpc/kernel/crash_dump.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff -puN arch/powerpc/kernel/crash_dump.c~fix-elfcorehdr_addr-parsing-ppc64 arch/powerpc/kernel/crash_dump.c --- linux-2.6.27-pre-rc1/arch/powerpc/kernel/crash_dump.c~fix-elfcorehdr_addr-parsing-ppc64 2008-07-28 12:14:22.0 -0400 +++ linux-2.6.27-pre-rc1-root/arch/powerpc/kernel/crash_dump.c 2008-07-28 12:14:22.0 -0400 @@ -27,6 +27,9 @@ #define DBG(fmt...) #endif +/* Stores the physical address of elf header of crash image. */ +unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX; + void __init reserve_kdump_trampoline(void) { lmb_reserve(0, KDUMP_RESERVE_LIMIT); @@ -66,7 +69,11 @@ void __init setup_kdump_trampoline(void) DBG( - setup_kdump_trampoline()\n); } -#ifdef CONFIG_PROC_VMCORE +/* + * Note: elfcorehdr_addr is not just limited to vmcore. It is also used by + * is_kdump_kernel() to determine if we are booting after a panic. Hence + * ifdef it under CONFIG_CRASH_DUMP and not CONFIG_PROC_VMCORE. + */ static int __init parse_elfcorehdr(char *p) { if (p) @@ -75,7 +82,6 @@ static int __init parse_elfcorehdr(char return 1; } __setup(elfcorehdr=, parse_elfcorehdr); -#endif static int __init parse_savemaxmem(char *p) { _ ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/5] Move elfcorehdr_addr out of vmcore.c (Was: Re: [patch] crashdump: fix undefined reference to `elfcorehdr_addr')
On Tue, Jul 29, 2008 at 11:22:48AM +1000, Simon Horman wrote: On Mon, Jul 28, 2008 at 03:47:41PM -0700, Eric W. Biederman wrote: Vivek Goyal [EMAIL PROTECTED] writes: Hi All, How does following series of patches look like. I have moved elfcorehdr_addr out of vmcore.c and pushed it to arch dependent section of crash dump to make sure that it can be worked with even when CONFIG_PROC_VMCORE is disabled and CONFIG_CRASH_DUMP is enabled. I tested it on x86_64. Compile tested it on i386 and ppc64. ia64 and sh versions are completely untested. Given the current state of the code: Acked-by: Eric W. Biederman [EMAIL PROTECTED] To process a kernel crash dump we pass the kernel elfcorehdr option, so testing to see if it was passed seems reasonable. In general I think this method of handling the problems with kdump is too brittle to live, but in the case of iommus we certainly need to do something different, and unfortunately iommus were not common on x86 when the original code was merged so we have not handled them well. Agreed, however these patches look like they really ought to be merged into a single patch for the sake of bisect. As things stand, applying the first patch will break the build on each architecture with an architecture specific until the latter is applied. That's a good point. I was not very sure because changes were in different arches and I broke the patch. At the same time changes are really miniscule in each arch. So, for the sake of not breaking compilation for git-bisect, I will generate a single patch tomorrow. (Until and unless somebody has an objection). Thanks Vivek -- Horms ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev