Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal
On Wed, Jul 20, 2016 at 09:35:30AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> > > IOW, if your kernel forced signature verification, you should not be
> > > able to do sig_enforce=0. If you kernel did not have
> > > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > > and you are not making it worse using command line.
> > 
> > OK.. I checked and you are right, but that is an example and there are
> > other things like security=, thermal.*, nosmep, nosmap that need auditing
> > for safety and might hurt the system security if used. I still think
> > think that assuming you can pass any command line without breaking security
> > is a broken argument.
> 
> Quite, and you don't need to run code in a privileged environment to do
> any of that.
> 
> It's also not trivial to protect against: new kernels gain new arguments
> which older kernels may not know about.  No matter how much protection
> is built into older kernels, newer kernels can become vulnerable through
> the addition of further arguments.

If a new kernel command line option becomes an issue, new kernel can
block that in secureboot environment. That way it helps kexec
boot as well as regular boot.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-20 Thread Vivek Goyal
On Wed, Jul 20, 2016 at 01:45:42PM +1000, Balbir Singh wrote:
> >  
> > Command line options are not signed. I thought idea behind secureboot
> > was to execute only trusted code and command line options don't enforce
> > you to execute unsigned code.
> >  
> >>
> >> You can set module.sig_enforce=0 and open up the system a bit assuming
> >> that you can get a module to load with another attack
> > 
> > IIUC, sig_enforce bool_enable_only so it can only be enabled. Default
> > value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.
> > 
> > IOW, if your kernel forced signature verification, you should not be
> > able to do sig_enforce=0. If you kernel did not have
> > CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> > and you are not making it worse using command line.
> > 
> 
> OK.. I checked and you are right, but that is an example and there are
> other things like security=, thermal.*, nosmep, nosmap that need auditing
> for safety and might hurt the system security if used. I still think
> think that assuming you can pass any command line without breaking security
> is a broken argument.

I agree that if some command line option allows running unsigned code
at ring 0, then we probably should disable that on secureboot enabled
boot.

In fact, there were bunch of patches which made things tighter on
secureboot enabled machines from matthew garrett. AFAIK, these patches
never went upstream.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-19 Thread Vivek Goyal
On Tue, Jul 19, 2016 at 01:47:28PM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:24:06AM -0400, Vivek Goyal wrote:
> > On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote:
> > > Regardless, this extended syscall changes some underlying assumptions
> > > made with the development of kexec_file_load, and I think treating this
> > > as an extension is not a great idea. From a user's perspective there is
> > > little difference between passing an additional flag or using a
> > > different syscall number, so I don't think that we gain much by altering
> > > the existing prototype relative to allocating a new syscall number.
> > 
> > If we are providing/opening up additional flags, I can't think what will
> > it break. Same flag was invalid in old kernel but new kernel supports 
> > it and will accept it. So it sounds reasonable to me to add new flags.
> > 
> > If existing users are not broken, then I think it might be a good idea
> > to extend existing syscall. Otherwise userspace will have to be modified
> > to understand a 3rd syscall also and an additional option will show up
> > which asks users to specify which syscall to use. So extending existing
> > syscall might keep it little simple for users.
> 
> I don't follow.
> 
> To use the new feature, you have to modify userspace anyway, as you
> require userspace to pass information which it did not previously pass
> (in the new arguments added to the syscall).
> 
> The presence of a new syscall does not imply the absence of the old
> syscall, so you can always use that be default unless the user asks for
> asomething only the new syscall provides. Regardless of the
> syscall/flags difference, you still have to detect whether the new
> functionality is present somehow.
> 

Hmm., so current idea is that we have two syscalls() which are *ideally*
supposed to work for all arches. Difference between two is that first
one does not support kernel signature verification while second one does.

By default old syscall is used and user can force using new syscall using
option --kexec-file-load.

If a user DTB is present, I was hoping that it will continue to work the
same way. Both the sycalls can be used and can handle DTB. If we introduce
a 3rd syscall, that means only first and 3rd syscall can handle DTB and
we need to introduce one more option which tells whether to use
kexec_load() or use the 3rd new syscall. And that's what I am trying
to avoid.

Vivek

> > BTW, does kexec_load() needs to be modified too to handle DT?
> 
> No, at least for arm64. In the kexec_load case userspace provides the
> DTB as a raw segment, and the user-provided purgatory sets up registers
> to pass that to the new kernel.
> 
> Thanks,
> Mark.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-19 Thread Vivek Goyal
On Tue, Jul 19, 2016 at 11:52:00AM +0100, Mark Rutland wrote:
> On Tue, Jul 19, 2016 at 08:55:56AM +0800, Dave Young wrote:
> > On 07/18/16 at 11:07am, Mark Rutland wrote:
> > > On Mon, Jul 18, 2016 at 10:30:24AM +0800, Dave Young wrote:
> > > > I do not think it is worth to add another syscall for extra fds.
> > > > We have open(2) as an example for different numbers of arguments
> > > > already.
> > > 
> > > Did we change the syscall interface for that?
> > > 
> > > I was under the impression that there was always one underlying syscall,
> > > and the C library did the right thing to pass the expected information
> > > to the underlying syscall.
> > 
> > I'm not sure kexec_load and kexec_file_load were included in glibc, we use
> > syscall directly in kexec-tools.
> > 
> > kexec_load man pages says there are no wrappers for both kexec_load and
> > kexec_file_load in glibc.
> 
> For the above, I was talking about how open() was handled.
> 
> If there are no userspace wrappers, then the two cases aren't comparable
> in the first place...
> 
> > > That's rather different to changing the underlying syscall.
> > > 
> > > Regardless of how this is wrapped in userspace, I do not think modifying
> > > the existing prototype is a good idea, and I think this kind of
> > > extension needs to be a new syscall.
> > 
> > Hmm, as I replied to Vivek, there is one case about the flags, previously
> > the new flag will be regarded as invalid, but not we extend it it will be
> > valid, this maybe the only potential bad case.
> 
> It's true that adding suport for new flags will change the behaviour of
> what used to be error cases. We generally expect real users to not be
> making pointless calls for which they rely on an error being returned in
> all cases.
> 
> Regardless, this extended syscall changes some underlying assumptions
> made with the development of kexec_file_load, and I think treating this
> as an extension is not a great idea. From a user's perspective there is
> little difference between passing an additional flag or using a
> different syscall number, so I don't think that we gain much by altering
> the existing prototype relative to allocating a new syscall number.

If we are providing/opening up additional flags, I can't think what will
it break. Same flag was invalid in old kernel but new kernel supports 
it and will accept it. So it sounds reasonable to me to add new flags.

If existing users are not broken, then I think it might be a good idea
to extend existing syscall. Otherwise userspace will have to be modified
to understand a 3rd syscall also and an additional option will show up
which asks users to specify which syscall to use. So extending existing
syscall might keep it little simple for users.

This is only if conclusion in the end is that DT needs to be passed in
from user space.

BTW, does kexec_load() needs to be modified too to handle DT?

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-18 Thread Vivek Goyal
On Mon, Jul 18, 2016 at 09:26:29AM -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote:
> > On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote:
> > > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> > > > 
> > > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > > > > 
> > > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux 
> > > > > wrote:
> > > > > > 
> > > > > > Indeed - maybe Eric knows better, but I can't see any situation 
> > > > > > where
> > > > > > the dtb we load via kexec should ever affect "the bootloader", 
> > > > > > unless
> > > > > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > > > > 
> > > > > > Now, going back to the more fundamental issue raised in my first 
> > > > > > reply,
> > > > > > about the kernel command line.
> > > > > > 
> > > > > > On x86, I can see that it _is_ possible for userspace to specify a
> > > > > > command line, and the kernel loading the image provides the command
> > > > > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > > > > your kernel is signed, what stops the "insecure userspace" loading
> > > > > > a signed kernel but giving it an insecure rootfs and/or console?
> > > > > It is not kexec specific. I could do this for regular boot too, right?
> > > > > 
> > > > > Command line options are not signed. I thought idea behind secureboot
> > > > > was to execute only trusted code and command line options don't 
> > > > > enforce
> > > > > you to execute unsigned code.
> > > > > 
> > 
> > You can set module.sig_enforce=0 and open up the system a bit assuming
> > that you can get a module to load with another attack
> 
> IIUC, sig_enforce bool_enable_only so it can only be enabled. Default
> value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.
> 
> IOW, if your kernel forced signature verification, you should not be
> able to do sig_enforce=0. If you kernel did not have
> CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
> and you are not making it worse using command line.

[ CC Matthew Garrett ]

I think on top of this there were patches by Matthew Garrett, which
disallowed loading of unsigned modules if booted with secureboot on. I
think those patches never made upstream though.

Vivek

> 
> > 
> > > > > So it sounds like different class of security problems which you are
> > > > > referring to and not necessarily covered by secureboot or signed
> > > > > kernel.
> > > > Let me give you an example.
> > > > 
> > > > You have a secure boot setup, where the firmware/ROM validates the boot
> > > > loader.  Good, the boot loader hasn't been tampered with.
> > > > 
> > > > You interrupt the boot loader and are able to modify the command line
> > > > for the booted kernel.
> > > > 
> > > > The boot loader loads the kernel and verifies the kernel's signature.
> > > > Good, the kernel hasn't been tampered with.  The kernel starts running.
> > > > 
> > > > You've plugged in a USB drive to the device, and specified a partition
> > > > containing a root filesystem that you control to the kernel.  The
> > > > validated kernel finds the USB drive, and mounts it, and executes
> > > > your own binaries on the USB drive.
> > > You will require physical access to the machine to be able to
> > > insert your usb drive. And IIRC, argument was that if attacker has
> > > physical access to machine, all bets are off anyway.
> > >
> > 
> > You don't need physical access -- your machine controller BMC can
> > do the magic for you. So its not always physical access, is it?
> 
> Well, idea was that if you have physical access to machine, then all
> bets are off. If BMC can do something which allows running unsigned
> code at ring level 0, its a problem I think from secureboot model of
> security.
> 
> >  
> > > > 
> > > > 
> > > > You run a shell on the console.  You now have control of the system,
> > > > and can mount the real rootfs, inspect it, and work out what it does,
> > > > etc.
> > > > 
> > > > At this point, what use was all the validation that the secure boot
> > > > has done?  Absolutely useless.
> > > > 
> > > > If you can change the command line arguments given to the kernel, you
> > > > have no security, no matter how much you verify signatures.  It's
> > > > the illusion of security, nothing more, nothing less.
> > > > 
> > 
> > I agree, if you can change command line arguments, all bets are of lesser 
> > value
> 
> If changing command line allows execution of unsigned code at ring level
> 0, then it is a problem. Otherwise we are talking of security issues which
> are not covered by secureboot model.
> 
> Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-18 Thread Vivek Goyal
On Mon, Jul 18, 2016 at 10:46:04PM +1000, Balbir Singh wrote:
> On Wed, 2016-07-13 at 14:22 -0400, Vivek Goyal wrote:
> > On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> > > 
> > > On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > > > 
> > > > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux 
> > > > wrote:
> > > > > 
> > > > > Indeed - maybe Eric knows better, but I can't see any situation where
> > > > > the dtb we load via kexec should ever affect "the bootloader", unless
> > > > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > > > 
> > > > > Now, going back to the more fundamental issue raised in my first 
> > > > > reply,
> > > > > about the kernel command line.
> > > > > 
> > > > > On x86, I can see that it _is_ possible for userspace to specify a
> > > > > command line, and the kernel loading the image provides the command
> > > > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > > > your kernel is signed, what stops the "insecure userspace" loading
> > > > > a signed kernel but giving it an insecure rootfs and/or console?
> > > > It is not kexec specific. I could do this for regular boot too, right?
> > > > 
> > > > Command line options are not signed. I thought idea behind secureboot
> > > > was to execute only trusted code and command line options don't enforce
> > > > you to execute unsigned code.
> > > > 
> 
> You can set module.sig_enforce=0 and open up the system a bit assuming
> that you can get a module to load with another attack

IIUC, sig_enforce bool_enable_only so it can only be enabled. Default
value of it is 0 if CONFIG_MODULE_SIG_FORCE=n.

IOW, if your kernel forced signature verification, you should not be
able to do sig_enforce=0. If you kernel did not have
CONFIG_MODULE_SIG_FORCE=y, then sig_enforce should be 0 by default anyway
and you are not making it worse using command line.

> 
> > > > So it sounds like different class of security problems which you are
> > > > referring to and not necessarily covered by secureboot or signed
> > > > kernel.
> > > Let me give you an example.
> > > 
> > > You have a secure boot setup, where the firmware/ROM validates the boot
> > > loader.  Good, the boot loader hasn't been tampered with.
> > > 
> > > You interrupt the boot loader and are able to modify the command line
> > > for the booted kernel.
> > > 
> > > The boot loader loads the kernel and verifies the kernel's signature.
> > > Good, the kernel hasn't been tampered with.  The kernel starts running.
> > > 
> > > You've plugged in a USB drive to the device, and specified a partition
> > > containing a root filesystem that you control to the kernel.  The
> > > validated kernel finds the USB drive, and mounts it, and executes
> > > your own binaries on the USB drive.
> > You will require physical access to the machine to be able to
> > insert your usb drive. And IIRC, argument was that if attacker has
> > physical access to machine, all bets are off anyway.
> >
> 
> You don't need physical access -- your machine controller BMC can
> do the magic for you. So its not always physical access, is it?

Well, idea was that if you have physical access to machine, then all
bets are off. If BMC can do something which allows running unsigned
code at ring level 0, its a problem I think from secureboot model of
security.

>  
> > > 
> > > 
> > > You run a shell on the console.  You now have control of the system,
> > > and can mount the real rootfs, inspect it, and work out what it does,
> > > etc.
> > > 
> > > At this point, what use was all the validation that the secure boot
> > > has done?  Absolutely useless.
> > > 
> > > If you can change the command line arguments given to the kernel, you
> > > have no security, no matter how much you verify signatures.  It's
> > > the illusion of security, nothing more, nothing less.
> > > 
> 
> I agree, if you can change command line arguments, all bets are of lesser 
> value

If changing command line allows execution of unsigned code at ring level
0, then it is a problem. Otherwise we are talking of security issues which
are not covered by secureboot model.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:
> 
> > > 
> > > Right, but the question remains whether this helps while you allow the
> > > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > > the kernel or initid but can modify the DT, a successful attack would
> > > be a bit harder than having a modified kernel, but you may still need
> > > to treat the system as compromised.
> > 
> > Yes, and the same question also remains regarding the kernel command line.
> > 
> > We can have the kernel perform sanity checks on the device tree, just as 
> > the 
> > kernel needs to sanity check the command line.
> > 
> > There's the point that was raised about not wanting to increase the attack 
> > surface, and that's a valid point. But at least in the way Petitboot works 
> > today, it needs to modify the device tree and pass it to the kernel.
> > 
> > One thing that is unavoidable to come from userspace is 
> > /chosen/linux,stdout-path, because it's Petitboot that knows from which 
> > console the user is interacting with. The other modification to set 
> > properties in vga@0 can be done in the kernel.
> > 
> > Given that on DTB-based systems /chosen is an important and established way 
> > to pass information to the operating system being booted, I'd like to 
> > suggest the following, then:
> > 
> > Extend the syscall as shown in this RFC from Takahiro AKASHI, but instead 
> > of 
> > accepting a complete DTB from userspace, the syscall would accept a DTB 
> > containing only a /chosen node. If the DTB contains any other node, the 
> > syscall fails with EINVAL. The kernel can then add the properties in 
> > /chosen 
> > to the device tree that it will pass to the next kernel.
> > 
> > What do you think?
> 
> I think that helps, as it makes the problem space correspond to that
> of modifying the command line, but I can still come up with countless
> attacks based on modifications of the /chosen node and/or the command
> line, in fact it's probably easier than any other node.

I don't know anything about DTB. So here comes a very basic question. Does
DTB allow passing an executable blob to kernel or pass the location of
some unsigned executable code at kernel level. I think from secureboot point of
view that would be a concern. Being able to trick kernel to execute an
unsigned code at privileged level.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Tue, Jul 12, 2016 at 10:42:01AM +0900, AKASHI Takahiro wrote:

[..]
> -SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> +SYSCALL_DEFINE6(kexec_file_load, int, kernel_fd, int, initrd_fd,
>   unsigned long, cmdline_len, const char __user *, cmdline_ptr,
> - unsigned long, flags)
> + unsigned long, flags, const struct kexec_fdset __user *, ufdset)

Can one add more parameters to existing syscall. Can it break existing
programs with new kernel? I was of the impression that one can't do that.
But may be I am missing something.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Fri, Jul 15, 2016 at 09:49:25AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 03:13:42PM +0200, Arnd Bergmann wrote:
> > On Wednesday, July 13, 2016 10:41:28 AM CEST Mark Rutland wrote:
> > > The big question is whether this is a realistic case on a secure boot
> > > system.
> > 
> > What does x86 do here? I assume changes to the command line are also
> > limited.
> 
> They aren't.  You can specify /anything/ even with a fully-signed kernel
> and initrd, which was one of the things I pointed out in my previous
> set of responses.

Yes, kernel command line is not signed. For that matter even initird is
not signed. Just kernel is signed and its signatures are verified. Idea
is an unsigned code should not be able to execute in kernel space.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal
On Wed, Jul 13, 2016 at 06:40:10PM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 09:03:38AM -0400, Vivek Goyal wrote:
> > On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> > > Indeed - maybe Eric knows better, but I can't see any situation where
> > > the dtb we load via kexec should ever affect "the bootloader", unless
> > > the "kernel" that's being loaded into kexec is "the bootloader".
> > > 
> > > Now, going back to the more fundamental issue raised in my first reply,
> > > about the kernel command line.
> > > 
> > > On x86, I can see that it _is_ possible for userspace to specify a
> > > command line, and the kernel loading the image provides the command
> > > line to the to-be-kexeced kernel with very little checking.  So, if
> > > your kernel is signed, what stops the "insecure userspace" loading
> > > a signed kernel but giving it an insecure rootfs and/or console?
> > 
> > It is not kexec specific. I could do this for regular boot too, right?
> > 
> > Command line options are not signed. I thought idea behind secureboot
> > was to execute only trusted code and command line options don't enforce
> > you to execute unsigned code.
> > 
> > So it sounds like different class of security problems which you are
> > referring to and not necessarily covered by secureboot or signed
> > kernel.
> 
> Let me give you an example.
> 
> You have a secure boot setup, where the firmware/ROM validates the boot
> loader.  Good, the boot loader hasn't been tampered with.
> 
> You interrupt the boot loader and are able to modify the command line
> for the booted kernel.
> 
> The boot loader loads the kernel and verifies the kernel's signature.
> Good, the kernel hasn't been tampered with.  The kernel starts running.
> 
> You've plugged in a USB drive to the device, and specified a partition
> containing a root filesystem that you control to the kernel.  The
> validated kernel finds the USB drive, and mounts it, and executes
> your own binaries on the USB drive.

You will require physical access to the machine to be able to
insert your usb drive. And IIRC, argument was that if attacker has
physical access to machine, all bets are off anyway.

> 
> You run a shell on the console.  You now have control of the system,
> and can mount the real rootfs, inspect it, and work out what it does,
> etc.
> 
> At this point, what use was all the validation that the secure boot
> has done?  Absolutely useless.
> 
> If you can change the command line arguments given to the kernel, you
> have no security, no matter how much you verify signatures.  It's
> the illusion of security, nothing more, nothing less.
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal
On Wed, Jul 13, 2016 at 09:45:22AM +1000, Stewart Smith wrote:
> Vivek Goyal <vgo...@redhat.com> writes:
> > On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote:
> >> Hello Eric,
> >> 
> >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> >> > AKASHI Takahiro <takahiro.aka...@linaro.org> writes:
> >> > > Device tree blob must be passed to a second kernel on DTB-capable
> >> > > archs, like powerpc and arm64, but the current kernel interface
> >> > > lacks this support.
> >> > > 
> >> > > This patch extends kexec_file_load system call by adding an extra
> >> > > argument to this syscall so that an arbitrary number of file 
> >> > > descriptors
> >> > > can be handed out from user space to the kernel.
> >> > > 
> >> > > See the background [1].
> >> > > 
> >> > > Please note that the new interface looks quite similar to the current
> >> > > system call, but that it won't always mean that it provides the "binary
> >> > > compatibility."
> >> > > 
> >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> >> > 
> >> > So this design is wrong.  The kernel already has the device tree blob,
> >> > you should not be extracting it from the kernel munging it, and then
> >> > reinserting it in the kernel if you want signatures and everything to
> >> > pass.
> >> > 
> >> > What x86 does is pass it's equivalent of the device tree blob from one
> >> > kernel to another directly and behind the scenes.  It does not go
> >> > through userspace for this.
> >> > 
> >> > Until a persuasive case can be made for going around the kernel and
> >> > probably adding a feature (like code execution) that can be used to
> >> > defeat the signature scheme I am going to nack this.
> >> 
> >> There are situations where userspace needs to change things in the device 
> >> tree to be used by the next kernel.
> >> 
> >> For example, Petitboot (the boot loader used in OpenPOWER machines) is a 
> >> userspace application running in an intermediary Linux instance and uses 
> >> kexec to load the target OS. It has to modify the device tree that will be 
> >> used by the next kernel so that the next kernel uses the same console that 
> >> petitboot was configured to use (i.e., set the /chosen/linux,stdout-path 
> >> property). It also modifies the device tree to allow the kernel to inherit 
> >> Petitboot's Openfirmware framebuffer.
> >
> > Can some of this be done with the help of kernel command line options for
> > second kernel?
> 
> how would this be any more secure?
> 
> Passing in an address for a framebuffer via command line option means
> you could scribble over any bit of memory, which is the same kind of
> damage you could do by modifying the device tree.

It is not necessarily safer but works with given framework and we don't
have to modify existing system call.

Also it will allow you to pass in only one thing at a time instead of
allowing passing in new unsigned DTB, which can potentially do lot more.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal
On Wed, Jul 13, 2016 at 09:41:39AM +1000, Stewart Smith wrote:
> Petr Tesarik  writes:
> > On Tue, 12 Jul 2016 13:25:11 -0300
> > Thiago Jung Bauermann  wrote:
> >
> >> Hi Eric,
> >> 
> >> I'm trying to understand your concerns leading to your nack. I hope you 
> >> don't mind expanding your thoughts on them a bit.
> >> 
> >> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> >> > AKASHI Takahiro  writes:
> >> > > Device tree blob must be passed to a second kernel on DTB-capable
> >> > > archs, like powerpc and arm64, but the current kernel interface
> >> > > lacks this support.
> >> > > 
> >> > > This patch extends kexec_file_load system call by adding an extra
> >> > > argument to this syscall so that an arbitrary number of file 
> >> > > descriptors
> >> > > can be handed out from user space to the kernel.
> >> > > 
> >> > > See the background [1].
> >> > > 
> >> > > Please note that the new interface looks quite similar to the current
> >> > > system call, but that it won't always mean that it provides the "binary
> >> > > compatibility."
> >> > > 
> >> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> >> > 
> >> > So this design is wrong.  The kernel already has the device tree blob,
> >> > you should not be extracting it from the kernel munging it, and then
> >> > reinserting it in the kernel if you want signatures and everything to
> >> > pass.
> >> 
> >> I don't understand how the kernel signature will be invalidated. 
> >> 
> >> There are some types of boot images that can embed a device tree blob in 
> >> them, but the kernel can also be handed a separate device tree blob from 
> >> firmware, the boot loader, or kexec. This latter case is what we are 
> >> discussing, so we are not talking about modifying an embedded blob in the 
> >> kernel image.
> >> 
> >> > What x86 does is pass it's equivalent of the device tree blob from one
> >> > kernel to another directly and behind the scenes.  It does not go
> >> > through userspace for this.
> >> > 
> >> > Until a persuasive case can be made for going around the kernel and
> >> > probably adding a feature (like code execution) that can be used to
> >> > defeat the signature scheme I am going to nack this.
> >> 
> >> I also don't understand what you mean by code execution. How does passing 
> >> a 
> >> device tree blob via kexec enables code execution? How can the signature 
> >> scheme be defeated?
> >
> > I'm not an expert on DTB, so I can't provide an example of code
> > execution, but you have already mentioned the /chosen/linux,stdout-path
> > property. If an attacker redirects the bootloader to an insecure
> > console, they may get access to the system that would otherwise be
> > impossible.
> 
> In this case, the user is sitting at the (or one of the) console(s) of
> the machine. There could be petitboot UIs running on the VGA display,
> IPMI serial over lan, local serial port. The logic behind setting
> /chosen/linux,stdout-path is (currently) mostly to set it for the kernel
> to what the user is interacting with. i.e. if you select an OS installer
> to boot from the VGA console, you get a graphical installer running and
> if you selected it from a text console, you get a text installer running
> (on the appropriate console).
> 
> So the bootloader (petitboot) needs to work out which console is being
> interacted with in order to set up /chosen/linux,stdout-path correctly.
> 
> This specific option could be passed as a kernel command line to the
> next kernel, yes. However, isn't the kernel command line also an attack
> vector? Is *every* command line option safe?

I don't think kernel command line is signed. And we will have to define
what is considered *unsafe*. I am working on the assumption that a
user should not be able to force execution of unsigned code at provileged
level. And passing console on kernel command line should be safe in
that respect?

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-13 Thread Vivek Goyal
On Wed, Jul 13, 2016 at 09:26:39AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 05:55:33PM +1000, Stewart Smith wrote:
> > Russell King - ARM Linux  writes:
> > > On Wed, Jul 13, 2016 at 02:59:51PM +1000, Stewart Smith wrote:
> > >> Russell King - ARM Linux  writes:
> > >> > On Tue, Jul 12, 2016 at 10:58:05PM +0200, Petr Tesarik wrote:
> > >> >> I'm not an expert on DTB, so I can't provide an example of code
> > >> >> execution, but you have already mentioned the 
> > >> >> /chosen/linux,stdout-path
> > >> >> property. If an attacker redirects the bootloader to an insecure
> > >> >> console, they may get access to the system that would otherwise be
> > >> >> impossible.
> > >> >
> > >> > I fail to see how kexec connects with the boot loader - the DTB image
> > >> > that's being talked about is one which is passed from the currently
> > >> > running kernel to the to-be-kexec'd kernel.  For ARM (and I suspect
> > >> > also ARM64) that's a direct call chain which doesn't involve any
> > >> > boot loader or firmware, and certainly none that would involve the
> > >> > passed DTB image.
> > >> 
> > >> For OpenPOWER machines, kexec is the bootloader. Our bootloader is a
> > >> linux kernel and initramfs with a UI (petitboot) - this means we never
> > >> have to write a device driver twice: write a kernel one and you're done
> > >> (for booting from the device and using it in your OS).
> > >
> > > I think you misunderstood my point.
> > >
> > > On ARM, we do not go:
> > >
> > >   kernel (kexec'd from) -> boot loader -> kernel (kexec'd to)
> > >
> > > but we go:
> > >
> > >   kernel (kexec'd from) -> kernel (kexec'd to)
> > >
> > > There's no intermediate step involving any bootloader.
> > >
> > > Hence, my point is that the dtb loaded by kexec is _only_ used by the
> > > kernel which is being kexec'd to, not by the bootloader, nor indeed
> > > the kernel which it is loaded into.
> > >
> > > Moreover, if you read the bit that I quoted (which is what I was
> > > replying to), you'll notice that it is talking about the DTB loaded
> > > by kexec somehow causing the _bootloader_ to be redirected to an
> > > alternative console.  This point is wholely false on ARM.
> > 
> > Ahh.. I missed the bootloader bit there.
> > 
> > In which case, we're the same on OpenPOWER, there is no intermediate
> > bootloader - in our case we have linux (with kexec) taking on what uboot
> > or grub is typically used for on other platforms.
> 
> Indeed - maybe Eric knows better, but I can't see any situation where
> the dtb we load via kexec should ever affect "the bootloader", unless
> the "kernel" that's being loaded into kexec is "the bootloader".
> 
> Now, going back to the more fundamental issue raised in my first reply,
> about the kernel command line.
> 
> On x86, I can see that it _is_ possible for userspace to specify a
> command line, and the kernel loading the image provides the command
> line to the to-be-kexeced kernel with very little checking.  So, if
> your kernel is signed, what stops the "insecure userspace" loading
> a signed kernel but giving it an insecure rootfs and/or console?

It is not kexec specific. I could do this for regular boot too, right?

Command line options are not signed. I thought idea behind secureboot
was to execute only trusted code and command line options don't enforce
you to execute unsigned code.

So it sounds like different class of security problems which you are
referring to and not necessarily covered by secureboot or signed
kernel.

Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-12 Thread Vivek Goyal
On Tue, Jul 12, 2016 at 04:02:46PM +0200, Arnd Bergmann wrote:
> On Tuesday, July 12, 2016 8:25:48 AM CEST Eric W. Biederman wrote:
> > AKASHI Takahiro  writes:
> > 
> > > Device tree blob must be passed to a second kernel on DTB-capable
> > > archs, like powerpc and arm64, but the current kernel interface
> > > lacks this support.
> > >   
> > > This patch extends kexec_file_load system call by adding an extra
> > > argument to this syscall so that an arbitrary number of file descriptors
> > > can be handed out from user space to the kernel.
> > >
> > > See the background [1].
> > >
> > > Please note that the new interface looks quite similar to the current
> > > system call, but that it won't always mean that it provides the "binary
> > > compatibility."
> > >
> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> > 
> > So this design is wrong.  The kernel already has the device tree blob,
> > you should not be extracting it from the kernel munging it, and then
> > reinserting it in the kernel if you want signatures and everything to
> > pass.
> > 
> > What x86 does is pass it's equivalent of the device tree blob from one
> > kernel to another directly and behind the scenes.  It does not go
> > through userspace for this.
> > 
> > Until a persuasive case can be made for going around the kernel and
> > probably adding a feature (like code execution) that can be used to
> > defeat the signature scheme I am going to nack this.
> > 
> > Nacked-by: "Eric W. Biederman" 
> > 
> > I am happy to see support for other architectures, but for the sake of
> > not moving some code in the kernel let's not build an attackable
> > infrastructure.
> > 
> 
> For historic context, the flattened devicetree format that we now use
> to pass data about the system from boot loader to kernel was initially
> introduced specifically for the purpose of enabling kexec:
> 
> On Open Firmware, the DT is extracted from running firmware and copied
> into dynamically allocated data structures. After a kexec, the runtime
> interface to the firmware is not available, so the flattened DT format
> was created as a way to pass the same data in a binary blob to the new
> kernel in a format that can be read from the kernel by walking the
> directories in /proc/device-tree/*.

So this DT is available inside kernel and running kernel can still
retrieve it and pass it to second kernel?

> 
> There are a couple of reasons for modifying the devicetree:
> 
> - For kboot/petitboot, you can have a kernel that is not booted through
>   DT at all but hardwired to a particular machine, and that passes
>   a DT for the entire hardware to the kernel that you actually want to
>   run.
> 
> - for kdump, you need to tell the new kernel about the modified location
>   of the memory, so the dump kernel doesn't overwrite the contents
>   it wants to dump

In x86 we do this with the help of kernel command line options.

> 
> - we typically ship devicetree sources for embedded machines with the
>   kernel sources. As more hardware of the system gets enabled, the
>   devicetree gains extra nodes and properties that describe the hardware
>   more completely, so we need to use the latest DT blob to use all
>   the drivers
> 
> - in some cases, kernels will fail to boot at all with an older version
>   of the DT, or fail to use the devices that were working on the
>   earlier kernel. This is usually considered a bug, but it's not rare
> 
> - In some cases, the kernel can update its DT at runtime, and the new
>   settings are expected to be available in the new kernel too, though
>   there are cases where you actually don't want the modified contents.

I am assuming that modified DT and unmodifed one both are accessible to
kernel. And if user space can make decisions which modfied fields to use
for new kernels and which ones not, then same can be done in kernel too?

Vivek
> 
>   Arnd
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-12 Thread Vivek Goyal
On Tue, Jul 12, 2016 at 10:58:09AM -0300, Thiago Jung Bauermann wrote:
> Hello Eric,
> 
> Am Dienstag, 12 Juli 2016, 08:25:48 schrieb Eric W. Biederman:
> > AKASHI Takahiro  writes:
> > > Device tree blob must be passed to a second kernel on DTB-capable
> > > archs, like powerpc and arm64, but the current kernel interface
> > > lacks this support.
> > > 
> > > This patch extends kexec_file_load system call by adding an extra
> > > argument to this syscall so that an arbitrary number of file descriptors
> > > can be handed out from user space to the kernel.
> > > 
> > > See the background [1].
> > > 
> > > Please note that the new interface looks quite similar to the current
> > > system call, but that it won't always mean that it provides the "binary
> > > compatibility."
> > > 
> > > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> > 
> > So this design is wrong.  The kernel already has the device tree blob,
> > you should not be extracting it from the kernel munging it, and then
> > reinserting it in the kernel if you want signatures and everything to
> > pass.
> > 
> > What x86 does is pass it's equivalent of the device tree blob from one
> > kernel to another directly and behind the scenes.  It does not go
> > through userspace for this.
> > 
> > Until a persuasive case can be made for going around the kernel and
> > probably adding a feature (like code execution) that can be used to
> > defeat the signature scheme I am going to nack this.
> 
> There are situations where userspace needs to change things in the device 
> tree to be used by the next kernel.
> 
> For example, Petitboot (the boot loader used in OpenPOWER machines) is a 
> userspace application running in an intermediary Linux instance and uses 
> kexec to load the target OS. It has to modify the device tree that will be 
> used by the next kernel so that the next kernel uses the same console that 
> petitboot was configured to use (i.e., set the /chosen/linux,stdout-path 
> property). It also modifies the device tree to allow the kernel to inherit 
> Petitboot's Openfirmware framebuffer.

Can some of this be done with the help of kernel command line options for
second kernel?

Vivek

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
 On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
  dwal...@fifo99.com writes:
  
   On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
   Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
   
You can call panic notifiers and kmsg dumpers before kdump by
specifying crash_kexec_post_notifiers as a boot parameter.
However, it doesn't make sense if kdump is not available.  In that
case, disable crash_kexec_post_notifiers boot parameter so that
you can't change the value of the parameter.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
  
   I think it would make sense if he just replaced kdump with kexec.
  
  It would be less insane, however it still makes no sense as without
  kexec on panic support crash_kexec is a noop.  So the value of the
  seeting makes no difference.
 
 Can you explain more, I don't really understand what you mean. Are you 
 suggesting
 the whole crash_kexec_post_notifiers feature has no value ?

Daniel,

BTW, why are you using crash_kexec_post_notifiers commandline? Why not
without it?

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
 On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
  dwal...@fifo99.com writes:
  
   On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
   Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
   
You can call panic notifiers and kmsg dumpers before kdump by
specifying crash_kexec_post_notifiers as a boot parameter.
However, it doesn't make sense if kdump is not available.  In that
case, disable crash_kexec_post_notifiers boot parameter so that
you can't change the value of the parameter.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
  
   I think it would make sense if he just replaced kdump with kexec.
  
  It would be less insane, however it still makes no sense as without
  kexec on panic support crash_kexec is a noop.  So the value of the
  seeting makes no difference.
 
 Can you explain more, I don't really understand what you mean. Are you 
 suggesting
 the whole crash_kexec_post_notifiers feature has no value ?

If CONFIG_KEXEC=n, then crash_kexec() is a nop. So it does not matter
whether crash_kexec() is called before panic notifiers or after.

IOW, what do you gain by disabling crash_kexec_post_notifiers, in 
case of CONFIG_KEXEC=n?

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote:
 On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
  On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
   On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
dwal...@fifo99.com writes:

 On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
 Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
 
  You can call panic notifiers and kmsg dumpers before kdump by
  specifying crash_kexec_post_notifiers as a boot parameter.
  However, it doesn't make sense if kdump is not available.  In that
  case, disable crash_kexec_post_notifiers boot parameter so that
  you can't change the value of the parameter.
 
 Nacked-by: Eric W. Biederman ebied...@xmission.com

 I think it would make sense if he just replaced kdump with kexec.

It would be less insane, however it still makes no sense as without
kexec on panic support crash_kexec is a noop.  So the value of the
seeting makes no difference.
   
   Can you explain more, I don't really understand what you mean. Are you 
   suggesting
   the whole crash_kexec_post_notifiers feature has no value ?
  
  Daniel,
  
  BTW, why are you using crash_kexec_post_notifiers commandline? Why not
  without it?
 
 It was explained in the prior thread but to rehash, the notifiers are used to 
 do a switch
 over from the crashed machine to another redundant machine.

So why not detect failure using polling or issue notifications from second
kernel.

IOW, expecting that a crashed machine will be able to deliver notification
reliably is falwed to begin with, IMHO.

If a machine is failing, there are high chance it can't deliver you the
notification. Detecting that failure suing some kind of polling mechanism
might be more reliable. And it will make even kdump mechanism more
reliable so that it does not have to run panic notifiers after the crash.

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 03:48:33PM +, dwal...@fifo99.com wrote:
 On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
  On Tue, Jul 14, 2015 at 03:34:30PM +, dwal...@fifo99.com wrote:
   On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
On Tue, Jul 14, 2015 at 01:59:19PM +, dwal...@fifo99.com wrote:
 On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
  dwal...@fifo99.com writes:
  
   On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
   Hidehiro Kawai hidehiro.kawai...@hitachi.com writes:
   
You can call panic notifiers and kmsg dumpers before kdump by
specifying crash_kexec_post_notifiers as a boot parameter.
However, it doesn't make sense if kdump is not available.  In 
that
case, disable crash_kexec_post_notifiers boot parameter so 
that
you can't change the value of the parameter.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
  
   I think it would make sense if he just replaced kdump with 
   kexec.
  
  It would be less insane, however it still makes no sense as without
  kexec on panic support crash_kexec is a noop.  So the value of the
  seeting makes no difference.
 
 Can you explain more, I don't really understand what you mean. Are 
 you suggesting
 the whole crash_kexec_post_notifiers feature has no value ?

Daniel,

BTW, why are you using crash_kexec_post_notifiers commandline? Why not
without it?
   
   It was explained in the prior thread but to rehash, the notifiers are 
   used to do a switch
   over from the crashed machine to another redundant machine.
  
  So why not detect failure using polling or issue notifications from second
  kernel.
  
  IOW, expecting that a crashed machine will be able to deliver notification
  reliably is falwed to begin with, IMHO.
 
 It's flawed to think you can kexec, but you still do it right ? I've not 
 gotten into
 the deep details of this switching process, but that's how this interface is 
 used.

Sure. But the deal here is that users of interface know that sometimes it
can be unreliable. And in the absence of more reliable mechanism, somewhat
less reliable mechanism is fine. 

  
  If a machine is failing, there are high chance it can't deliver you the
  notification. Detecting that failure suing some kind of polling mechanism
  might be more reliable. And it will make even kdump mechanism more
  reliable so that it does not have to run panic notifiers after the crash.
 
 I think what your suggesting is that my company should change how it's 
 hardware works
 and that's not really an option for me. This isn't a simple thing like 
 checking over the
 network if the machine is down or not, this is way more complex hardware 
 design.

That means you are ready to live with an unreliable design. There might be
cases where notifier does not get run properly and you will not do switch
despite the fact that OS has failed. I was just trying to nudge you in
a direction which could be more reliable mechanism.

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/3] kexec: Change the timing of callbacks related to crash_kexec_post_notifiers boot option

2015-07-14 Thread Vivek Goyal
On Fri, Jul 10, 2015 at 08:33:31PM +0900, Hidehiro Kawai wrote:
 This patch fixes problems reported by Daniel Walker
 (https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix
 commits 5375b70 and f45d85f.
 
 If crash_kexec_post_notifiers boot option is specified,
 other cpus are stopped by smp_send_stop() before entering
 crash_kexec(), while usually machine_crash_shutdown() called by
 crash_kexec() does that.  This behavior change leads two problems.
 
  Problem 1:
  Some function in the crash_kexec() path depend on other cpus being
  still online.  If other cpus have been offlined already, they
  doesn't work properly.
 
   Example:
panic()
 crash_kexec()
  machine_crash_shutdown()
   octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
  machine_kexec()
 
  Problem 2:
  Most of architectures stop other cpus in the machine_crash_shutdown()
  path and save register information at the same time.  However, if
  smp_send_stop() is called before that, we can't save the register
  information.
 
 To solve these problems, this patch changes the timing of calling
 the callbacks instead of changing the timing of crash_kexec() if
 crash_kexec_post_notifiers boot option is specified.
 
  Before:
   if (!crash_kexec_post_notifiers)
   crash_kexec()
 
   smp_send_stop()
   atomic_notifier_call_chain()
   kmsg_dump()
 
   if (crash_kexec_post_notifiers)
   crash_kexec()
 
  After:
   crash_kexec()
   machine_crash_shutdown()
   if (crash_kexec_post_notifiers) {
   atomic_notifier_call_chain()
   kmsg_dump()
   }
   machine_kexec()
 
   smp_send_stop()
   if (!crash_kexec_post_notifiers) {
   atomic_notifier_call_chain()
   kmsg_dump()
   }
 

I think this new code flow looks bad. Now we are calling kmsg_dump()
and atomic_notifier_call_chain() from inside the crash_kexec() as well
as from inside panic(). This is bad.

So basic problem seems to be that cpus need to be stopped once (with
or without panic notifiers. So why don't we look into desiginig a 
function which stops cpus, saves register states first and then does
rest of the processing.

Something like.

stop_cpus_save_register_state;

if (!crash_kexec_post_notifiers)
crash_kexec()

atomic_notifier_call_chain()
kmsg_dump()

Here crash_kexec() will have to be modified and it will assume that cpus
have already been stopped and register states have already been saved.

IOW, is there a reason that we can't get rid of smp_send_stop() and
use the mechanism crash_kexec() is using to stop cpus after panic()?

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote:
 Vivek Goyal vgo...@redhat.com writes:
 
  On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote:
 
  [..]
 If a machine is failing, there are high chance it can't deliver you 
 the
 notification. Detecting that failure suing some kind of polling 
 mechanism
 might be more reliable. And it will make even kdump mechanism more
 reliable so that it does not have to run panic notifiers after the 
 crash.

I think what your suggesting is that my company should change how 
it's hardware works
and that's not really an option for me. This isn't a simple thing 
like checking over the
network if the machine is down or not, this is way more complex 
hardware design.
   
That means you are ready to live with an unreliable design. There 
might be
cases where notifier does not get run properly and you will not do 
switch
despite the fact that OS has failed. I was just trying to nudge you in
a direction which could be more reliable mechanism.
   
   Sigh I see some deep confusion going on here.
   
   The panic notifiers are just that panic notifiers.  They have not been
   nor should they be tied to kexec.   If those notifiers force a switch
   over of between machines I fail to see why you would care if it was
   kexec or another panic situation that is forcing that switchover.
  
  Hidehiro isn't fixing the failover situation on my side, he's fixing 
  register
  information collection when crash_kexec_post_notifiers is used.
 
  Sure. Given that we have created this new parameter, let us fix it so that
  we can capture the other cpu register state in crash dump.
 
  I am little disappointed that it was not tested well when this parameter was
  introuced. We should have atleast tested it to the extent to see if there
  is proper cpu state present for all cpus in the crash dump.
 
  At that point of time it looked like a simple modification
  to allow panic notifiers before crash_kexec().
 
 Either that or we say no one cares enough, and it known broken so let's
 just revert the fool thing.

Masami, you introduced this option. Are you fine with the revert? Is it
really being used and tested?

 I honestly can't see how to support panic notifiers, before kexec.
 There is no way to tell what is being done and all of the pieces
 including smp_send_stop are known to be buggy.

we should be able to replace smp_send_stop() with what crash_kexec() is
doing to stop the machine? If yes, then it should be fine I guess. This
parameter description clearly says that specify it at your own risk. So
we are not issuing a big support statement for successful kdump after
panic notifiers. If it is something fixable, otherwise user needs
to deal with it.

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

2015-07-14 Thread Vivek Goyal
On Tue, Jul 14, 2015 at 05:29:53PM +, dwal...@fifo99.com wrote:

[..]
If a machine is failing, there are high chance it can't deliver you the
notification. Detecting that failure suing some kind of polling 
mechanism
might be more reliable. And it will make even kdump mechanism more
reliable so that it does not have to run panic notifiers after the 
crash.
   
   I think what your suggesting is that my company should change how it's 
   hardware works
   and that's not really an option for me. This isn't a simple thing like 
   checking over the
   network if the machine is down or not, this is way more complex hardware 
   design.
  
   That means you are ready to live with an unreliable design. There might be
   cases where notifier does not get run properly and you will not do switch
   despite the fact that OS has failed. I was just trying to nudge you in
   a direction which could be more reliable mechanism.
  
  Sigh I see some deep confusion going on here.
  
  The panic notifiers are just that panic notifiers.  They have not been
  nor should they be tied to kexec.   If those notifiers force a switch
  over of between machines I fail to see why you would care if it was
  kexec or another panic situation that is forcing that switchover.
 
 Hidehiro isn't fixing the failover situation on my side, he's fixing register
 information collection when crash_kexec_post_notifiers is used.

Sure. Given that we have created this new parameter, let us fix it so that
we can capture the other cpu register state in crash dump.

I am little disappointed that it was not tested well when this parameter was
introuced. We should have atleast tested it to the extent to see if there
is proper cpu state present for all cpus in the crash dump.

At that point of time it looked like a simple modification
to allow panic notifiers before crash_kexec().

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V25/5] powerpc/kexec: Use global IND_FLAGS macro

2014-10-07 Thread Vivek Goyal
On Tue, Oct 07, 2014 at 12:21:30AM +, Geoff Levand wrote:
 linux/kexec.h now defines an IND_FLAGS macro.  Remove the local powerpc
 definition and use the generic one.
 
 Signed-off-by: Geoff Levand ge...@infradead.org

I think this patch should be merged in previous patch. I guess after
applying patch4, series might not be compilable as there are two
definitions of IND_FLAGS now.

Thanks
Vivek

 ---
  arch/powerpc/kernel/machine_kexec_64.c | 2 --
  1 file changed, 2 deletions(-)
 
 diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
 b/arch/powerpc/kernel/machine_kexec_64.c
 index 879b3aa..75652a32 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -96,8 +96,6 @@ int default_machine_kexec_prepare(struct kimage *image)
   return 0;
  }
  
 -#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE)
 -
  static void copy_segments(unsigned long ind)
  {
   unsigned long entry;
 -- 
 1.9.1
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5] kexec: minor fixups and enhancements

2014-08-26 Thread Vivek Goyal
On Tue, Aug 26, 2014 at 05:33:28PM -0700, Geoff Levand wrote:
 Hi Vivek,
 
 On Mon, 2014-08-25 at 12:59 -0400, Vivek Goyal wrote:
  Does arm64 has secureboot? If yes, then it might make sense to
  enable the new syscall kexec_file_load() on arm64 instead of trying
  to make old syscall work first.
 
 Yes, arm64 should support secureboot, but I have not looked into it.
 
 When do you expect syscall kexec_file_load to get merged?  I won't wait
 until then to push my current kexec work.

It got merged in 3.17-rc1

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5] kexec: minor fixups and enhancements

2014-08-25 Thread Vivek Goyal
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote:
 Hi,
 
 Here are a few minor fixups and enhancements for kexec support. 
 
 Patch 3 and 4 that add preprocessor macros for the kimage list flags are
 ones that I use in the arm64 kexec support I am working on, so it would
 be nice for those to go in.
 
 Please consider.

Hi Geoff,

Does arm64 has secureboot? If yes, then it might make sense to
enable the new syscall kexec_file_load() on arm64 instead of trying
to make old syscall work first.

Thanks
Vivek

 
 -Geoff
 
 The following changes since commit 7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9:
 
   Linux 3.17-rc1 (2014-08-16 10:40:26 -0600)
 
 are available in the git repository at:
 
   git://git.linaro.org/people/geoff.levand/linux-kexec.git for-kexec
 
 for you to fetch changes up to dc3c63e69815e00dce5454bd50e2991e9ea33f64:
 
   powerpc/kexec: Use global IND_FLAGS macro (2014-08-22 11:15:18 -0700)
 
 
 Geoff Levand (5):
   kexec: Fix make headers_check
   kexec: Simplify conditional
   kexec: Add bit definitions for kimage entry flags
   kexec: Add IND_FLAGS macro
   powerpc/kexec: Use global IND_FLAGS macro
 
  arch/powerpc/kernel/machine_kexec_64.c |  2 --
  include/linux/kexec.h  | 20 
  include/uapi/linux/kexec.h |  6 --
  kernel/kexec.c | 17 ++---
  4 files changed, 26 insertions(+), 19 deletions(-)
 
 -- 
 1.9.1
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/5] kexec: Fix make headers_check

2014-08-25 Thread Vivek Goyal
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote:
 Remove the unneded declaration for a kexec_load() routine.
 
 Fixes errors like these when running 'make headers_check':
 
 include/uapi/linux/kexec.h: userspace cannot reference function or variable 
 defined in the kernel
 
 Signed-off-by: Geoff Levand ge...@infradead.org

I think Paul Bolle tried to remove this in the past and maximilian
had objections.

http://lists.infradead.org/pipermail/kexec/2014-January/010902.html

I can't see that how exporting kernel prototype helps here. kexec-tools
seems to be using syscall(__NR_kexec_load) directly for non-xen case. So
I would be fine with removing this definition. Just trying to make sure
that it does not break any other library or users of this declaration.

Thanks
Vivek


 ---
  include/uapi/linux/kexec.h | 6 --
  1 file changed, 6 deletions(-)
 
 diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
 index 6925f5b..99048e5 100644
 --- a/include/uapi/linux/kexec.h
 +++ b/include/uapi/linux/kexec.h
 @@ -55,12 +55,6 @@ struct kexec_segment {
   size_t memsz;
  };
  
 -/* Load a new kernel image as described by the kexec_segment array
 - * consisting of passed number of segments at the entry-point address.
 - * The flags allow different useage types.
 - */
 -extern int kexec_load(void *, size_t, struct kexec_segment *,
 - unsigned long int);
  #endif /* __KERNEL__ */
  
  #endif /* _UAPILINUX_KEXEC_H */
 -- 
 1.9.1
 
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/5] kexec: Simplify conditional

2014-08-25 Thread Vivek Goyal
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote:
 Simplify the code around one of the conditionals in the kexec_load
 syscall routine.
 
 The original code was confusing with a redundant check on KEXEC_ON_CRASH
 and comments outside of the conditional block.  This change switches the
 order of the conditional check, and cleans up the comments for the
 conditional.  There is no functional change to the code.
 
 Signed-off-by: Geoff Levand ge...@infradead.org

This is simple reorganization.

Acked-by: Vivek Goyal vgo...@redhat.com

Vivek

 ---
  kernel/kexec.c | 17 ++---
  1 file changed, 10 insertions(+), 7 deletions(-)
 
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index 0b49a0a..d04b56e 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -1282,19 +1282,22 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, 
 unsigned long, nr_segments,
   if (nr_segments  0) {
   unsigned long i;
  
 - /* Loading another kernel to reboot into */
 - if ((flags  KEXEC_ON_CRASH) == 0)
 - result = kimage_alloc_init(image, entry, nr_segments,
 -segments, flags);
 - /* Loading another kernel to switch to if this one crashes */
 - else if (flags  KEXEC_ON_CRASH) {
 - /* Free any current crash dump kernel before
 + if (flags  KEXEC_ON_CRASH) {
 + /*
 +  * Loading another kernel to switch to if this one
 +  * crashes.  Free any current crash dump kernel before
* we corrupt it.
*/
 +
   kimage_free(xchg(kexec_crash_image, NULL));
   result = kimage_alloc_init(image, entry, nr_segments,
  segments, flags);
   crash_map_reserved_pages();
 + } else {
 + /* Loading another kernel to reboot into. */
 +
 + result = kimage_alloc_init(image, entry, nr_segments,
 +segments, flags);
   }
   if (result)
   goto out;
 -- 
 1.9.1
 
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/5] kexec: Add bit definitions for kimage entry flags

2014-08-25 Thread Vivek Goyal
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote:
 Define new kexec preprocessor macros IND_*_BIT that define the bit position of
 the kimage entry flags.  Change the existing IND_* flag macros to be defined 
 as
 bit shifts of the corresponding IND_*_BIT macros.  Also wrap all C language 
 code
 in kexec.h with #if !defined(__ASSEMBLY__) so assembly files can include 
 kexec.h
 to get the IND_* and IND_*_BIT macros.
 
 Some CPU instruction sets have tests for bit position which are convenient in
 implementing routines that operate on the kimage entry list.  The addition of
 these bit position macros in a common location will avoid duplicate 
 definitions
 and the chance that changes to the IND_* flags will not be propagated to
 assembly files.
 
 Signed-off-by: Geoff Levand ge...@infradead.org

Looks good to me.

Acked-by: Vivek Goyal vgo...@redhat.com

Vivek

 ---
  include/linux/kexec.h | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)
 
 diff --git a/include/linux/kexec.h b/include/linux/kexec.h
 index 4b2a0e1..8c628ca 100644
 --- a/include/linux/kexec.h
 +++ b/include/linux/kexec.h
 @@ -1,6 +1,18 @@
  #ifndef LINUX_KEXEC_H
  #define LINUX_KEXEC_H
  
 +#define IND_DESTINATION_BIT 0
 +#define IND_INDIRECTION_BIT 1
 +#define IND_DONE_BIT2
 +#define IND_SOURCE_BIT  3
 +
 +#define IND_DESTINATION  (1  IND_DESTINATION_BIT)
 +#define IND_INDIRECTION  (1  IND_INDIRECTION_BIT)
 +#define IND_DONE (1  IND_DONE_BIT)
 +#define IND_SOURCE   (1  IND_SOURCE_BIT)
 +
 +#if !defined(__ASSEMBLY__)
 +
  #include uapi/linux/kexec.h
  
  #ifdef CONFIG_KEXEC
 @@ -64,10 +76,6 @@
   */
  
  typedef unsigned long kimage_entry_t;
 -#define IND_DESTINATION  0x1
 -#define IND_INDIRECTION  0x2
 -#define IND_DONE 0x4
 -#define IND_SOURCE   0x8
  
  struct kexec_segment {
   /*
 @@ -312,4 +320,7 @@ struct task_struct;
  static inline void crash_kexec(struct pt_regs *regs) { }
  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
  #endif /* CONFIG_KEXEC */
 +
 +#endif /* !defined(__ASSEBMLY__) */
 +
  #endif /* LINUX_KEXEC_H */
 -- 
 1.9.1
 
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/5] kexec: Add IND_FLAGS macro

2014-08-25 Thread Vivek Goyal
On Fri, Aug 22, 2014 at 06:39:47PM +, Geoff Levand wrote:
 Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of
 all the possible kexec IND_ kimage_entry indirection flags.
 
 Having this macro allows for simplified code in the prosessing of the
 kexec kimage_entry items.
 
 Signed-off-by: Geoff Levand ge...@infradead.org

Acked-by: Vivek Goyal vgo...@redhat.com

Vivek

 ---
  include/linux/kexec.h | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/include/linux/kexec.h b/include/linux/kexec.h
 index 8c628ca..a4758f9 100644
 --- a/include/linux/kexec.h
 +++ b/include/linux/kexec.h
 @@ -10,6 +10,7 @@
  #define IND_INDIRECTION  (1  IND_INDIRECTION_BIT)
  #define IND_DONE (1  IND_DONE_BIT)
  #define IND_SOURCE   (1  IND_SOURCE_BIT)
 +#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE)
  
  #if !defined(__ASSEMBLY__)
  
 -- 
 1.9.1
 
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode

2014-06-06 Thread Vivek Goyal
On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote:
 On 06/04/2014 07:16 PM, Vivek Goyal wrote:
  On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
  On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
  Yep, that makes sense. But unfortunately I don't have enough insight into
  why exactly powerpc has to online the CPUs before doing a kexec. I just
  know from the commit log and the comment mentioned above (and from my own
  experiments) that the CPUs will get stuck if they were offline. Perhaps
  somebody more knowledgeable can explain this in detail and suggest a 
  proper
  long-term solution.
 
  Matt, Ben, any thoughts on this?
 
  The problem is with our soft offline which we do on some platforms. When 
  we
  offline we don't actually send the CPUs back to firmware or anything like 
  that.
 
  We put them into a very low low power loop inside Linux.
 
  The new kernel has no way to extract them from that loop. So we must 
  re-online
  them before we kexec so they can be passed to the new kernel normally (or 
  returned
  to firmware like we do on powernv).
  
  Srivatsa,
  
  Looks like your patch has been merged.
  
  I don't like the following change in arch independent code.
  
  /*
   * migrate_to_reboot_cpu() disables CPU hotplug assuming  that
   * no further code needs to use CPU hotplug (which is true in
   * the reboot case). However, the kexec path depends on  using
   * CPU hotplug again; so re-enable it here. 
   */
 cpu_hotplug_enable();
  
  As it is very powerpc specific requirement, can you enable hotplug in 
  powerpc
  arch dependent code as a short term solution.
  
 
 I didn't do that because that would mean that the _disable() would be
 performed inside kernel/kexec.c and the corresponding _enable() would
 be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
 connection between them, which would have made them hard to relate.

Which we are doing anyway. The difference is that now we are doing it
for all arches.

If this is powerpc specific requirement, then we should limit this to
powerpc only and not let spill over in generic code.

And putting a big fat comment should take care of being able to figure
out why arch code is overwriting the generic code's decision. By putting
it in generic code and enforcing this on all arches does not buy us
anything, IMHO.


 
  Ideally one needs to fix the requirement of online all cpus in powerpc
  as a long term solution and then get rid of hotplug enable call.
  
 
 Yes, I agree. I'm trying out a solution at the moment (see the 4
 preliminary patches I sent in my reply to Ben). If that works, we won't
 need the enable call on powerpc.

Thanks. This will help.

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode

2014-06-04 Thread Vivek Goyal
On Wed, Jun 04, 2014 at 01:58:40AM +0530, Srivatsa S. Bhat wrote:
 On 05/28/2014 07:01 PM, Vivek Goyal wrote:
  On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote:
  If we try to perform a kexec when the machine is in ST (Single-Threaded) 
  mode
  (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
  get the following messages during boot:
 
  [0.089866] POWER8 performance monitor hardware support registered
  [0.089985] power8-pmu: PMAO restore workaround active.
  [5.095419] Processor 1 is stuck.
  [   10.097933] Processor 2 is stuck.
  [   15.100480] Processor 3 is stuck.
  [   20.102982] Processor 4 is stuck.
  [   25.105489] Processor 5 is stuck.
  [   30.108005] Processor 6 is stuck.
  [   35.110518] Processor 7 is stuck.
  [   40.113369] Processor 9 is stuck.
  [   45.115879] Processor 10 is stuck.
  [   50.118389] Processor 11 is stuck.
  [   55.120904] Processor 12 is stuck.
  [   60.123425] Processor 13 is stuck.
  [   65.125970] Processor 14 is stuck.
  [   70.128495] Processor 15 is stuck.
  [   75.131316] Processor 17 is stuck.
 
  Note that only the sibling threads are stuck, while the primary threads 
  (0, 8,
  16 etc) boot just fine. Looking closer at the previous step of kexec, we 
  observe
  that kexec tries to wakeup (bring online) the sibling threads of all the 
  cores,
  before performing kexec:
 
  [ 9464.131231] Starting new kernel
  [ 9464.148507] kexec: Waking offline cpu 1.
  [ 9464.148552] kexec: Waking offline cpu 2.
  [ 9464.148600] kexec: Waking offline cpu 3.
  [ 9464.148636] kexec: Waking offline cpu 4.
  [ 9464.148671] kexec: Waking offline cpu 5.
  [ 9464.148708] kexec: Waking offline cpu 6.
  [ 9464.148743] kexec: Waking offline cpu 7.
  [ 9464.148779] kexec: Waking offline cpu 9.
  [ 9464.148815] kexec: Waking offline cpu 10.
  [ 9464.148851] kexec: Waking offline cpu 11.
  [ 9464.148887] kexec: Waking offline cpu 12.
  [ 9464.148922] kexec: Waking offline cpu 13.
  [ 9464.148958] kexec: Waking offline cpu 14.
  [ 9464.148994] kexec: Waking offline cpu 15.
  [ 9464.149030] kexec: Waking offline cpu 17.
 
  Instrumenting this piece of code revealed that the cpu_up() operation 
  actually
  fails with -EBUSY. Thus, only the primary threads of all the cores are 
  online
  during kexec, and hence this is a sure-shot receipe for disaster, as 
  explained
  in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across 
  kexec),
  as well as in the comment above wake_offline_cpus().
 
  It turns out that cpu_up() was returning -EBUSY because the variable
  'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was 
  done
  by migrate_to_reboot_cpu() inside kernel_kexec().
 
  Now, migrate_to_reboot_cpu() was originally written with the assumption 
  that
  any further code will not need to perform CPU hotplug, since we are anyway 
  in
  the reboot path. However, kexec is clearly not such a case, since we 
  depend on
  onlining CPUs, atleast on powerpc.
 
  So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in 
  the
  kexec path, to fix this regression in kexec on powerpc.
 
  Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that 
  we
  can catch such issues more easily in the future.
 
  Fixes: c97102ba963 (kexec: migrate to reboot cpu)
  Cc: sta...@vger.kernel.org
  Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
  ---
 
   arch/powerpc/kernel/machine_kexec_64.c |2 +-
   kernel/kexec.c |8 
   2 files changed, 9 insertions(+), 1 deletion(-)
 
  diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
  b/arch/powerpc/kernel/machine_kexec_64.c
  index 59d229a..879b3aa 100644
  --- a/arch/powerpc/kernel/machine_kexec_64.c
  +++ b/arch/powerpc/kernel/machine_kexec_64.c
  @@ -237,7 +237,7 @@ static void wake_offline_cpus(void)
 if (!cpu_online(cpu)) {
 printk(KERN_INFO kexec: Waking offline cpu %d.\n,
cpu);
  -  cpu_up(cpu);
  +  WARN_ON(cpu_up(cpu));
 }
 }
   }
  diff --git a/kernel/kexec.c b/kernel/kexec.c
  index c8380ad..28c5706 100644
  --- a/kernel/kexec.c
  +++ b/kernel/kexec.c
  @@ -1683,6 +1683,14 @@ int kernel_kexec(void)
 kexec_in_progress = true;
 kernel_restart_prepare(NULL);
 migrate_to_reboot_cpu();
  +
  +  /*
  +   * migrate_to_reboot_cpu() disables CPU hotplug assuming that
  +   * no further code needs to use CPU hotplug (which is true in
  +   * the reboot case). However, the kexec path depends on using
  +   * CPU hotplug again; so re-enable it here.
  +   */
  +  cpu_hotplug_enable();
 printk(KERN_EMERG Starting new kernel\n);
 machine_shutdown();
  
  After migrate_to_reboot_cpu(), we are calling machine_shutdown() which
  calls disable_nonboot_cpus() and which

Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode

2014-06-04 Thread Vivek Goyal
On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
  Yep, that makes sense. But unfortunately I don't have enough insight into
  why exactly powerpc has to online the CPUs before doing a kexec. I just
  know from the commit log and the comment mentioned above (and from my own
  experiments) that the CPUs will get stuck if they were offline. Perhaps
  somebody more knowledgeable can explain this in detail and suggest a proper
  long-term solution.
  
  Matt, Ben, any thoughts on this?
 
 The problem is with our soft offline which we do on some platforms. When we
 offline we don't actually send the CPUs back to firmware or anything like 
 that.
 
 We put them into a very low low power loop inside Linux.
 
 The new kernel has no way to extract them from that loop. So we must 
 re-online
 them before we kexec so they can be passed to the new kernel normally (or 
 returned
 to firmware like we do on powernv).

Srivatsa,

Looks like your patch has been merged.

I don't like the following change in arch independent code.

/*
 * migrate_to_reboot_cpu() disables CPU hotplug assuming  that
 * no further code needs to use CPU hotplug (which is true in
 * the reboot case). However, the kexec path depends on  using
 * CPU hotplug again; so re-enable it here. 
 */
   cpu_hotplug_enable();

As it is very powerpc specific requirement, can you enable hotplug in powerpc
arch dependent code as a short term solution.

Ideally one needs to fix the requirement of online all cpus in powerpc
as a long term solution and then get rid of hotplug enable call.

Thanks
Vivek
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc, kexec: Fix Processor X is stuck issue during kexec from ST mode

2014-05-28 Thread Vivek Goyal
On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote:
 If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
 (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
 get the following messages during boot:
 
 [0.089866] POWER8 performance monitor hardware support registered
 [0.089985] power8-pmu: PMAO restore workaround active.
 [5.095419] Processor 1 is stuck.
 [   10.097933] Processor 2 is stuck.
 [   15.100480] Processor 3 is stuck.
 [   20.102982] Processor 4 is stuck.
 [   25.105489] Processor 5 is stuck.
 [   30.108005] Processor 6 is stuck.
 [   35.110518] Processor 7 is stuck.
 [   40.113369] Processor 9 is stuck.
 [   45.115879] Processor 10 is stuck.
 [   50.118389] Processor 11 is stuck.
 [   55.120904] Processor 12 is stuck.
 [   60.123425] Processor 13 is stuck.
 [   65.125970] Processor 14 is stuck.
 [   70.128495] Processor 15 is stuck.
 [   75.131316] Processor 17 is stuck.
 
 Note that only the sibling threads are stuck, while the primary threads (0, 8,
 16 etc) boot just fine. Looking closer at the previous step of kexec, we 
 observe
 that kexec tries to wakeup (bring online) the sibling threads of all the 
 cores,
 before performing kexec:
 
 [ 9464.131231] Starting new kernel
 [ 9464.148507] kexec: Waking offline cpu 1.
 [ 9464.148552] kexec: Waking offline cpu 2.
 [ 9464.148600] kexec: Waking offline cpu 3.
 [ 9464.148636] kexec: Waking offline cpu 4.
 [ 9464.148671] kexec: Waking offline cpu 5.
 [ 9464.148708] kexec: Waking offline cpu 6.
 [ 9464.148743] kexec: Waking offline cpu 7.
 [ 9464.148779] kexec: Waking offline cpu 9.
 [ 9464.148815] kexec: Waking offline cpu 10.
 [ 9464.148851] kexec: Waking offline cpu 11.
 [ 9464.148887] kexec: Waking offline cpu 12.
 [ 9464.148922] kexec: Waking offline cpu 13.
 [ 9464.148958] kexec: Waking offline cpu 14.
 [ 9464.148994] kexec: Waking offline cpu 15.
 [ 9464.149030] kexec: Waking offline cpu 17.
 
 Instrumenting this piece of code revealed that the cpu_up() operation actually
 fails with -EBUSY. Thus, only the primary threads of all the cores are online
 during kexec, and hence this is a sure-shot receipe for disaster, as explained
 in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
 as well as in the comment above wake_offline_cpus().
 
 It turns out that cpu_up() was returning -EBUSY because the variable
 'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was 
 done
 by migrate_to_reboot_cpu() inside kernel_kexec().
 
 Now, migrate_to_reboot_cpu() was originally written with the assumption that
 any further code will not need to perform CPU hotplug, since we are anyway in
 the reboot path. However, kexec is clearly not such a case, since we depend on
 onlining CPUs, atleast on powerpc.
 
 So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
 kexec path, to fix this regression in kexec on powerpc.
 
 Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
 can catch such issues more easily in the future.
 
 Fixes: c97102ba963 (kexec: migrate to reboot cpu)
 Cc: sta...@vger.kernel.org
 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 ---
 
  arch/powerpc/kernel/machine_kexec_64.c |2 +-
  kernel/kexec.c |8 
  2 files changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
 b/arch/powerpc/kernel/machine_kexec_64.c
 index 59d229a..879b3aa 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -237,7 +237,7 @@ static void wake_offline_cpus(void)
   if (!cpu_online(cpu)) {
   printk(KERN_INFO kexec: Waking offline cpu %d.\n,
  cpu);
 - cpu_up(cpu);
 + WARN_ON(cpu_up(cpu));
   }
   }
  }
 diff --git a/kernel/kexec.c b/kernel/kexec.c
 index c8380ad..28c5706 100644
 --- a/kernel/kexec.c
 +++ b/kernel/kexec.c
 @@ -1683,6 +1683,14 @@ int kernel_kexec(void)
   kexec_in_progress = true;
   kernel_restart_prepare(NULL);
   migrate_to_reboot_cpu();
 +
 + /*
 +  * migrate_to_reboot_cpu() disables CPU hotplug assuming that
 +  * no further code needs to use CPU hotplug (which is true in
 +  * the reboot case). However, the kexec path depends on using
 +  * CPU hotplug again; so re-enable it here.
 +  */
 + cpu_hotplug_enable();
   printk(KERN_EMERG Starting new kernel\n);
   machine_shutdown();

After migrate_to_reboot_cpu(), we are calling machine_shutdown() which
calls disable_nonboot_cpus() and which in turn calls _cpu_down().

So it is kind of odd that we first migrate to boot cpu, and then disable
all non-boot cpus and after that powerpc goes ahead and onlines all
cpus.

I think this is not 

[PATCH 5/5] sh: Define elfcorehdr_addr in arch dependent section

2008-07-28 Thread Vivek Goyal



o Move elfcorehdr_addr definition in arch dependent crash dump file. This is
  equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of
  CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be
  used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or
  not.

o I don't see sh setup code parsing the command line for elfcorehdr_addr. I 
  am wondering how does vmcore interface work on sh. Anyway, I am atleast
  defining elfcoredhr_addr so that compilation is not broken on sh.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/sh/kernel/crash_dump.c |3 +++
 1 file changed, 3 insertions(+)

diff -puN arch/sh/kernel/crash_dump.c~fix-elfcorehdr_addr-sh 
arch/sh/kernel/crash_dump.c
--- linux-2.6.27-pre-rc1/arch/sh/kernel/crash_dump.c~fix-elfcorehdr_addr-sh 
2008-07-28 12:17:12.0 -0400
+++ linux-2.6.27-pre-rc1-root/arch/sh/kernel/crash_dump.c   2008-07-28 
12:17:12.0 -0400
@@ -10,6 +10,9 @@
 #include linux/io.h
 #include asm/uaccess.h
 
+/* Stores the physical address of elf header of crash image. */
+unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;
+
 /**
  * copy_oldmem_page - copy one page from oldmem
  * @pfn: page frame number to be copied
_
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 3/5] ia64: Define elfcorehdr_addr in arch dependent section

2008-07-28 Thread Vivek Goyal

o Move elfcorehdr_addr definition in arch dependent crash dump file. This is
  equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of
  CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be
  used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or
  not.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/ia64/kernel/setup.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff -puN arch/ia64/kernel/setup.c~fix-elfcorehdr_addr-parsing-ia64 
arch/ia64/kernel/setup.c
--- 
linux-2.6.27-pre-rc1/arch/ia64/kernel/setup.c~fix-elfcorehdr_addr-parsing-ia64  
2008-07-28 12:16:40.0 -0400
+++ linux-2.6.27-pre-rc1-root/arch/ia64/kernel/setup.c  2008-07-28 
12:16:40.0 -0400
@@ -478,7 +478,12 @@ static __init int setup_nomca(char *s)
 }
 early_param(nomca, setup_nomca);
 
-#ifdef CONFIG_PROC_VMCORE
+/*
+ * Note: elfcorehdr_addr is not just limited to vmcore. It is also used by
+ * is_kdump_kernel() to determine if we are booting after a panic. Hence
+ * ifdef it under CONFIG_CRASH_DUMP and not CONFIG_PROC_VMCORE.
+ */
+#ifdef CONFIG_CRASH_DUMP
 /* elfcorehdr= specifies the location of elf core header
  * stored by the crashed kernel.
  */
@@ -491,7 +496,9 @@ static int __init parse_elfcorehdr(char 
return 0;
 }
 early_param(elfcorehdr, parse_elfcorehdr);
+#endif
 
+#ifdef CONFIG_PROC_VMCORE
 int __init reserve_elfcorehdr(unsigned long *start, unsigned long *end)
 {
unsigned long length;
_
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 4/5] powerpc: Define elfcorehdr_addr in arch dependent section

2008-07-28 Thread Vivek Goyal


o Move elfcorehdr_addr definition in arch dependent crash dump file. This is
  equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of
  CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be
  used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or
  not.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/powerpc/kernel/crash_dump.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN arch/powerpc/kernel/crash_dump.c~fix-elfcorehdr_addr-parsing-ppc64 
arch/powerpc/kernel/crash_dump.c
--- 
linux-2.6.27-pre-rc1/arch/powerpc/kernel/crash_dump.c~fix-elfcorehdr_addr-parsing-ppc64
 2008-07-28 12:14:22.0 -0400
+++ linux-2.6.27-pre-rc1-root/arch/powerpc/kernel/crash_dump.c  2008-07-28 
12:14:22.0 -0400
@@ -27,6 +27,9 @@
 #define DBG(fmt...)
 #endif
 
+/* Stores the physical address of elf header of crash image. */
+unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;
+
 void __init reserve_kdump_trampoline(void)
 {
lmb_reserve(0, KDUMP_RESERVE_LIMIT);
@@ -66,7 +69,11 @@ void __init setup_kdump_trampoline(void)
DBG( - setup_kdump_trampoline()\n);
 }
 
-#ifdef CONFIG_PROC_VMCORE
+/*
+ * Note: elfcorehdr_addr is not just limited to vmcore. It is also used by
+ * is_kdump_kernel() to determine if we are booting after a panic. Hence
+ * ifdef it under CONFIG_CRASH_DUMP and not CONFIG_PROC_VMCORE.
+ */
 static int __init parse_elfcorehdr(char *p)
 {
if (p)
@@ -75,7 +82,6 @@ static int __init parse_elfcorehdr(char 
return 1;
 }
 __setup(elfcorehdr=, parse_elfcorehdr);
-#endif
 
 static int __init parse_savemaxmem(char *p)
 {
_
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/5] Move elfcorehdr_addr out of vmcore.c (Was: Re: [patch] crashdump: fix undefined reference to `elfcorehdr_addr')

2008-07-28 Thread Vivek Goyal
On Tue, Jul 29, 2008 at 11:22:48AM +1000, Simon Horman wrote:
 On Mon, Jul 28, 2008 at 03:47:41PM -0700, Eric W. Biederman wrote:
  Vivek Goyal [EMAIL PROTECTED] writes:
  
   Hi All,
  
   How does following series of patches look like. I have moved
   elfcorehdr_addr out of vmcore.c and pushed it to arch dependent section 
   of crash dump to make sure that it can be worked with even when
   CONFIG_PROC_VMCORE is disabled and CONFIG_CRASH_DUMP is enabled.
  
   I tested it on x86_64. Compile tested it on i386 and ppc64. ia64 and
   sh versions are completely untested.
  
  Given the current state of the code:
  Acked-by: Eric W. Biederman [EMAIL PROTECTED]
  
  To process a kernel crash dump we pass the kernel elfcorehdr option,
  so testing to see if it was passed seems reasonable.
  
  In general I think this method of handling the problems with kdump is
  too brittle to live, but in the case of iommus we certainly need to do
  something different, and unfortunately iommus were not common on x86
  when the original code was merged so we have not handled them well.
 
 Agreed, however these patches look like they really ought to be merged
 into a single patch for the sake of bisect. As things stand, applying
 the first patch will break the build on each architecture with an
 architecture specific until the latter is applied.

That's a good point. I was not very sure because changes were in 
different arches and I broke the patch. At the same time changes are
really miniscule in each arch. 

So, for the sake of not breaking compilation for git-bisect, I will
generate a single patch tomorrow. (Until and unless somebody has an
objection).

Thanks
Vivek

 
 -- 
 Horms
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev