Re: [PATCH v3 01/11] kexec: introduce kexec firmware support

2013-01-04 Thread Daniel Kiper
On Thu, Dec 27, 2012 at 07:06:13PM -0800, ebied...@xmission.com wrote:
 Daniel Kiper <daniel.ki...@oracle.com> writes:

  Daniel Kiper <daniel.ki...@oracle.com> writes:
 
   Some kexec/kdump implementations (e.g. Xen PVOPS) cannot use the default
   Linux infrastructure and require some support from firmware and/or
   hypervisor.
   To cope with that problem, kexec firmware infrastructure was introduced.
   It allows a developer to use all kexec/kdump features of a given firmware
   or hypervisor.
 
  As this stands this patch is wrong.
 
  You need to pass an additional flag from userspace through /sbin/kexec
  that says load the kexec image in the firmware.  A global variable here
  is not ok.
 
  As I understand it you are loading a kexec on xen panic image.  Which
  is semantically different from a kexec on linux panic image.  It is not
  ok to have a silly global variable kexec_use_firmware.
 
  Earlier we agreed that /sbin/kexec should call the kexec syscall with
  a special flag. However, while working on the Xen kexec/kdump v3 patch
  I found that this is insufficient because, e.g., crash_kexec()
  should also execute different code when firmware support is in use.

 That implies you have the wrong model of userspace.

 Very simply there is:
 linux kexec pass through to xen kexec.

 And
 linux kexec (ultimately pv kexec because the pv machine is a slightly
 different architecture).

As I understand it, in the Xen dom0 kexec/kdump case machine_kexec() should
call a stub which issues the relevant hypercall to initiate kexec/kdump in
Xen itself. Right?

  Sadly, the syscall does not save this flag anywhere.

  Additionally, I stated
  that the kernel itself knows best which code path should be
  used (firmware or plain Linux). If this decision is left to userspace,
  then a plain kexec syscall could, in the worst case, crash the system (e.g.
  when plain Linux kexec is used where firmware kexec should be
  used).

 And that path selection bit is nonsense.  You are advocating
 hardcoding unnecessary policy in the kernel.

 If for dom0 you need crash_kexec to do something different from domU
 you should be able to load a small piece of code via kexec that makes
 the hypervisor calls you need.

  However, if you wish, I could add this flag to the syscall.

 I do wish.  We need to distinguish between the kexec firmware pass
 through, and normal kexec.

OK.
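A minimal user-space sketch of the distinction Eric asks for, assuming a hypothetical KEXEC_FIRMWARE_PASSTHROUGH flag bit and a hypothetical struct image_sketch standing in for struct kimage. The flag chosen by /sbin/kexec is recorded with the loaded image instead of in a global such as kexec_use_firmware, so later code (e.g. the crash path) can consult the image it is about to run.

```c
#include <assert.h>

/* Value taken from KEXEC_ON_CRASH in <linux/kexec.h>. */
#define KEXEC_ON_CRASH             0x00000001UL
/* Hypothetical flag for illustration only; not part of the real ABI. */
#define KEXEC_FIRMWARE_PASSTHROUGH 0x00000004UL

enum kexec_path { PATH_LINUX, PATH_FIRMWARE };

/* Hypothetical stand-in for struct kimage: it keeps the load-time flags,
 * so no global variable is needed to remember them. */
struct image_sketch {
    unsigned long flags;
};

/* Load step: record the flags that userspace passed to the syscall. */
static void load_image(struct image_sketch *img, unsigned long flags)
{
    img->flags = flags;
}

/* Later (e.g. on panic): pick the code path from the loaded image. */
static enum kexec_path image_path(const struct image_sketch *img)
{
    return (img->flags & KEXEC_FIRMWARE_PASSTHROUGH) ? PATH_FIRMWARE
                                                     : PATH_LINUX;
}

/* Whether the image was loaded as a crash (kdump) image. */
static int image_is_crash(const struct image_sketch *img)
{
    return (img->flags & KEXEC_ON_CRASH) != 0;
}
```

Because the decision travels with each image, a pass-through image and a normal image can be told apart without any global switch.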

  Additionally, I could
  add a function which enables firmware support, and then the kexec_use_firmware
  variable would be global only within the kexec.c module.

 No.  kexec_use_firmware is the wrong mental model.

 Do not mix the kexec pass through and the normal kexec case.

 We most definitely need to call different code in the kexec firmware
 pass through case.

 For normal kexec we just need to use a paravirt aware version of
 machine_kexec and machine_kexec_shutdown.

OK, but this solves the problem in crash_kexec() only. However, kernel_kexec()
still calls machine_shutdown(), which breaks kexec on Xen dom0 (to be precise,
it shuts the machine down via a hypercall). Should I add machine_kexec_shutdown()
(like machine_crash_shutdown()) which would call, let's say,
machine_ops.kexec_shutdown()?
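The proposal can be modelled as one more hook in an ops table. This is a user-space sketch only: struct machine_ops_sketch, xen_init_machine_ops_sketch() and the action codes are hypothetical and do not reproduce the real x86 struct machine_ops; only the kexec_shutdown hook and machine_kexec_shutdown() come from the question above.

```c
#include <assert.h>

/* Hypothetical action codes so the sketch can report what happened. */
enum action {
    ACT_NONE,
    ACT_NATIVE_SHUTDOWN,
    ACT_XEN_POWER_OFF,
    ACT_XEN_KEXEC_SHUTDOWN
};

static enum action last_action = ACT_NONE;

/* Trimmed-down, hypothetical ops table in the spirit of x86
 * struct machine_ops; kexec_shutdown is the proposed hook. */
struct machine_ops_sketch {
    void (*shutdown)(void);
    void (*kexec_shutdown)(void);
};

static void native_shutdown(void)    { last_action = ACT_NATIVE_SHUTDOWN; }
/* Under Xen dom0 the ordinary shutdown path ends in a hypercall that
 * stops the machine, which is what breaks kernel_kexec(). */
static void xen_shutdown(void)       { last_action = ACT_XEN_POWER_OFF; }
/* Placeholder for whatever dom0 must do before handing over to Xen kexec. */
static void xen_kexec_shutdown(void) { last_action = ACT_XEN_KEXEC_SHUTDOWN; }

static struct machine_ops_sketch machine_ops = {
    .shutdown       = native_shutdown,
    .kexec_shutdown = native_shutdown,   /* native: same as a normal shutdown */
};

/* What kernel_kexec() would call instead of machine_shutdown(). */
static void machine_kexec_shutdown(void)
{
    machine_ops.kexec_shutdown();
}

/* Platform setup installs the Xen variants, as PV init code might. */
static void xen_init_machine_ops_sketch(void)
{
    machine_ops.shutdown       = xen_shutdown;
    machine_ops.kexec_shutdown = xen_kexec_shutdown;
}
```

In this model, kernel_kexec() on dom0 reaches the kexec-specific hook and never goes through the shutdown path that powers the host off.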

Additionally, crash_shrink_memory() does not make sense in the Xen dom0 case.
How do you wish to disable it if kexec_use_firmware is the wrong mental model?

  Furthermore it is not ok to have conditional
  code outside of header files.
 
  I agree, but how should execution be dispatched, e.g. in crash_kexec(),
  if we would like (I suppose) to compile kexec firmware
  support conditionally?

 The classic pattern is to have the #ifdefs in the header and have a
 noop function that is inlined when the functionality is compiled out.
 This allows all of the logic to always be compiled.

OK.
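A minimal sketch of that pattern, assuming a hypothetical CONFIG_KEXEC_FIRMWARE option and a hypothetical kexec_firmware_crash() hook (the names are illustrative, not from the patch series). The header provides a real declaration when the option is set and an inline no-op otherwise, so the caller contains no #ifdef and is always compiled.

```c
#include <assert.h>

/* ---- would live in a header (hypothetical kexec_firmware.h) ---- */
#ifdef CONFIG_KEXEC_FIRMWARE          /* hypothetical Kconfig symbol */
int kexec_firmware_crash(void);       /* real implementation in a .c file */
#else
/* Compiled-out case: an inline no-op, so callers need no #ifdef. */
static inline int kexec_firmware_crash(void)
{
    return 0;                         /* 0 = firmware did not take over */
}
#endif

/* ---- caller in a .c file: always compiled, no conditional code ---- */
static int crash_path_taken;          /* 1 = firmware path, 2 = Linux path */

static void crash_kexec_sketch(void)
{
    if (kexec_firmware_crash()) {
        crash_path_taken = 1;
        return;
    }
    crash_path_taken = 2;             /* fall back to the normal Linux path */
}
```

With CONFIG_KEXEC_FIRMWARE undefined, the call folds to a constant 0 and the compiler can discard the firmware branch, yet the caller is still type-checked in every configuration.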

Daniel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Fri, Dec 28, 2012 at 01:59:27PM +0100, Borislav Petkov wrote:
 On Thu, Dec 27, 2012 at 03:19:24PM -0800, Daniel Kiper wrote:
   Hmm... this code is being redone at the moment... this might conflict.
 
  Is this available somewhere? May I have a look at it?

 http://marc.info/?l=linux-kernel&m=135581534620383

 The for-x86-boot-v7 and -v8 branches.

 HTH.

Thanks.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 On 27/12/12 18:02, Eric W. Biederman wrote:
 Andrew Cooper <andrew.coop...@citrix.com> writes:
 
 On 27/12/2012 07:53, Eric W. Biederman wrote:
 The syscall ABI still has the wrong semantics.
 
 Aka totally unmaintainable and unmergeable.
 
 The concept of domU support is also strange.  What does domU support even 
 mean, when the dom0 support is loading a kernel to pick up Xen when Xen 
 falls over.
 There are two requirements pulling at this patch series, but I agree
 that we need to clarify them.
 It probably makes sense to split them apart a little even.
 
 

 Thinking about this split, there might be a way to simplify it even more.

 /sbin/kexec can load the Xen crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?

This is impossible with the current Xen kexec/kdump interface.
It would have to be changed to allow that. However, I suppose that
the Xen community would not be interested in such changes.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Ian Campbell
On Fri, 2013-01-04 at 14:22 +, Daniel Kiper wrote:
 On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  On 27/12/12 18:02, Eric W. Biederman wrote:
  Andrew Cooper <andrew.coop...@citrix.com> writes:
  
  On 27/12/2012 07:53, Eric W. Biederman wrote:
  The syscall ABI still has the wrong semantics.
  
  Aka totally unmaintainable and unmergeable.
  
  The concept of domU support is also strange.  What does domU support 
  even mean, when the dom0 support is loading a kernel to pick up Xen when 
  Xen falls over.
  There are two requirements pulling at this patch series, but I agree
  that we need to clarify them.
  It probably makes sense to split them apart a little even.
  
  
 
  Thinking about this split, there might be a way to simplify it even more.
 
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
 This is impossible with the current Xen kexec/kdump interface.
 It would have to be changed to allow that. However, I suppose that
 the Xen community would not be interested in such changes.

The current HYPERVISOR_kexec interface is pretty fricken bad (it
basically hardcodes the Linux circa-2.6.18 internal interface!).

I'd be all for a new HYPERVISOR_kexec (with the old gaining a _compat
suffix) which implements something more generic that isn't tied to a
particular dom0 kernel implementation (be it differing versions of Linux
or e.g. *BSD).

If that enables /sbin/kexec to load the kernel directly then so much the
better, assuming the /sbin/kexec maintainers are happy with that
approach.

Ian.



Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Jan Beulich
 On 04.01.13 at 15:22, Daniel Kiper <daniel.ki...@oracle.com> wrote:
 On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 /sbin/kexec can load the Xen crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?
 
 This is impossible with the current Xen kexec/kdump interface.

Why?

 It would have to be changed to allow that. However, I suppose that
 the Xen community would not be interested in such changes.

And again - why?

Jan



Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
  On 27.12.12 at 03:18, Daniel Kiper <daniel.ki...@oracle.com> wrote:
  Some implementations (e.g. Xen PVOPS) cannot use part of the identity page
  table to construct the transition page table. This means that they require
  separate PUDs, PMDs and PTEs for virtual and physical (identity) mapping.
  To satisfy that requirement, add extra pointers to PGD, PUD, PMD and PTE
  and align the existing code.

 So you keep posting this despite it having got pointed out on each
 earlier submission that this is unnecessary, proven by the fact that
 the non-pvops Xen kernels can get away without it. Why?

Sorry, I forgot to reply to your email last time.

I am still not convinced. I have tested the SUSE kernel itself and it does not work.
Maybe I missed something but... Please check 
arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()

I can see:

vaddr = (unsigned long)relocate_kernel;

and later:

pgd += pgd_index(vaddr);
...

This is wrong: the virtual address of relocate_kernel() in Xen differs
from its virtual address in the Linux kernel. That is why the transition
page table cannot be established in the Linux kernel, and so on...
How does this work in SUSE? I have no idea.
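The argument about init_transition_pgtable() rests on the fact that each page-table level is selected by bits of the virtual address. A user-space sketch, re-implementing the x86-64 4-level index arithmetic (9 bits per level at shifts 39, 30, 21 and 12) rather than using the kernel's own pgd_index() and friends; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging: every level is indexed by 9 bits of the
 * virtual address, so each table has 512 entries. */
#define PTE_SHIFT          12
#define PMD_SHIFT          21
#define PUD_SHIFT          30
#define PGDIR_SHIFT        39
#define ENTRIES_PER_TABLE  512u

/* Extract the 9-bit table index that starts at bit 'shift'. */
static unsigned int table_index(uint64_t vaddr, unsigned int shift)
{
    return (unsigned int)((vaddr >> shift) & (ENTRIES_PER_TABLE - 1));
}

static unsigned int pgd_slot(uint64_t vaddr) { return table_index(vaddr, PGDIR_SHIFT); }
static unsigned int pud_slot(uint64_t vaddr) { return table_index(vaddr, PUD_SHIFT); }
static unsigned int pmd_slot(uint64_t vaddr) { return table_index(vaddr, PMD_SHIFT); }
static unsigned int pte_slot(uint64_t vaddr) { return table_index(vaddr, PTE_SHIFT); }
```

For example, a hypothetical Linux text address 0xffffffff81000000 selects PGD slot 511 (PUD 510, PMD 8, PTE 0), while a hypothetical hypervisor-side address 0xffff830000000000 selects PGD slot 262, so entries built for one address do not map the other.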

I am happy to fix that, but whatever the fix is,
I would like to be sure that it works.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Jan Beulich
 On 04.01.13 at 16:15, Daniel Kiper <daniel.ki...@oracle.com> wrote:
 On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
  On 27.12.12 at 03:18, Daniel Kiper <daniel.ki...@oracle.com> wrote:
  Some implementations (e.g. Xen PVOPS) cannot use part of the identity page
  table to construct the transition page table. This means that they require
  separate PUDs, PMDs and PTEs for virtual and physical (identity) mapping.
  To satisfy that requirement, add extra pointers to PGD, PUD, PMD and PTE
  and align the existing code.

 So you keep posting this despite it having got pointed out on each
 earlier submission that this is unnecessary, proven by the fact that
 the non-pvops Xen kernels can get away without it. Why?
 
 Sorry, I forgot to reply to your email last time.
 
 I am still not convinced. I have tested the SUSE kernel itself and it does
 not work.
 Maybe I missed something but... Please check 
 arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
 
 I can see:
 
 vaddr = (unsigned long)relocate_kernel;
 
 and later:
 
 pgd += pgd_index(vaddr);
 ...

I think that mapping is simply irrelevant, as the code at
relocate_kernel gets copied to the control page and
invoked there (unlike in the native case, where
relocate_kernel() gets invoked directly).

Jan

 This is wrong: the virtual address of relocate_kernel() in Xen differs
 from its virtual address in the Linux kernel. That is why the transition
 page table cannot be established in the Linux kernel, and so on...
 How does this work in SUSE? I have no idea.
 
 I am happy to fix that, but whatever the fix is,
 I would like to be sure that it works.
 
 Daniel





Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
 On 04/01/13 14:22, Daniel Kiper wrote:
  On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  On 27/12/12 18:02, Eric W. Biederman wrote:
  Andrew Cooper <andrew.coop...@citrix.com> writes:
 
  On 27/12/2012 07:53, Eric W. Biederman wrote:
  The syscall ABI still has the wrong semantics.
 
  Aka totally unmaintainable and unmergeable.
 
  The concept of domU support is also strange.  What does domU support 
  even mean, when the dom0 support is loading a kernel to pick up Xen 
  when Xen falls over.
  There are two requirements pulling at this patch series, but I agree
  that we need to clarify them.
  It probably makes sense to split them apart a little even.
 
 
 
  Thinking about this split, there might be a way to simplify it even more.
 
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
  This is impossible with the current Xen kexec/kdump interface.
  It would have to be changed to allow that. However, I suppose that
  the Xen community would not be interested in such changes.

 I don't see why the hypercall ABI cannot be extended with new sub-ops
 that do the right thing -- the existing ABI is a bit weird.

 I plan to start prototyping something shortly (hopefully next week) for
 the Xen kexec case.

Wow... As I can see, this time the Xen community is interested...
That is great. I agree that the current kexec interface is not ideal.

David, I am happy to help in that process. However, if you wish, I could
carry it myself. Anyway, it looks like I should hold off on my
Linux kexec/kdump patches.

My .5 cents:
  - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
    probably we should introduce KEXEC_CMD_kexec_load2 and
    KEXEC_CMD_kexec_unload2; the new load should __LOAD__ the kernel image
    and other things into hypervisor memory; I suppose that almost
    everything could be copied from linux/kernel/kexec.c and
    linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S};
    I think that KEXEC_CMD_kexec should stay as is,
  - Hmmm... Now I think that we should still use the kexec syscall to load
    the image into Xen memory (with the new KEXEC_CMD_kexec_load2) because it
    sets up everything needed to invoke kdump if dom0 crashes; however,
    I could be wrong...
  - Last but not least, we should think about support for PV guests too.
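A user-space model of the first bullet, assuming the hypervisor keeps its own copy of the image. The sub-op names follow the proposal above; their numeric values, do_kexec_op_sketch() and the malloc()-backed stand-in for hypervisor memory are placeholders, not the Xen ABI.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder sub-op numbers; the *_load2/*_unload2 names follow the
 * proposal and are not part of any released Xen interface. */
enum kexec_cmd {
    KEXEC_CMD_kexec         = 0,
    KEXEC_CMD_kexec_load2   = 10,
    KEXEC_CMD_kexec_unload2 = 11,
};

/* Hypervisor-owned copy of the image (malloc stands in for Xen memory). */
static unsigned char *xen_image;
static size_t xen_image_len;

static int do_kexec_op_sketch(enum kexec_cmd cmd, const void *buf, size_t len)
{
    switch (cmd) {
    case KEXEC_CMD_kexec_load2:
        free(xen_image);
        xen_image = malloc(len);
        if (!xen_image) {
            xen_image_len = 0;
            return -1;
        }
        memcpy(xen_image, buf, len);  /* copy now; keep no pointer into dom0 */
        xen_image_len = len;
        return 0;
    case KEXEC_CMD_kexec_unload2:
        free(xen_image);
        xen_image = NULL;
        xen_image_len = 0;
        return 0;
    case KEXEC_CMD_kexec:
        return xen_image ? 0 : -1;    /* a real hypervisor would jump here */
    }
    return -1;
}
```

After load2 returns, dom0 may reuse or free its buffer without affecting what the hypervisor would later execute.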

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 02:41:17PM +, Jan Beulich wrote:
  On 04.01.13 at 15:22, Daniel Kiper <daniel.ki...@oracle.com> wrote:
  On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
  This is impossible with the current Xen kexec/kdump interface.

 Why?

Because the current KEXEC_CMD_kexec_load does not load the kernel
image and other things into Xen memory. This means that they must
live somewhere in dom0 Linux kernel memory.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 04:12:32PM +, Jan Beulich wrote:
  On 04.01.13 at 16:15, Daniel Kiper <daniel.ki...@oracle.com> wrote:
  On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
   On 27.12.12 at 03:18, Daniel Kiper <daniel.ki...@oracle.com> wrote:
   Some implementations (e.g. Xen PVOPS) cannot use part of the identity
   page table to construct the transition page table. This means that they
   require separate PUDs, PMDs and PTEs for virtual and physical (identity)
   mapping. To satisfy that requirement, add extra pointers to PGD, PUD, PMD
   and PTE and align the existing code.
 
  So you keep posting this despite it having got pointed out on each
  earlier submission that this is unnecessary, proven by the fact that
  the non-pvops Xen kernels can get away without it. Why?
 
  Sorry, I forgot to reply to your email last time.
 
  I am still not convinced. I have tested the SUSE kernel itself and it does
  not work.
  Maybe I missed something but... Please check
  arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
 
  I can see:
 
  vaddr = (unsigned long)relocate_kernel;
 
  and later:
 
  pgd += pgd_index(vaddr);
  ...

 I think that mapping is simply irrelevant, as the code at
 relocate_kernel gets copied to the control page and
 invoked there (unlike in the native case, where
 relocate_kernel() gets invoked directly).

Right, so where is the virtual mapping of the control page established?
I could not find the relevant code in the SLES kernel.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Konrad Rzeszutek Wilk
On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
 On Fri, Jan 04, 2013 at 02:41:17PM +, Jan Beulich wrote:
   On 04.01.13 at 15:22, Daniel Kiper <daniel.ki...@oracle.com> wrote:
   On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
   /sbin/kexec can load the Xen crash kernel itself by issuing
   hypercalls using /dev/xen/privcmd.  This would remove the need for
   the dom0 kernel to distinguish between loading a crash kernel for
   itself and loading a kernel for Xen.
  
   Or is this just a silly idea complicating the matter?
  
   This is impossible with the current Xen kexec/kdump interface.
 
  Why?
 
 Because the current KEXEC_CMD_kexec_load does not load the kernel
 image and other things into Xen memory. This means that they must
 live somewhere in dom0 Linux kernel memory.

We could have a very simple hypercall which would have:

struct fancy_new_hypercall {
xen_pfn_t payload; // IN
ssize_t len; // IN
#define DATA (1<<1)
#define DATA_EOF (1<<2)
#define DATA_KERNEL (1<<3)
#define DATA_RAMDISK (1<<4)
unsigned int flags; // IN
unsigned int status; // OUT
};

which would, in a loop, just iterate over the payloads and
let the hypervisor stick them in the crashkernel space.

This is all hand-waving of course. There would probably be a need
to figure out how much space you have in Xen's reserved
'crashkernel' memory region too.
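Konrad's loop could look roughly like this on the hypervisor side. This is a hand-waving user-space model as well: the fixed 64-byte array stands in for Xen's reserved crashkernel region, the payload is a plain pointer rather than a xen_pfn_t, len is a size_t rather than ssize_t, and load_chunk() and the status codes are hypothetical. The explicit space check covers the point about knowing how much room the region has.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DATA_* flags from the proposal, as single bits. */
#define DATA         (1 << 1)
#define DATA_EOF     (1 << 2)
#define DATA_KERNEL  (1 << 3)
#define DATA_RAMDISK (1 << 4)

/* Hypothetical status codes returned in the OUT field. */
enum { ST_OK = 0, ST_NOSPACE = 1, ST_DONE = 2 };

/* Stand-in for Xen's reserved crashkernel region (size is arbitrary). */
#define CRASH_REGION_SIZE 64
static unsigned char crash_region[CRASH_REGION_SIZE];
static size_t crash_used;

/* Simplified version of the proposed hypercall argument. */
struct chunk_sketch {
    const void *payload;   /* the proposal passes a xen_pfn_t instead */
    size_t len;
    unsigned int flags;    /* IN: DATA_* bits */
    unsigned int status;   /* OUT */
};

/* One hypercall invocation: append the chunk if it still fits. */
static void load_chunk(struct chunk_sketch *c)
{
    if (c->len > CRASH_REGION_SIZE - crash_used) {
        c->status = ST_NOSPACE;       /* caller learns the region is full */
        return;
    }
    memcpy(crash_region + crash_used, c->payload, c->len);
    crash_used += c->len;
    c->status = (c->flags & DATA_EOF) ? ST_DONE : ST_OK;
}
```

The caller (e.g. /sbin/kexec via the dom0 kernel) would issue one call per chunk, marking the last one with DATA_EOF, and stop early on ST_NOSPACE.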

 
 Daniel