Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2014-03-31 Thread Petr Tesarik
On Thu, 22 Nov 2012 14:26:10 -0800
"H. Peter Anvin"  wrote:

> Bullshit.  This should be a separate domain.

Thanks for top-posting, hpa...

> Andrew Cooper  wrote:
> 
> >On 22/11/12 17:47, H. Peter Anvin wrote:
> >> The other thing that should be considered here is how utterly
> >> preposterous the notion of doing in-guest crash dumping is in a
> >> system that contains a hypervisor.  The reason for kdump is that on
> >> bare metal there are no other options, but in a hypervisor system
> >> the right thing should be for the hypervisor to do the dump
> >> (possibly spawning a clean I/O domain if the I/O domain is
> >> necessary to access the media.)
> >>
> >> There is absolutely no reason to have a crashkernel sitting around
> >> in each guest, consuming memory, and possibly get corrupt.
> >>
> >>-hpa
> >>
> >
> >I agree that regular guests should not be using the kexec/kdump.
> >However, this patch series is required for allowing a pvops kernel to
> >be a crash kernel for Xen, which is very important from dom0/Xen's
> >point of view.

In fact, a normal kernel is used for dumping, so it can handle both,
Dom0 crashes _and_ hypervisor crashes. If you wanted to address
hypervisor crashes, you'd have to allocate some space for that, too, so
you may view this "madness" as a way to conserve resources.

The memory area is reserved by the Xen hypervisor, and only the extents
are passed down to the Dom0 kernel. In other words, there is indeed no
physical mapping for this area.

Having said that, I see no reason why that physical mapping cannot be
created if it is needed.

Petr T
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Eric W. Biederman
Daniel Kiper  writes:

> On Thu, Nov 22, 2012 at 04:15:48AM -0800, ebied...@xmission.com wrote:
>>
>> Is this for when the hypervisor crashes and we want a crash dump of
>> that?
>
> dom0 at boot gets some info about kexec/kdump configuration from Xen hypervisor
> (e.g. placement of crash kernel area). Later if you call kexec syscall most
> things are done in the same way as on baremetal. However, after placing image
> in memory, HYPERVISOR_kexec_op() hypercall must be called to inform hypervisor
> that image is loaded (new hook machine_kexec_load is used for this;
> machine_kexec_unload is used for unload). Then Xen establishes fixmap for pages
> found in page_list[] and returns control to dom0. If dom0 crashes or "kexec execute"
> is used by user then dom0 calls HYPERVISOR_kexec_op() to instruct hypervisor that
> kexec/kdump image should be executed immediately. Xen calls relocate_kernel()
> and all things run as usual.


Close

>> Successful code reuse depends upon not breaking the assumptions on which
>> the code relies, or modifying the code so that the new modified
>> assumptions are clear.  In this case you might as well define up as down
>> for all of the sense kexec_ops makes.
>
> Hmmm... Well, problem with above mentioned functions is that they work
> on physical addresses. In Xen PVOPS (currently dom0 is PVOPS) they
> are useless in kexec/kdump case. It means that physical addresses
> must be converted to/from machine addresses which have a real meaning
> in Xen PVOPS case. That is why those functions were introduced.

Agreed.  Operating on addresses that are relevant to the operation at hand
makes sense.
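
To make the physical-versus-machine distinction concrete, here is a
minimal user-space sketch.  It is not code from the patch: the table
toy_p2m and the helpers toy_phys_to_machine() and toy_machine_to_phys()
are hypothetical names, and a real PV kernel uses its own p2m machinery
rather than a four-entry array.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_PAGES   4
#define TOY_INVALID    UINT64_MAX

/*
 * Hypothetical p2m table: the index is the guest pseudo-physical frame
 * number (pfn), the value is the machine frame number (mfn) backing it.
 */
static const uint64_t toy_p2m[TOY_NR_PAGES] = { 0x1a3, 0x07f, 0x2c0, 0x015 };

/* Translate a pseudo-physical address to a machine address. */
static uint64_t toy_phys_to_machine(uint64_t phys)
{
    uint64_t pfn = phys >> TOY_PAGE_SHIFT;

    if (pfn >= TOY_NR_PAGES)
        return TOY_INVALID;
    return (toy_p2m[pfn] << TOY_PAGE_SHIFT) | (phys & TOY_PAGE_MASK);
}

/* Reverse lookup by linear search; acceptable only for a toy table. */
static uint64_t toy_machine_to_phys(uint64_t machine)
{
    uint64_t mfn = machine >> TOY_PAGE_SHIFT;

    for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++)
        if (toy_p2m[pfn] == mfn)
            return ((uint64_t)pfn << TOY_PAGE_SHIFT) |
                   (machine & TOY_PAGE_MASK);
    return TOY_INVALID;
}
```

The point of the sketch is only that consecutive pseudo-physical pages
need not be consecutive in machine memory, so any code handing page
addresses to the hypervisor has to translate them first.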

>> >> There may be a point to all of these but you are mixing and matching
>> >> things badly.
>> >
>> > Do you wish to split this kexec_ops struct to something which
>> > works with addresses and something which is responsible for
>> > loading, unloading and executing kexec/kdump? I am able to change
>> > that but I would like to know a bit about your vision first.
>>
>> My vision is that we should have code that makes sense.
>>
>> My suspicion is that what you want is a cousin of the existing kexec
>> system call.  Perhaps what is needed is a flag to say use the firmware
>> kexec system call.
>>
>> I absolutely do not understand what Xen is trying to do.  kexec by
>> design should not require any firmware specific hooks.  kexec at this
>> level should only need to care about the processor architecture.  Clearly
>> what you are doing with Xen requires special hooks separate even from
>> the normal paravirt hooks.  So I do not understand what you are trying to do.
>>
>> It needs to be clear from the code what is happening differently in the
>> Xen case.  Otherwise the code is unmaintainable as no one will be able
>> to understand it.
>
> I agree. I could remove all machine_* hooks from kexec_ops and call Xen
> specific functions from arch files. However, I need to add two new
> machine calls, machine_kexec_load and machine_kexec_unload, in the same
> manner as existing machine_* calls. In general they could be used to inform
> firmware (in this case Xen) that kexec/kdump image is loaded.
>
> kimage_alloc_pages, kimage_free_pages, page_to_pfn, pfn_to_page, virt_to_phys
> and phys_to_virt are worse. If we could not find good solution how to replace
> them then we end up with calling Xen specific version of kexec/kdump which
> would contain nearly full copy of existing kexec/kdump code. Not good.
>
> We could add some code to kernel/kexec.c which depends on CONFIG_XEN.
> It could contain above mentioned functions which later will be called
> by existing kexec code. This is not nice to be honest. However, I hope
> that we could find better solution for that problem.

Since in the Xen case you are not performing a normal kexec or kdump, if
you are going to continue to use the kexec system call then another flag
(like the KEXEC_ON_CRASH flag) should be used.

The userspace flag should be something like KEXEC_HYPERVISOR.  From
there we can have a generic interface that feeds into whatever the Xen
infrastructure is.  And if any other hypervisors implement kexec like
functionality it could feed into them if we so choose.

When the choice is clearly between a Linux-only kexec and a
hypervisor-level kexec, using different functions to understand the
target addresses makes sense.

And of course /sbin/kexec can easily take an additional flag to say load
the kexec image to the hypervisor.
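
A rough sketch of what such a flag-based split could look like.  This
is an illustration only: KEXEC_HYPERVISOR does not exist, the value
given to TOY_KEXEC_HYPERVISOR is invented, and toy_kexec_pick_target()
and the enum are hypothetical names.  Only the KEXEC_ON_CRASH value
mirrors the existing flag.

```c
#define TOY_KEXEC_ON_CRASH   0x00000001UL /* same value as KEXEC_ON_CRASH */
#define TOY_KEXEC_HYPERVISOR 0x00000004UL /* hypothetical flag, invented value */

enum toy_load_target {
    TOY_LOAD_NATIVE,           /* ordinary Linux-to-Linux kexec */
    TOY_LOAD_NATIVE_CRASH,     /* ordinary kdump */
    TOY_LOAD_HYPERVISOR,       /* image handed to the hypervisor */
    TOY_LOAD_HYPERVISOR_CRASH, /* crash image handed to the hypervisor */
};

/* Pick a load path from the flags word that userspace passed in. */
static enum toy_load_target toy_kexec_pick_target(unsigned long flags)
{
    int crash = (flags & TOY_KEXEC_ON_CRASH) != 0;

    if (flags & TOY_KEXEC_HYPERVISOR)
        return crash ? TOY_LOAD_HYPERVISOR_CRASH : TOY_LOAD_HYPERVISOR;
    return crash ? TOY_LOAD_NATIVE_CRASH : TOY_LOAD_NATIVE;
}
```

With the decision made once at the entry point, the native path stays
untouched and the hypervisor path is free to use its own notion of
addresses.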

Eric


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 10:51:55AM +, Jan Beulich wrote:
> >>> On 23.11.12 at 11:37, Daniel Kiper  wrote:
> > On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> >> >>> On 23.11.12 at 02:56, Andrew Cooper  wrote:
> >> > On 23/11/2012 01:38, H. Peter Anvin wrote:
> >> >> I still don't really get why it can't be isolated from dom0, which would
> >> >> make more sense to me, even for a Xen crash.
> >> >>
> >> >
> >> > The crash region (as specified by crashkernel= on the Xen command line)
> >> > is isolated from dom0.
> >> >
> >> > dom0 (using the kexec utility etc) has the task of locating the Xen
> >> > crash notes (using the kexec hypercall interface), constructing a binary
> >> > blob containing kernel, initram and gubbins, and asking Xen to put this
> >> > blob in the crash region (again, using the kexec hypercall interface).
> >> >
> >> > I do not see how this is very much different from the native case
> >> > currently (although please correct me if I am misinformed).  Linux has
> >> > extra work to do by populating /proc/iomem with the Xen crash regions
> >> > at boot (so the kexec utility can reference their physical addresses when
> >> > constructing the blob), and should just act as a conduit between the
> >> > kexec system call and the kexec hypercall to load the blob.
> >>
> >> But all of this _could_ be done completely independent of the
> >> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> >> invoking the necessary hypercalls through the privcmd driver).
> >
> > No, this is impossible. kexec/kdump image lives in dom0 kernel memory
> > until execution. That is why privcmd driver itself is not a solution
> > in this case.
>
> Even if so, there's no fundamental reason why that kernel image
> can't be put into Xen controlled space instead.

Yep, but we must change Xen kexec interface and/or its behavior first.
If we take that option then we could also move almost all needed things
from dom0 kernel to Xen. This way we could simplify Linux Kernel
kexec/kdump infrastructure needed to run on Xen.

Daniel


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
>>> On 23.11.12 at 11:37, Daniel Kiper  wrote:
> On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
>> >>> On 23.11.12 at 02:56, Andrew Cooper  wrote:
>> > On 23/11/2012 01:38, H. Peter Anvin wrote:
>> >> I still don't really get why it can't be isolated from dom0, which would
>> >> make more sense to me, even for a Xen crash.
>> >>
>> >
>> > The crash region (as specified by crashkernel= on the Xen command line)
>> > is isolated from dom0.
>> >
>> > dom0 (using the kexec utility etc) has the task of locating the Xen
>> > crash notes (using the kexec hypercall interface), constructing a binary
>> > blob containing kernel, initram and gubbins, and asking Xen to put this
>> > blob in the crash region (again, using the kexec hypercall interface).
>> >
>> > I do not see how this is very much different from the native case
>> > currently (although please correct me if I am misinformed).  Linux has
>> > extra work to do by populating /proc/iomem with the Xen crash regions
>> > at boot (so the kexec utility can reference their physical addresses when
>> > constructing the blob), and should just act as a conduit between the
>> > kexec system call and the kexec hypercall to load the blob.
>>
>> But all of this _could_ be done completely independent of the
>> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
>> invoking the necessary hypercalls through the privcmd driver).
> 
> No, this is impossible. kexec/kdump image lives in dom0 kernel memory
> until execution. That is why privcmd driver itself is not a solution
> in this case.

Even if so, there's no fundamental reason why that kernel image
can't be put into Xen controlled space instead.

Jan



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> >>> On 23.11.12 at 02:56, Andrew Cooper  wrote:
> > On 23/11/2012 01:38, H. Peter Anvin wrote:
> >> I still don't really get why it can't be isolated from dom0, which would
> >> make more sense to me, even for a Xen crash.
> >>
> >
> > The crash region (as specified by crashkernel= on the Xen command line)
> > is isolated from dom0.
> >
> > dom0 (using the kexec utility etc) has the task of locating the Xen
> > crash notes (using the kexec hypercall interface), constructing a binary
> > blob containing kernel, initram and gubbins, and asking Xen to put this
> > blob in the crash region (again, using the kexec hypercall interface).
> >
> > I do not see how this is very much different from the native case
> > currently (although please correct me if I am misinformed).  Linux has
> > extra work to do by populating /proc/iomem with the Xen crash regions
> > at boot (so the kexec utility can reference their physical addresses when
> > constructing the blob), and should just act as a conduit between the
> > kexec system call and the kexec hypercall to load the blob.
>
> But all of this _could_ be done completely independent of the
> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> invoking the necessary hypercalls through the privcmd driver).

No, this is impossible. kexec/kdump image lives in dom0 kernel memory
until execution. That is why privcmd driver itself is not a solution
in this case.

> It's just that parts of the kexec infrastructure can be re-used
> (and hence that mechanism probably seemed the easier approach
> to the implementer of the original kexec-on-Xen). If the kernel
> folks dislike that re-use (quite understandably looking at how
> much of it needs to be re-done), that shouldn't prevent us from
> looking into the existing alternatives.

This is a last-resort option. First, I think we should try to find
a good solution which reuses existing code as much as possible.

Daniel


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
>>> On 22.11.12 at 18:37, "H. Peter Anvin"  wrote:
> I actually talked to Ian Jackson at LCE, and mentioned among other 
> things the bogosity of requiring a PUD page for three-level paging in 
> Linux -- a bogosity which has spread from Xen into native.  It's a page 
> wasted for no good reason, since it only contains 32 bytes worth of 
> data, *inherently*.  Furthermore, contrary to popular belief, it is 
> *not* a page table per se.
> 
> Ian told me: "I didn't know we did that, and we shouldn't have to." 
> Here we have suffered this overhead for at least six years, ...

Even the Xen kernel only needs the full page when running on a
64-bit hypervisor (now that we don't have a 32-bit hypervisor
anymore, that of course basically means always). But yes, I too
never liked this enforced over-allocation for native kernels (and
was surprised that it was allowed in at all).
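
For reference, the 32-byte figure quoted above follows from the x86 PAE
layout, where the top-level table has four 8-byte entries.  A small
sketch of the arithmetic, assuming 4 KiB pages (the macro and function
names are illustrative only):

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAE_PDPT_ENTRIES 4u    /* two address bits select one of four entries */
#define TOY_PAGE_SIZE        4096u /* assumed 4 KiB page */

/* Bytes the top-level PAE table actually needs. */
static size_t toy_pae_pdpt_bytes(void)
{
    return TOY_PAE_PDPT_ENTRIES * sizeof(uint64_t);
}

/* Bytes left unused when a whole page is allocated for it. */
static size_t toy_pae_pdpt_wasted_bytes(void)
{
    return TOY_PAGE_SIZE - toy_pae_pdpt_bytes();
}
```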

Jan



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
>>> On 23.11.12 at 02:56, Andrew Cooper  wrote:
> On 23/11/2012 01:38, H. Peter Anvin wrote:
>> I still don't really get why it can't be isolated from dom0, which would 
>> make more sense to me, even for a Xen crash.
>>
> 
> The crash region (as specified by crashkernel= on the Xen command line)
> is isolated from dom0.
> 
> dom0 (using the kexec utility etc) has the task of locating the Xen
> crash notes (using the kexec hypercall interface), constructing a binary
> blob containing kernel, initram and gubbins, and asking Xen to put this
> blob in the crash region (again, using the kexec hypercall interface).
> 
> I do not see how this is very much different from the native case
> currently (although please correct me if I am misinformed).  Linux has
> extra work to do by populating /proc/iomem with the Xen crash regions
> at boot (so the kexec utility can reference their physical addresses when
> constructing the blob), and should just act as a conduit between the
> kexec system call and the kexec hypercall to load the blob.

But all of this _could_ be done completely independent of the
Dom0 kernel's kexec infrastructure (i.e. fully from user space,
invoking the necessary hypercalls through the privcmd driver).
It's just that parts of the kexec infrastructure can be re-used
(and hence that mechanism probably seemed the easier approach
to the implementer of the original kexec-on-Xen). If the kernel
folks dislike that re-use (quite understandably looking at how
much of it needs to be re-done), that shouldn't prevent us from
looking into the existing alternatives.
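
As an illustration of the user-space side mentioned in the quoted text,
locating a crash region in /proc/iomem could be sketched as below.  The
helper toy_match_iomem_line() is a hypothetical name, and the sketch
assumes the region carries the "Crash kernel" label used on native
Linux; what label the Xen crash regions would carry is not settled in
this thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem-style line of the form "start-end : name".
 * Returns 1 and fills *start and *end when the name equals `label`,
 * 0 otherwise (including on lines that do not parse).
 */
static int toy_match_iomem_line(const char *line, const char *label,
                                unsigned long long *start,
                                unsigned long long *end)
{
    unsigned long long s, e;
    char name[64];

    if (sscanf(line, "%llx-%llx : %63[^\n]", &s, &e, name) != 3)
        return 0;
    if (strcmp(name, label) != 0)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

A caller would feed it each line read from /proc/iomem and stop at the
first match.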

Jan



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Thu, Nov 22, 2012 at 04:15:48AM -0800, ebied...@xmission.com wrote:
> Daniel Kiper  writes:
>
> > On Tue, Nov 20, 2012 at 08:40:39AM -0800, ebied...@xmission.com wrote:
> >> Daniel Kiper  writes:
> >>
> >> > Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
> >> > functions or require some changes in behavior of kexec/kdump generic code.
> >> > To cope with that problem kexec_ops struct was introduced. It allows
> >> > a developer to replace all or some functions and control some
> >> > functionality of kexec/kdump generic code.
> >> >
> >> > Default behavior of kexec/kdump generic code is not changed.
> >>
> >> Ick.
> >>
> >> > v2 - suggestions/fixes:
> >> >- add comment for kexec_ops.crash_alloc_temp_store member
> >> >  (suggested by Konrad Rzeszutek Wilk),
> >> >- simplify kexec_ops usage
> >> >  (suggested by Konrad Rzeszutek Wilk).
> >> >
> >> > Signed-off-by: Daniel Kiper 
> >> > ---
> >> >  include/linux/kexec.h |   26 ++
> >> >  kernel/kexec.c        |  131 +
> >> >  2 files changed, 125 insertions(+), 32 deletions(-)
> >> >
> >> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> >> > index d0b8458..c8d0b35 100644
> >> > --- a/include/linux/kexec.h
> >> > +++ b/include/linux/kexec.h
> >> > @@ -116,7 +116,33 @@ struct kimage {
> >> >  #endif
> >> >  };
> >> >
> >> > +struct kexec_ops {
> >> > +/*
> >> > + * Some kdump implementations (e.g. Xen PVOPS dom0) could not access
> >> > + * directly crash kernel memory area. In this situation they must
> >> > + * allocate memory outside of it and later move contents from temporary
> >> > + * storage to final resting places (usually done by relocate_kernel()).
> >> > + * Such behavior could be enforced by setting
> >> > + * crash_alloc_temp_store member to true.
> >> > + */
> >>
> >> Why in the world would Xen not be able to access crash kernel memory?
> >> As currently defined it is normal memory that the kernel chooses not to
> >> use.
> >>
> >> If relocate kernel can access that memory you definitely can access the
> >> memory so the comment does not make any sense.
> >
> > Crash kernel memory is reserved by Xen hypervisor and Xen hypervisor
> > only has access to it. dom0 does not have any mapping of this area.
> > However, relocate_kernel() has access to crash kernel memory
> > because it is executed by Xen hypervisor and whole machine
> > memory is identity mapped.
>
> This is all weird.  Doubly so since this code is multi-arch and you have
> a set of requirements no other arch has had.
>
> I recall that Xen uses kexec in a unique manner.  What is the hypervisor
> interface and how is it used?
>
> Is this for when the hypervisor crashes and we want a crash dump of
> that?

dom0 at boot gets some info about kexec/kdump configuration from Xen hypervisor
(e.g. placement of crash kernel area). Later if you call kexec syscall most
things are done in the same way as on baremetal. However, after placing image
in memory, HYPERVISOR_kexec_op() hypercall must be called to inform hypervisor
that image is loaded (new hook machine_kexec_load is used for this;
machine_kexec_unload is used for unload). Then Xen establishes fixmap for pages
found in page_list[] and returns control to dom0. If dom0 crashes or "kexec execute"
is used by user then dom0 calls HYPERVISOR_kexec_op() to instruct hypervisor that
kexec/kdump image should be executed immediately. Xen calls relocate_kernel()
and all things run as usual.
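
The load/unload part of that flow can be sketched in plain C as below.
This is a user-space model, not the patch: struct toy_kimage, struct
toy_kexec_ops and every toy_* function are hypothetical, and
toy_hypercall_kexec_load() merely stands in for HYPERVISOR_kexec_op(),
whose real arguments are not modelled.

```c
struct toy_kimage {
    int loaded_in_hypervisor; /* 1 once the hypervisor has been told */
};

/* Hypothetical subset of the proposed kexec_ops: only the two new hooks. */
struct toy_kexec_ops {
    int  (*machine_kexec_load)(struct toy_kimage *image);
    void (*machine_kexec_unload)(struct toy_kimage *image);
};

/* Bare-metal defaults: nobody needs to be informed. */
static int toy_native_load(struct toy_kimage *image)
{
    (void)image;
    return 0;
}

static void toy_native_unload(struct toy_kimage *image)
{
    (void)image;
}

/* Stand-in for the hypercall; it only records the requested state. */
static int toy_hypercall_kexec_load(struct toy_kimage *image, int load)
{
    image->loaded_in_hypervisor = load;
    return 0;
}

static int toy_xen_load(struct toy_kimage *image)
{
    return toy_hypercall_kexec_load(image, 1);
}

static void toy_xen_unload(struct toy_kimage *image)
{
    (void)toy_hypercall_kexec_load(image, 0);
}

static const struct toy_kexec_ops toy_native_ops = {
    toy_native_load, toy_native_unload
};
static const struct toy_kexec_ops toy_xen_ops = {
    toy_xen_load, toy_xen_unload
};

/* Generic load path: place the image (elided here), then run the hook. */
static int toy_kexec_load(const struct toy_kexec_ops *ops,
                          struct toy_kimage *image)
{
    /* ... segments would be copied into place here ... */
    return ops->machine_kexec_load(image);
}
```

On bare metal the hooks are no-ops, so the generic path behaves as
before; only the Xen table adds the extra notification step.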

> >> > +bool crash_alloc_temp_store;
> >> > +struct page *(*kimage_alloc_pages)(gfp_t gfp_mask,
> >> > +unsigned int order,
> >> > +unsigned long limit);
> >> > +void (*kimage_free_pages)(struct page *page);
> >> > +unsigned long (*page_to_pfn)(struct page *page);
> >> > +struct page *(*pfn_to_page)(unsigned long pfn);
> >> > +unsigned long (*virt_to_phys)(volatile void *address);
> >> > +void *(*phys_to_virt)(unsigned long address);
> >> > +int (*machine_kexec_prepare)(struct kimage *image);
> >> > +int (*machine_kexec_load)(struct kimage *image);
> >> > +void (*machine_kexec_cleanup)(struct kimage *image);
> >> > +void (*machine_kexec_unload)(struct kimage *image);
> >> > +void (*machine_kexec_shutdown)(void);
> >> > +void (*machine_kexec)(struct kimage *image);
> >> > +};
> >>
> >> Ugh.  This is a nasty abstraction.
> >>
> >> You are mixing and matching a bunch of things together here.
> >>
> >> If you need to override machine_kexec_xxx please do that on a per
> >> architecture basis.
> >
> > Yes, it is possible but I think that it is worth to do it at that
> > level because it could be useful for other archs too (e.g. Xen ARM port
> > is under development). Then we do not need to duplicate that functionality
> > in arch code. Additionally, Xen requires machine_kexec_load and
> > machine_kexec_unload hooks which are not available in current generic
> > kexec/kdump code.
>
>
> Let me be clear.

Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
 On 23.11.12 at 02:56, Andrew Cooper andrew.coop...@citrix.com wrote:
 On 23/11/2012 01:38, H. Peter Anvin wrote:
 I still don't really get why it can't be isolated from dom0, which would 
 make more sense to me, even for a Xen crash.

 
 The crash region (as specified by crashkernel= on the Xen command line)
 is isolated from dom0.
 
 dom0 (using the kexec utility etc) has the task of locating the Xen
 crash notes (using the kexec hypercall interface), constructing a binary
 blob containing kernel, initram and gubbins, and asking Xen to put this
 blob in the crash region (again, using the kexec hypercall interface).
 
 I do not see how this is very much different from the native case
 currently (although please correct me if I am misinformed).  Linux has
 extra work to do by populating /proc/iomem with the Xen crash regions
 boot (so the kexec utility can reference their physical addresses when
 constructing the blob), and should just act as a conduit between the
 kexec system call and the kexec hypercall to load the blob.

But all of this _could_ be done completely independent of the
Dom0 kernel's kexec infrastructure (i.e. fully from user space,
invoking the necessary hypercalls through the privcmd driver).
It's just that parts of the kexec infrastructure can be re-used
(and hence that mechanism probably seemed the easier approach
to the implementer of the original kexec-on-Xen). If the kernel
folks dislike that re-use (quite understandably looking at how
much of it needs to be re-done), that shouldn't prevent us from
looking into the existing alternatives.

Jan

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
 On 22.11.12 at 18:37, H. Peter Anvin h...@zytor.com wrote:
 I actually talked to Ian Jackson at LCE, and mentioned among other 
 things the bogosity of requiring a PUD page for three-level paging in 
 Linux -- a bogosity which has spread from Xen into native.  It's a page 
 wasted for no good reason, since it only contains 32 bytes worth of 
 data, *inherently*.  Furthermore, contrary to popular belief, it is 
 *not* pa page table per se.
 
 Ian told me: I didn't know we did that, and we shouldn't have to. 
 Here we have suffered this overhead for at least six years, ...

Even the Xen kernel only needs the full page when running on a
64-bit hypervisor (now that we don't have a 32-bit hypervisor
anymore, that of course basically means always). But yes, I too
never liked this enforced over-allocation for native kernels (and
was surprised that it was allowed in at all).

Jan

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>> On 23/11/2012 01:38, H. Peter Anvin wrote:
>>> I still don't really get why it can't be isolated from dom0, which would
>>> make more sense to me, even for a Xen crash.
>>
>> The crash region (as specified by crashkernel= on the Xen command line)
>> is isolated from dom0.
>>
>> dom0 (using the kexec utility etc) has the task of locating the Xen
>> crash notes (using the kexec hypercall interface), constructing a binary
>> blob containing kernel, initram and gubbins, and asking Xen to put this
>> blob in the crash region (again, using the kexec hypercall interface).
>>
>> I do not see how this is very much different from the native case
>> currently (although please correct me if I am misinformed).  Linux has
>> extra work to do by populating /proc/iomem with the Xen crash regions
>> at boot (so the kexec utility can reference their physical addresses when
>> constructing the blob), and should just act as a conduit between the
>> kexec system call and the kexec hypercall to load the blob.
>
> But all of this _could_ be done completely independent of the
> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> invoking the necessary hypercalls through the privcmd driver).

No, this is impossible. The kexec/kdump image lives in dom0 kernel memory
until execution. That is why the privcmd driver itself is not a solution
in this case.

> It's just that parts of the kexec infrastructure can be re-used
> (and hence that mechanism probably seemed the easier approach
> to the implementer of the original kexec-on-Xen). If the kernel
> folks dislike that re-use (quite understandably looking at how
> much of it needs to be re-done), that shouldn't prevent us from
> looking into the existing alternatives.

This is a last-resort option. First, I think we should try to find
a good solution which reuses existing code as much as possible.

Daniel


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Jan Beulich
On 23.11.12 at 11:37, Daniel Kiper <daniel.ki...@oracle.com> wrote:
> On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
>> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>>> On 23/11/2012 01:38, H. Peter Anvin wrote:
>>>> I still don't really get why it can't be isolated from dom0, which would
>>>> make more sense to me, even for a Xen crash.
>>>
>>> The crash region (as specified by crashkernel= on the Xen command line)
>>> is isolated from dom0.
>>>
>>> dom0 (using the kexec utility etc) has the task of locating the Xen
>>> crash notes (using the kexec hypercall interface), constructing a binary
>>> blob containing kernel, initram and gubbins, and asking Xen to put this
>>> blob in the crash region (again, using the kexec hypercall interface).
>>>
>>> I do not see how this is very much different from the native case
>>> currently (although please correct me if I am misinformed).  Linux has
>>> extra work to do by populating /proc/iomem with the Xen crash regions
>>> at boot (so the kexec utility can reference their physical addresses when
>>> constructing the blob), and should just act as a conduit between the
>>> kexec system call and the kexec hypercall to load the blob.
>>
>> But all of this _could_ be done completely independent of the
>> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
>> invoking the necessary hypercalls through the privcmd driver).
>
> No, this is impossible. The kexec/kdump image lives in dom0 kernel memory
> until execution. That is why the privcmd driver itself is not a solution
> in this case.

Even if so, there's no fundamental reason why that kernel image
can't be put into Xen controlled space instead.

Jan



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 10:51:55AM +, Jan Beulich wrote:
> On 23.11.12 at 11:37, Daniel Kiper <daniel.ki...@oracle.com> wrote:
>> On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
>>> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>>>> On 23/11/2012 01:38, H. Peter Anvin wrote:
>>>>> I still don't really get why it can't be isolated from dom0, which would
>>>>> make more sense to me, even for a Xen crash.
>>>>
>>>> The crash region (as specified by crashkernel= on the Xen command line)
>>>> is isolated from dom0.
>>>>
>>>> dom0 (using the kexec utility etc) has the task of locating the Xen
>>>> crash notes (using the kexec hypercall interface), constructing a binary
>>>> blob containing kernel, initram and gubbins, and asking Xen to put this
>>>> blob in the crash region (again, using the kexec hypercall interface).
>>>>
>>>> I do not see how this is very much different from the native case
>>>> currently (although please correct me if I am misinformed).  Linux has
>>>> extra work to do by populating /proc/iomem with the Xen crash regions
>>>> at boot (so the kexec utility can reference their physical addresses when
>>>> constructing the blob), and should just act as a conduit between the
>>>> kexec system call and the kexec hypercall to load the blob.
>>>
>>> But all of this _could_ be done completely independent of the
>>> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
>>> invoking the necessary hypercalls through the privcmd driver).
>>
>> No, this is impossible. The kexec/kdump image lives in dom0 kernel memory
>> until execution. That is why the privcmd driver itself is not a solution
>> in this case.
>
> Even if so, there's no fundamental reason why that kernel image
> can't be put into Xen controlled space instead.

Yep, but we must change the Xen kexec interface and/or its behavior first.
If we take that option then we could also move almost all needed things
from the dom0 kernel to Xen. This way we could simplify the Linux kernel
kexec/kdump infrastructure needed to run on Xen.

Daniel


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Eric W. Biederman
Daniel Kiper <daniel.ki...@oracle.com> writes:

> On Thu, Nov 22, 2012 at 04:15:48AM -0800, ebied...@xmission.com wrote:
>
>> Is this for when the hypervisor crashes and we want a crash dump of
>> that?
>
> dom0 at boot gets some info about kexec/kdump configuration from Xen
> hypervisor (e.g. placement of crash kernel area). Later if you call kexec
> syscall most things are done in the same way as on baremetal. However,
> after placing image in memory, HYPERVISOR_kexec_op() hypercall must be
> called to inform hypervisor that image is loaded (new hook
> machine_kexec_load is used for this; machine_kexec_unload is used for
> unload). Then Xen establishes fixmap for pages found in page_list[] and
> returns control to dom0. If dom0 crashes or kexec execute is used by user
> then dom0 calls HYPERVISOR_kexec_op() to instruct hypervisor that
> kexec/kdump image should be executed immediately. Xen calls
> relocate_kernel() and everything runs as usual.


Close

>> Successful code reuse depends upon not breaking the assumptions on which
>> the code relies, or modifying the code so that the new modified
>> assumptions are clear.  In this case you might as well define up as down
>> for all of the sense kexec_ops makes.
>
> Hmmm... Well, problem with above mentioned functions is that they work
> on physical addresses. In Xen PVOPS (currently dom0 is PVOPS) they
> are useless in kexec/kdump case. It means that physical addresses
> must be converted to/from machine addresses which has a real meaning
> in Xen PVOPS case. That is why those functions were introduced.

Agreed, operating on addresses that are relevant to the operation at hand
makes sense.

>>>> There may be a point to all of these but you are mixing and matching
>>>> things badly.
>>>
>>> Do you wish to split this kexec_ops struct to something which
>>> works with addresses and something which is responsible for
>>> loading, unloading and executing kexec/kdump? I am able to change
>>> that but I would like to know a bit about your vision first.
>>
>> My vision is that we should have code that makes sense.
>>
>> My suspicion is that what you want is a cousin of the existing kexec
>> system call.  Perhaps what is needed is a flag to say use the firmware
>> kexec system call.
>>
>> I absolutely do not understand what Xen is trying to do.  kexec by
>> design should not require any firmware specific hooks.  kexec at this
>> level should only need to care about the processor architecture.  Clearly
>> what you are doing with Xen requires special hooks separate even from
>> the normal paravirt hooks.  So I do not understand what you are trying
>> to do.
>>
>> It needs to be clear from the code what is happening differently in the
>> Xen case.  Otherwise the code is unmaintainable as no one will be able
>> to understand it.
>
> I agree. I could remove all machine_* hooks from kexec_ops and call Xen
> specific functions from arch files. However, I need to add two new
> machine calls, machine_kexec_load and machine_kexec_unload, in the same
> manner as existing machine_* calls. In general they could be used to inform
> firmware (in this case Xen) that kexec/kdump image is loaded.
>
> kimage_alloc_pages, kimage_free_pages, page_to_pfn, pfn_to_page, virt_to_phys
> and phys_to_virt are worse. If we could not find good solution how to replace
> them then we end up with calling Xen specific version of kexec/kdump which
> would contain nearly full copy of existing kexec/kdump code. Not good.
>
> We could add some code to kernel/kexec.c which depends on CONFIG_XEN.
> It could contain above mentioned functions which later will be called
> by existing kexec code. This is not nice to be honest. However, I hope
> that we could find better solution for that problem.

Since in the Xen case you are not performing a normal kexec or kdump, if
you are going to continue to use the kexec system call then another flag
(like the KEXEC_ON_CRASH flag) should be used.

The userspace flag should be something like KEXEC_HYPERVISOR.  From
there we can have a generic interface that feeds into whatever the Xen
infrastructure is.  And if any other hypervisors implement kexec-like
functionality it could feed into them if we so choose.

When the choice is clearly between a Linux-only kexec and a
hypervisor-level kexec, using different functions to understand the
target addresses makes sense.

And of course /sbin/kexec can easily take an additional flag to say load
the kexec image to the hypervisor.

Eric
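
The flag-based split suggested above can be sketched in plain C. This is only an illustration under stated assumptions: KEXEC_HYPERVISOR is the name floated in this mail and its value here is invented, and pick_load_target() and enum load_target are hypothetical names, not kernel code.

```c
#include <assert.h>

/* Existing flag, with the value used by the Linux kexec header. */
#define KEXEC_ON_CRASH   0x00000001UL
/* Hypothetical: the name comes from the mail above, the value is invented. */
#define KEXEC_HYPERVISOR 0x00000080UL

/* Hypothetical set of destinations for a loaded image. */
enum load_target { LOAD_LINUX, LOAD_LINUX_CRASH, LOAD_HYPERVISOR };

/* Choose the load path from the flags passed by userspace. */
static enum load_target pick_load_target(unsigned long flags)
{
    /* The two flags must occupy distinct bits for the checks below. */
    assert((KEXEC_HYPERVISOR & KEXEC_ON_CRASH) == 0);

    if (flags & KEXEC_HYPERVISOR)
        return LOAD_HYPERVISOR;   /* hand the image to the hypervisor */
    if (flags & KEXEC_ON_CRASH)
        return LOAD_LINUX_CRASH;  /* ordinary kdump load */
    return LOAD_LINUX;            /* ordinary kexec load */
}
```

Checking the hypervisor flag first keeps the Linux-only paths untouched whenever it is absent, which is the separation the mail argues for.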


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Andrew Cooper
On 23/11/2012 01:38, H. Peter Anvin wrote:
> I still don't really get why it can't be isolated from dom0, which would make 
> more sense to me, even for a Xen crash.
>

The crash region (as specified by crashkernel= on the Xen command line)
is isolated from dom0.

dom0 (using the kexec utility etc) has the task of locating the Xen
crash notes (using the kexec hypercall interface), constructing a binary
blob containing kernel, initram and gubbins, and asking Xen to put this
blob in the crash region (again, using the kexec hypercall interface).

I do not see how this is very much different from the native case
currently (although please correct me if I am misinformed).  Linux has
extra work to do by populating /proc/iomem with the Xen crash regions
at boot (so the kexec utility can reference their physical addresses when
constructing the blob), and should just act as a conduit between the
kexec system call and the kexec hypercall to load the blob.

For within-guest kexec/kdump functionality, I agree that it is barking
mad.  However, we do see cloud operators interested in the idea so VM
administrators can look after their crashes themselves.

~Andrew
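
The /proc/iomem lookup described above can be illustrated with a small parser. This is a sketch, not kexec-tools code: parse_crash_region() is a hypothetical helper, it works on a string rather than opening /proc/iomem, and it assumes the region carries the "Crash kernel" label used on native Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/iomem-style line such as
 * "  2c000000-33ffffff : Crash kernel".
 * Returns 0 and fills *start and *end on a match, -1 otherwise.
 */
static int parse_crash_region(const char *line,
                              unsigned long long *start,
                              unsigned long long *end)
{
    static const char label[] = "Crash kernel";  /* assumed label */
    unsigned long long s = 0, e = 0;
    int consumed = 0;

    assert(line != NULL && start != NULL && end != NULL);

    /* %n records how many characters precede the label text. */
    if (sscanf(line, " %llx-%llx : %n", &s, &e, &consumed) != 2)
        return -1;
    if (consumed == 0 || e < s)
        return -1;
    if (strncmp(line + consumed, label, sizeof(label) - 1) != 0)
        return -1;

    *start = s;
    *end = e;
    return 0;
}
```

A userspace tool could feed each line of /proc/iomem through such a helper and use the first match as the physical extent of the crash area.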


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
I still don't really get why it can't be isolated from dom0, which would make 
more sense to me, even for a Xen crash.

Andrew Cooper  wrote:

>On 22/11/2012 17:47, H. Peter Anvin wrote:
>> The other thing that should be considered here is how utterly
>> preposterous the notion of doing in-guest crash dumping is in a system
>> that contains a hypervisor.  The reason for kdump is that on bare metal
>> there are no other options, but in a hypervisor system the right thing
>> should be for the hypervisor to do the dump (possibly spawning a clean
>> I/O domain if the I/O domain is necessary to access the media.)
>>
>> There is absolutely no reason to have a crashkernel sitting around in
>> each guest, consuming memory, and possibly get corrupt.
>>
>>  -hpa
>>
>
>(Your reply to my email which I can see on the xen devel archive appears
>to have gotten lost somewhere inside the citrix email system, so
>apologies for replying out of order)
>
>The kdump kernel loaded by dom0 is for when Xen crashes, not for when
>dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash)
>
>There is no possible way it could be a separate domain; Xen completely
>ceases to function as soon as it jumps to the entry point of the kdump
>image.
>
>~Andrew

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
Ok... that *sort of* makes sense, but also underscores how utterly different 
this is from a normal kexec.

Andrew Cooper  wrote:

>On 22/11/2012 17:47, H. Peter Anvin wrote:
>> The other thing that should be considered here is how utterly
>> preposterous the notion of doing in-guest crash dumping is in a system
>> that contains a hypervisor.  The reason for kdump is that on bare metal
>> there are no other options, but in a hypervisor system the right thing
>> should be for the hypervisor to do the dump (possibly spawning a clean
>> I/O domain if the I/O domain is necessary to access the media.)
>>
>> There is absolutely no reason to have a crashkernel sitting around in
>> each guest, consuming memory, and possibly get corrupt.
>>
>>  -hpa
>>
>
>(Your reply to my email which I can see on the xen devel archive appears
>to have gotten lost somewhere inside the citrix email system, so
>apologies for replying out of order)
>
>The kdump kernel loaded by dom0 is for when Xen crashes, not for when
>dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash)
>
>There is no possible way it could be a separate domain; Xen completely
>ceases to function as soon as it jumps to the entry point of the kdump
>image.
>
>~Andrew

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Eric W. Biederman
Daniel Kiper  writes:

> On Tue, Nov 20, 2012 at 08:40:39AM -0800, ebied...@xmission.com wrote:
>> Daniel Kiper  writes:
>>
>> > Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
>> > functions or require some changes in behavior of kexec/kdump generic code.
>> > To cope with that problem kexec_ops struct was introduced. It allows
>> > a developer to replace all or some functions and control some
>> > functionality of kexec/kdump generic code.
>> >
>> > Default behavior of kexec/kdump generic code is not changed.
>>
>> Ick.
>>
>> > v2 - suggestions/fixes:
>> >- add comment for kexec_ops.crash_alloc_temp_store member
>> >  (suggested by Konrad Rzeszutek Wilk),
>> >- simplify kexec_ops usage
>> >  (suggested by Konrad Rzeszutek Wilk).
>> >
>> > Signed-off-by: Daniel Kiper 
>> > ---
>> >  include/linux/kexec.h |   26
>> >  kernel/kexec.c        |  131
>> >  2 files changed, 125 insertions(+), 32 deletions(-)
>> >
>> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> > index d0b8458..c8d0b35 100644
>> > --- a/include/linux/kexec.h
>> > +++ b/include/linux/kexec.h
>> > @@ -116,7 +116,33 @@ struct kimage {
>> >  #endif
>> >  };
>> >
>> > +struct kexec_ops {
>> > +  /*
>> > +   * Some kdump implementations (e.g. Xen PVOPS dom0) could not access
>> > +   * directly crash kernel memory area. In this situation they must
>> > +   * allocate memory outside of it and later move contents from temporary
>> > +   * storage to final resting places (usualy done by relocate_kernel()).
>> > +   * Such behavior could be enforced by setting
>> > +   * crash_alloc_temp_store member to true.
>> > +   */
>>
>> Why in the world would Xen not be able to access crash kernel memory?
>> As currently defined it is normal memory that the kernel chooses not to
>> use.
>>
>> If relocate kernel can access that memory you definitely can access the
>> memory so the comment does not make any sense.
>
> Crash kernel memory is reserved by Xen hypervisor and Xen hypervisor
> only has access to it. dom0 does not have any mapping of this area.
> However, relocate_kernel() has access to crash kernel memory
> because it is executed by Xen hypervisor and whole machine
> memory is identity mapped.

This is all weird.  Doubly so since this code is multi-arch and you have
a set of requirements no other arch has had.

I recall that Xen uses kexec in a unique manner.  What is the hypervisor
interface and how is it used?

Is this for when the hypervisor crashes and we want a crash dump of
that?



>> > +  bool crash_alloc_temp_store;
>> > +  struct page *(*kimage_alloc_pages)(gfp_t gfp_mask,
>> > +  unsigned int order,
>> > +  unsigned long limit);
>> > +  void (*kimage_free_pages)(struct page *page);
>> > +  unsigned long (*page_to_pfn)(struct page *page);
>> > +  struct page *(*pfn_to_page)(unsigned long pfn);
>> > +  unsigned long (*virt_to_phys)(volatile void *address);
>> > +  void *(*phys_to_virt)(unsigned long address);
>> > +  int (*machine_kexec_prepare)(struct kimage *image);
>> > +  int (*machine_kexec_load)(struct kimage *image);
>> > +  void (*machine_kexec_cleanup)(struct kimage *image);
>> > +  void (*machine_kexec_unload)(struct kimage *image);
>> > +  void (*machine_kexec_shutdown)(void);
>> > +  void (*machine_kexec)(struct kimage *image);
>> > +};
>>
>> Ugh.  This is a nasty abstraction.
>>
>> You are mixing and matching a bunch of things together here.
>>
>> If you need to override machine_kexec_xxx please do that on a per
>> architecture basis.
>
> Yes, it is possible but I think that it is worth to do it at that
> level because it could be useful for other archs too (e.g. Xen ARM port
> is under development). Then we do not need to duplicate that functionality
> in arch code. Additionally, Xen requires machine_kexec_load and
> machine_kexec_unload hooks which are not available in current generic
> kexec/kdump code.


Let me be clear.  kexec_ops as you have implemented it is absolutely
unacceptable.

Your kexec_ops is not an abstraction but a hack that enshrines in stone
implementation details.

>> Special case overrides of page_to_pfn, pfn_to_page, virt_to_phys,
>> phys_to_virt, and friends seem completely inappropriate.
>
> They are required in the Xen PVOPS case. If we do not do it that way
> then we at least need to duplicate almost all existing generic kexec/kdump
> code in arch-dependent files. Not to mention that we need to capture the
> relevant syscall and other things. I think that this is the wrong way.

A different definition of phys_to_virt and page_to_pfn for one specific
function is total nonsense.

It may actually be better to have a completely different code path.
This looks more like code abuse than code reuse.

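Successful code reuse depends upon not breaking the assumptions on which
the code relies, or modifying the code so that the new modified
assumptions are clear.  In this case you might as well define up as down
for all of the sense kexec_ops makes.

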
Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Andrew Cooper
On 22/11/2012 17:47, H. Peter Anvin wrote:
> The other thing that should be considered here is how utterly 
> preposterous the notion of doing in-guest crash dumping is in a system 
> that contains a hypervisor.  The reason for kdump is that on bare metal 
> there are no other options, but in a hypervisor system the right thing 
> should be for the hypervisor to do the dump (possibly spawning a clean 
> I/O domain if the I/O domain is necessary to access the media.)
>
> There is absolutely no reason to have a crashkernel sitting around in 
> each guest, consuming memory, and possibly get corrupt.
>
>   -hpa
>

(Your reply to my email which I can see on the xen devel archive appears
to have gotten lost somewhere inside the citrix email system, so
apologies for replying out of order)

The kdump kernel loaded by dom0 is for when Xen crashes, not for when
dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash)

There is no possible way it could be a separate domain; Xen completely
ceases to function as soon as it jumps to the entry point of the kdump image.

~Andrew


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
Bullshit.  This should be a separate domain.

Andrew Cooper  wrote:

>On 22/11/12 17:47, H. Peter Anvin wrote:
>> The other thing that should be considered here is how utterly
>> preposterous the notion of doing in-guest crash dumping is in a system
>> that contains a hypervisor.  The reason for kdump is that on bare metal
>> there are no other options, but in a hypervisor system the right thing
>> should be for the hypervisor to do the dump (possibly spawning a clean
>> I/O domain if the I/O domain is necessary to access the media.)
>>
>> There is absolutely no reason to have a crashkernel sitting around in
>> each guest, consuming memory, and possibly get corrupt.
>>
>>  -hpa
>>
>
>I agree that regular guests should not be using the kexec/kdump.
>However, this patch series is required for allowing a pvops kernel to be
>a crash kernel for Xen, which is very important from dom0/Xen's point of
>view.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin

On 11/22/2012 04:15 AM, Eric W. Biederman wrote:


> Let me be clear.  kexec_ops as you have implemented it is absolutely
> unacceptable.
>
> Your kexec_ops is not an abstraction but a hack that enshrines in stone
> implementation details.



This is the kind of stuff that is absolutely endemic to the Xen 
endeavour, and which is why Xen is such a disease.  The design principle 
seems to have been "hey, let's go and replace random Linux kernel 
internals with our own stuff, and make them ABIs, so that they can never 
change.  Oh, and let's not bother documenting the constraints we're 
imposing, that might make the code manageable."


I actually talked to Ian Jackson at LCE, and mentioned among other 
things the bogosity of requiring a PUD page for three-level paging in 
Linux -- a bogosity which has spread from Xen into native.  It's a page 
wasted for no good reason, since it only contains 32 bytes worth of 
data, *inherently*.  Furthermore, contrary to popular belief, it is 
*not* a page table per se.


Ian told me: "I didn't know we did that, and we shouldn't have to." 
Here we have suffered this overhead for at least six years, because *XEN 
FUCKED UP AND NOONE ELSE HAD ANY WAY OF KNOWING THAT*.


Now we know that it can "maybe"(!!!) be fixed, if we are willing to 
spend time working on a dying platform, whereas we have already suffered 
the damage during the height of its importance.


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
The other thing that should be considered here is how utterly 
preposterous the notion of doing in-guest crash dumping is in a system 
that contains a hypervisor.  The reason for kdump is that on bare metal 
there are no other options, but in a hypervisor system the right thing 
should be for the hypervisor to do the dump (possibly spawning a clean 
I/O domain if the I/O domain is necessary to access the media.)


There is absolutely no reason to have a crashkernel sitting around in 
each guest, consuming memory, and possibly get corrupt.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Andrew Cooper
On 22/11/12 17:47, H. Peter Anvin wrote:
> The other thing that should be considered here is how utterly 
> preposterous the notion of doing in-guest crash dumping is in a system 
> that contains a hypervisor.  The reason for kdump is that on bare metal 
> there are no other options, but in a hypervisor system the right thing 
> should be for the hypervisor to do the dump (possibly spawning a clean 
> I/O domain if the I/O domain is necessary to access the media.)
>
> There is absolutely no reason to have a crashkernel sitting around in 
> each guest, consuming memory, and possibly get corrupt.
>
>   -hpa
>

I agree that regular guests should not be using the kexec/kdump. 
However, this patch series is required for allowing a pvops kernel to be
a crash kernel for Xen, which is very important from dom0/Xen's point of
view.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
Bullshit.  This should be a separate domain.

Andrew Cooper <andrew.coop...@citrix.com> wrote:

> On 22/11/12 17:47, H. Peter Anvin wrote:
> > The other thing that should be considered here is how utterly
> > preposterous the notion of doing in-guest crash dumping is in a system
> > that contains a hypervisor.  The reason for kdump is that on bare metal
> > there are no other options, but in a hypervisor system the right thing
> > should be for the hypervisor to do the dump (possibly spawning a clean
> > I/O domain if the I/O domain is necessary to access the media.)
> >
> > There is absolutely no reason to have a crashkernel sitting around in
> > each guest, consuming memory, and possibly get corrupt.
> >
> >   -hpa
>
> I agree that regular guests should not be using the kexec/kdump.
> However, this patch series is required for allowing a pvops kernel to be
> a crash kernel for Xen, which is very important from dom0/Xen's point of
> view.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Andrew Cooper
On 22/11/2012 17:47, H. Peter Anvin wrote:
> The other thing that should be considered here is how utterly
> preposterous the notion of doing in-guest crash dumping is in a system
> that contains a hypervisor.  The reason for kdump is that on bare metal
> there are no other options, but in a hypervisor system the right thing
> should be for the hypervisor to do the dump (possibly spawning a clean
> I/O domain if the I/O domain is necessary to access the media.)
>
> There is absolutely no reason to have a crashkernel sitting around in
> each guest, consuming memory, and possibly get corrupt.
>
>   -hpa


(Your reply to my email which I can see on the xen devel archive appears
to have gotten lost somewhere inside the citrix email system, so
apologies for replying out of order)

The kdump kernel loaded by dom0 is for when Xen crashes, not for when
dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash).

There is no possible way it could be a separate domain; Xen completely
ceases to function as soon as it jumps to the entry point of the kdump image.

~Andrew


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Eric W. Biederman
Daniel Kiper <daniel.ki...@oracle.com> writes:

> On Tue, Nov 20, 2012 at 08:40:39AM -0800, ebied...@xmission.com wrote:
> > Daniel Kiper <daniel.ki...@oracle.com> writes:
> >
> > > Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
> > > functions or require some changes in behavior of kexec/kdump generic code.
> > > To cope with that problem kexec_ops struct was introduced. It allows
> > > a developer to replace all or some functions and control some
> > > functionality of kexec/kdump generic code.
> > >
> > > Default behavior of kexec/kdump generic code is not changed.
> >
> > Ick.
> >
> > > v2 - suggestions/fixes:
> > >    - add comment for kexec_ops.crash_alloc_temp_store member
> > >      (suggested by Konrad Rzeszutek Wilk),
> > >    - simplify kexec_ops usage
> > >      (suggested by Konrad Rzeszutek Wilk).
> > >
> > > Signed-off-by: Daniel Kiper <daniel.ki...@oracle.com>
> > > ---
> > >  include/linux/kexec.h |   26 ++
> > >  kernel/kexec.c|  131 
> > > +
> > >  2 files changed, 125 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > > index d0b8458..c8d0b35 100644
> > > --- a/include/linux/kexec.h
> > > +++ b/include/linux/kexec.h
> > > @@ -116,7 +116,33 @@ struct kimage {
> > >  #endif
> > >  };
> > >
> > > +struct kexec_ops {
> > > +   /*
> > > +    * Some kdump implementations (e.g. Xen PVOPS dom0) could not access
> > > +    * directly crash kernel memory area. In this situation they must
> > > +    * allocate memory outside of it and later move contents from temporary
> > > +    * storage to final resting places (usualy done by relocate_kernel()).
> > > +    * Such behavior could be enforced by setting
> > > +    * crash_alloc_temp_store member to true.
> > > +    */
> >
> > Why in the world would Xen not be able to access crash kernel memory?
> > As currently defined it is normal memory that the kernel chooses not to
> > use.
> >
> > If relocate kernel can access that memory you definitely can access the
> > memory so the comment does not make any sense.
>
> Crash kernel memory is reserved by Xen hypervisor and Xen hypervisor
> only has access to it. dom0 does not have any mapping of this area.
> However, relocate_kernel() has access to crash kernel memory
> because it is executed by Xen hypervisor and whole machine
> memory is identity mapped.

This is all weird.  Doubly so since this code is multi-arch and you have
a set of requirements no other arch has had.

I recall that Xen uses kexec in a unique manner.  What is the hypervisor
interface and how is it used?

Is this for when the hypervisor crashes and we want a crash dump of
that?

> > > +   bool crash_alloc_temp_store;
> > > +   struct page *(*kimage_alloc_pages)(gfp_t gfp_mask,
> > > +                                      unsigned int order,
> > > +                                      unsigned long limit);
> > > +   void (*kimage_free_pages)(struct page *page);
> > > +   unsigned long (*page_to_pfn)(struct page *page);
> > > +   struct page *(*pfn_to_page)(unsigned long pfn);
> > > +   unsigned long (*virt_to_phys)(volatile void *address);
> > > +   void *(*phys_to_virt)(unsigned long address);
> > > +   int (*machine_kexec_prepare)(struct kimage *image);
> > > +   int (*machine_kexec_load)(struct kimage *image);
> > > +   void (*machine_kexec_cleanup)(struct kimage *image);
> > > +   void (*machine_kexec_unload)(struct kimage *image);
> > > +   void (*machine_kexec_shutdown)(void);
> > > +   void (*machine_kexec)(struct kimage *image);
> > > +};
> >
> > Ugh.  This is a nasty abstraction.
> >
> > You are mixing and matching a bunch of things together here.
> >
> > If you need to override machine_kexec_xxx please do that on a per
> > architecture basis.
>
> Yes, it is possible but I think that it is worth to do it at that
> level because it could be useful for other archs too (e.g. Xen ARM port
> is under development). Then we do not need to duplicate that functionality
> in arch code. Additionally, Xen requires machine_kexec_load and
> machine_kexec_unload hooks which are not available in current generic
> kexec/kdump code.

Let me be clear.  kexec_ops as you have implemented it is absolutely
unacceptable.

Your kexec_ops is not an abstraction but a hack that enshrines in stone
implementation details.

> > Special case overrides of page_to_pfn, pfn_to_page, virt_to_phys,
> > phys_to_virt, and friends seem completely inappropriate.
>
> They are required in Xen PVOPS case. If we do not do that in that way
> then we at least need to duplicate almost all generic kexec/kdump existing
> code in arch depended files. I do not mention that we need to capture
> relevant syscall and other things. I think that this is wrong way.

A different definition of phys_to_virt and page_to_pfn for one specific
function is total nonsense.

It may actually be better to have a completely different code path.
This looks more like code abuse than code reuse.

Successful code reuse depends upon not breaking the assumptions on which
the code relies, or modifying the code so that the new modified
assumptions are clear.  In this case you might as well define up as down
for all of the sense kexec_ops makes.

> > There may be a point

Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
Ok... that *sort of* makes sense, but also underscores how utterly different 
this is from a normal kexec.

Andrew Cooper <andrew.coop...@citrix.com> wrote:

> On 22/11/2012 17:47, H. Peter Anvin wrote:
> > The other thing that should be considered here is how utterly
> > preposterous the notion of doing in-guest crash dumping is in a system
> > that contains a hypervisor.  The reason for kdump is that on bare metal
> > there are no other options, but in a hypervisor system the right thing
> > should be for the hypervisor to do the dump (possibly spawning a clean
> > I/O domain if the I/O domain is necessary to access the media.)
> >
> > There is absolutely no reason to have a crashkernel sitting around in
> > each guest, consuming memory, and possibly get corrupt.
> >
> >   -hpa
>
> (Your reply to my email which I can see on the xen devel archive appears
> to have gotten lost somewhere inside the citrix email system, so
> apologies for replying out of order)
>
> The kdump kernel loaded by dom0 is for when Xen crashes, not for when
> dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash)
>
> There is no possible way it could be a separate domain; Xen completely
> ceases to function as soon as jumps to the entry point of the kdump image.

~Andrew

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread H. Peter Anvin
I still don't really get why it can't be isolated from dom0, which would make 
more sense to me, even for a Xen crash.

Andrew Cooper <andrew.coop...@citrix.com> wrote:

> On 22/11/2012 17:47, H. Peter Anvin wrote:
> > The other thing that should be considered here is how utterly
> > preposterous the notion of doing in-guest crash dumping is in a system
> > that contains a hypervisor.  The reason for kdump is that on bare metal
> > there are no other options, but in a hypervisor system the right thing
> > should be for the hypervisor to do the dump (possibly spawning a clean
> > I/O domain if the I/O domain is necessary to access the media.)
> >
> > There is absolutely no reason to have a crashkernel sitting around in
> > each guest, consuming memory, and possibly get corrupt.
> >
> >   -hpa
>
> (Your reply to my email which I can see on the xen devel archive appears
> to have gotten lost somewhere inside the citrix email system, so
> apologies for replying out of order)
>
> The kdump kernel loaded by dom0 is for when Xen crashes, not for when
> dom0 crashes (although a dom0 crash does admittedly lead to a Xen crash)
>
> There is no possible way it could be a separate domain; Xen completely
> ceases to function as soon as jumps to the entry point of the kdump image.

~Andrew

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-22 Thread Andrew Cooper
On 23/11/2012 01:38, H. Peter Anvin wrote:
> I still don't really get why it can't be isolated from dom0, which would make
> more sense to me, even for a Xen crash.


The crash region (as specified by crashkernel= on the Xen command line)
is isolated from dom0.

dom0 (using the kexec utility etc) has the task of locating the Xen
crash notes (using the kexec hypercall interface), constructing a binary
blob containing kernel, initram and gubbins, and asking Xen to put this
blob in the crash region (again, using the kexec hypercall interface).

I do not see how this is very much different from the native case
currently (although please correct me if I am misinformed).  Linux has
extra work to do by populating /proc/iomem with the Xen crash regions
at boot (so the kexec utility can reference their physical addresses when
constructing the blob), and should just act as a conduit between the
kexec system call and the kexec hypercall to load the blob.
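
As a rough user-space sketch of that conduit role: the struct and both functions below are hypothetical stand-ins for illustration, not the real kexec system call or the Xen hypercall interface, and the crash region is simulated with a small array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the crash region that Xen reserves. */
struct crash_region {
	uint64_t start;          /* machine address, as listed in /proc/iomem */
	size_t   size;           /* size advertised to dom0 */
	unsigned char mem[64];   /* tiny backing store for the simulation */
	size_t   used;           /* bytes currently loaded */
};

/* Hypothetical "hypercall": the hypervisor copies the blob into its region. */
static int fake_kexec_load_hypercall(struct crash_region *r,
				     const void *blob, size_t len)
{
	if (len > r->size || len > sizeof(r->mem))
		return -1;
	memcpy(r->mem, blob, len);
	r->used = len;
	return 0;
}

/* dom0 side: forward what the system call received, without touching the region. */
static int dom0_kexec_load(struct crash_region *r, const void *blob, size_t len)
{
	return fake_kexec_load_hypercall(r, blob, len);
}
```

The point of the sketch is only that dom0 never writes to the region itself; it hands the blob over and the hypervisor side does the copy and the bounds check.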

For within-guest kexec/kdump functionality, I agree that it is barking
mad.  However, we do see cloud operators interested in the idea so VM
administrators can look after their crashes themselves.

~Andrew


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-21 Thread Daniel Kiper
On Tue, Nov 20, 2012 at 08:40:39AM -0800, ebied...@xmission.com wrote:
> Daniel Kiper <daniel.ki...@oracle.com> writes:
>
> > Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
> > functions or require some changes in behavior of kexec/kdump generic code.
> > To cope with that problem kexec_ops struct was introduced. It allows
> > a developer to replace all or some functions and control some
> > functionality of kexec/kdump generic code.
> >
> > Default behavior of kexec/kdump generic code is not changed.
>
> Ick.
>
> > v2 - suggestions/fixes:
> >- add comment for kexec_ops.crash_alloc_temp_store member
> >  (suggested by Konrad Rzeszutek Wilk),
> >- simplify kexec_ops usage
> >  (suggested by Konrad Rzeszutek Wilk).
> >
> > Signed-off-by: Daniel Kiper <daniel.ki...@oracle.com>
> > ---
> >  include/linux/kexec.h |   26 ++
> >  kernel/kexec.c|  131 
> > +
> >  2 files changed, 125 insertions(+), 32 deletions(-)
> >
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index d0b8458..c8d0b35 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -116,7 +116,33 @@ struct kimage {
> >  #endif
> >  };
> >
> > +struct kexec_ops {
> > +   /*
> > +    * Some kdump implementations (e.g. Xen PVOPS dom0) could not access
> > +    * directly crash kernel memory area. In this situation they must
> > +    * allocate memory outside of it and later move contents from temporary
> > +    * storage to final resting places (usualy done by relocate_kernel()).
> > +    * Such behavior could be enforced by setting
> > +    * crash_alloc_temp_store member to true.
> > +    */
>
> Why in the world would Xen not be able to access crash kernel memory?
> As currently defined it is normal memory that the kernel chooses not to
> use.
>
> If relocate kernel can access that memory you definitely can access the
> memory so the comment does not make any sense.

Crash kernel memory is reserved by the Xen hypervisor, and only the Xen
hypervisor has access to it. dom0 does not have any mapping of this area.
However, relocate_kernel() has access to crash kernel memory
because it is executed by the Xen hypervisor, with the whole of machine
memory identity mapped.
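
A minimal user-space sketch of the two-step flow this implies: stage the data in ordinary memory first, then copy it into the crash region from a context that can reach it. The struct and function names are hypothetical, and the crash region is just a byte array here; this is not the code from the patch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One staged chunk: data held in ordinary memory plus its final offset. */
struct staged_segment {
	unsigned char *temp;   /* temporary storage that dom0 can touch */
	size_t dest_off;       /* offset inside the crash region */
	size_t len;
};

/* Step 1: stage the data outside the crash region. */
static int stage_segment(struct staged_segment *s, const void *src,
			 size_t len, size_t dest_off)
{
	s->temp = malloc(len);
	if (!s->temp)
		return -1;
	memcpy(s->temp, src, len);
	s->dest_off = dest_off;
	s->len = len;
	return 0;
}

/* Step 2: stand-in for the relocation step, run where the region is mapped. */
static int relocate_segment(unsigned char *region, size_t region_size,
			    struct staged_segment *s)
{
	if (s->dest_off > region_size || s->len > region_size - s->dest_off)
		return -1;   /* caller still owns s->temp on failure */
	memcpy(region + s->dest_off, s->temp, s->len);
	free(s->temp);
	s->temp = NULL;
	return 0;
}
```

In this sketch the first step never dereferences the region, which is the property the crash_alloc_temp_store flag is meant to enforce.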

> > +   bool crash_alloc_temp_store;
> > +   struct page *(*kimage_alloc_pages)(gfp_t gfp_mask,
> > +   unsigned int order,
> > +   unsigned long limit);
> > +   void (*kimage_free_pages)(struct page *page);
> > +   unsigned long (*page_to_pfn)(struct page *page);
> > +   struct page *(*pfn_to_page)(unsigned long pfn);
> > +   unsigned long (*virt_to_phys)(volatile void *address);
> > +   void *(*phys_to_virt)(unsigned long address);
> > +   int (*machine_kexec_prepare)(struct kimage *image);
> > +   int (*machine_kexec_load)(struct kimage *image);
> > +   void (*machine_kexec_cleanup)(struct kimage *image);
> > +   void (*machine_kexec_unload)(struct kimage *image);
> > +   void (*machine_kexec_shutdown)(void);
> > +   void (*machine_kexec)(struct kimage *image);
> > +};
>
> Ugh.  This is a nasty abstraction.
>
> You are mixing and matching a bunch of things together here.
>
> If you need to override machine_kexec_xxx please do that on a per
> architecture basis.

Yes, it is possible, but I think that it is worth doing at that
level because it could be useful for other archs too (e.g. the Xen ARM port
is under development). Then we do not need to duplicate that functionality
in arch code. Additionally, Xen requires machine_kexec_load and
machine_kexec_unload hooks, which are not available in the current generic
kexec/kdump code.

> Special case overrides of page_to_pfn, pfn_to_page, virt_to_phys,
> phys_to_virt, and friends seem completely inappropriate.

They are required in the Xen PVOPS case. If we do not do it that way,
then we need to duplicate almost all of the existing generic kexec/kdump
code in arch-dependent files, not to mention that we need to capture the
relevant syscall and other things. I think that is the wrong way.

> There may be a point to all of these but you are mixing and matching
> things badly.

Do you wish to split this kexec_ops struct into something which
works with addresses and something which is responsible for
loading, unloading and executing kexec/kdump? I am able to change
that, but I would like to know a bit about your vision first.
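
To make the question concrete, here is one possible shape of such a split, purely as a sketch. The names kexec_addr_ops, kexec_image_ops and the helper functions are hypothetical and not part of the patch, and the translations are user-space stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table 1: address translation only. */
struct kexec_addr_ops {
	uintptr_t (*virt_to_phys)(const volatile void *address);
};

/* Hypothetical table 2: image life cycle only. */
struct kexec_image_ops {
	int  (*load)(void *image);
	void (*unload)(void *image);
};

/* Default translation for this simulation: identity. */
static uintptr_t default_virt_to_phys(const volatile void *address)
{
	return (uintptr_t)address;
}

/* Example override, standing in for a Xen-specific translation. */
static uintptr_t offset_virt_to_phys(const volatile void *address)
{
	return (uintptr_t)address + 0x1000;
}

/* Use the hook if one is installed, otherwise fall back to the default. */
static uintptr_t kexec_virt_to_phys(const struct kexec_addr_ops *ops,
				    const volatile void *address)
{
	if (ops && ops->virt_to_phys)
		return ops->virt_to_phys(address);
	return default_virt_to_phys(address);
}

/* A missing load hook means there is nothing extra to do, reported as success. */
static int kexec_image_load(const struct kexec_image_ops *ops, void *image)
{
	if (ops && ops->load)
		return ops->load(image);
	return 0;
}
```

With the tables separated like this, a platform could install load/unload hooks without also redefining address translation, and the default behavior stays unchanged when no table is registered.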

Daniel


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-20 Thread Eric W. Biederman
Daniel Kiper <daniel.ki...@oracle.com> writes:

> Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
> functions or require some changes in behavior of kexec/kdump generic code.
> To cope with that problem kexec_ops struct was introduced. It allows
> a developer to replace all or some functions and control some
> functionality of kexec/kdump generic code.
>
> Default behavior of kexec/kdump generic code is not changed.

Ick.

> v2 - suggestions/fixes:
>- add comment for kexec_ops.crash_alloc_temp_store member
>  (suggested by Konrad Rzeszutek Wilk),
>- simplify kexec_ops usage
>  (suggested by Konrad Rzeszutek Wilk).
>
> Signed-off-by: Daniel Kiper <daniel.ki...@oracle.com>
> ---
>  include/linux/kexec.h |   26 ++
>  kernel/kexec.c|  131 
> +
>  2 files changed, 125 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d0b8458..c8d0b35 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -116,7 +116,33 @@ struct kimage {
>  #endif
>  };
>  
> +struct kexec_ops {
> + /*
> +  * Some kdump implementations (e.g. Xen PVOPS dom0) could not access
> +  * directly crash kernel memory area. In this situation they must
> +  * allocate memory outside of it and later move contents from temporary
> +  * storage to final resting places (usualy done by relocate_kernel()).
> +  * Such behavior could be enforced by setting
> +  * crash_alloc_temp_store member to true.
> +  */

Why in the world would Xen not be able to access crash kernel memory?
As currently defined it is normal memory that the kernel chooses not to
use.

If relocate kernel can access that memory you definitely can access the
memory so the comment does not make any sense.

> + bool crash_alloc_temp_store;
> + struct page *(*kimage_alloc_pages)(gfp_t gfp_mask,
> + unsigned int order,
> + unsigned long limit);
> + void (*kimage_free_pages)(struct page *page);
> + unsigned long (*page_to_pfn)(struct page *page);
> + struct page *(*pfn_to_page)(unsigned long pfn);
> + unsigned long (*virt_to_phys)(volatile void *address);
> + void *(*phys_to_virt)(unsigned long address);
> + int (*machine_kexec_prepare)(struct kimage *image);
> + int (*machine_kexec_load)(struct kimage *image);
> + void (*machine_kexec_cleanup)(struct kimage *image);
> + void (*machine_kexec_unload)(struct kimage *image);
> + void (*machine_kexec_shutdown)(void);
> + void (*machine_kexec)(struct kimage *image);
> +};

Ugh.  This is a nasty abstraction.

You are mixing and matching a bunch of things together here.

If you need to override machine_kexec_xxx please do that on a per
architecture basis.

Special case overrides of page_to_pfn, pfn_to_page, virt_to_phys,
phys_to_virt, and friends seem completely inappropriate.

There may be a point to all of these but you are mixing and matching
things badly.


Eric