Re: [PATCH] [RFC] Pass a valid token to rtas_call() in phyp-dump code.

2008-12-16 Thread Manish Ahuja
Yes,

That is required. It is in the patches that I sent to Ben, Paul & Brad.

I am just waiting to post it with the other patches.

Acked-by: Manish Ahuja mahu...@gmail.com

Tony Breeds wrote:
 ibm_configure_kernel_dump, is passed as the token to rtas_call() but I
 cannot see where it is initialised.  Set it to something sane?
 
 Signed-off-by: Tony Breeds t...@bakeyournoodle.com
 ---
  arch/powerpc/platforms/pseries/phyp_dump.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/platforms/pseries/phyp_dump.c 
 b/arch/powerpc/platforms/pseries/phyp_dump.c
 index 16e659a..6cf35cd 100644
 --- a/arch/powerpc/platforms/pseries/phyp_dump.c
 +++ b/arch/powerpc/platforms/pseries/phyp_dump.c
 @@ -414,6 +414,8 @@ static int __init phyp_dump_setup(void)
   of_node_put(rtas);
   }
  
 + ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
 +
   print_dump_header(dump_header);
   dump_area_length = init_dump_header(phdr);
   /* align down */


-- 

--
Manish Ahuja
Linux RAS Engineer.
IBM Linux Technology Center
mah...@us.ibm.com
512-838-1928, t/l 678-1928.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Protect against NULL pointer deref in phyp-dump code.

2008-12-16 Thread Manish Ahuja

Acked-by: Manish Ahuja mahu...@gmail.com


Tony Breeds wrote:
 print_dump_header() will be called at least once with a NULL pointer in
 a normal boot sequence.  if DEBUG is defined then we will get a deref,
 add a quick fix to exit early in the NULL pointer case.
 
 Signed-off-by: Tony Breeds t...@bakeyournoodle.com
 ---
  arch/powerpc/platforms/pseries/phyp_dump.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/platforms/pseries/phyp_dump.c 
 b/arch/powerpc/platforms/pseries/phyp_dump.c
 index edbc012..16e659a 100644
 --- a/arch/powerpc/platforms/pseries/phyp_dump.c
 +++ b/arch/powerpc/platforms/pseries/phyp_dump.c
 @@ -130,6 +130,9 @@ static unsigned long init_dump_header(struct 
 phyp_dump_header *ph)
  static void print_dump_header(const struct phyp_dump_header *ph)
  {
  #ifdef DEBUG
 + if (ph == NULL)
 + return;
 +
   printk(KERN_INFO "dump header:\n");
   /* setup some ph->sections required */
   printk(KERN_INFO "version = %d\n", ph->version);


-- 

--
Manish Ahuja
Linux RAS Engineer.
IBM Linux Technology Center
mah...@us.ibm.com
512-838-1928, t/l 678-1928.



Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-18 Thread Manish Ahuja
Yeah, that makes sense. I will shortly send a documentation patch for all the
boot vars that I have added.

Thanks for the reminder.

-Manish




Linas Vepstas wrote:
 On 07/04/2008, Manish Ahuja [EMAIL PROTECTED] wrote:
 A small proposed change in the amount of reserve space we allocate during 
 boot.
  Currently we reserve 256MB only.
  The proposed change does one of the 3 things.

  A. It checks to see if there is cmdline variable set and if found sets the
value to it. OR
  B. It computes 5% of total ram and rounds it down to multiples of 256MB. AND
  C. Compares the rounded down value and returns larger of two values, the new
computed value or 256MB.

  Again this is for large systems who have excess memory.

 [...]
   early_param("phyp_dump", early_phyp_dump_enabled);
 
 I'm pretty sure you will want to document this boot param in the
 documentation, as well as add a few words about why it might be interesting
 to users (i.e. that it's for large systems...)
 
 --linas



Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-15 Thread Manish Ahuja
Paul,

The aim is to have more flex space for the kernel on machines with more 
resources. Although the dump will be collected pretty fast and the memory 
released really early on allowing the machine to have the full memory 
available, this alleviates any issues that can be caused by having way too 
little memory on very very large systems during those few minutes.

-Manish



Paul Mackerras wrote:
 Manish Ahuja writes:
 
 B. It computes 5% of total ram and rounds it down to multiples of 256MB.
 C. Compares the rounded down value and returns larger of 256MB or the new
computed value.
 
 So if we have 10GB of memory or more we'll reserve more than
 256MB.  What is the advantage of reserving more than 256MB of memory?
 
 Paul.



[PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-11 Thread Manish Ahuja
Reposting patch with following changes:
1. Changed phyp_dump_reserve_bootvar to just reserve_bootvar.
2. Changed 0x1fffffff to 0x0fffffffUL.

Paulus,
If you think this is okay can you send this upstream ?
Many thanks,
Manish

A small proposed change in the amount of reserve space we allocate during boot.
Currently we reserve only 256MB.
The proposed change does the following:

A. It checks whether a boot variable is set and, if so, uses that value.
B. Otherwise, it computes 5% of total RAM and rounds it down to a multiple
   of 256MB.
C. It compares the rounded-down value with 256MB and returns the larger
   of the two.

Again, this is for large systems that have excess memory.
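
For illustration only, the sizing rule described above can be modelled in plain
userspace C. This is a sketch under assumptions, not the kernel code:
reserve_size() and MB_256 are hypothetical names, and total RAM is passed in
as a parameter instead of being read from lmb_end_of_DRAM().

```c
#include <stdint.h>

#define MB_256 (1ULL << 28)     /* 256MB, the same value as PHYP_DUMP_RMR_END */

/* Hypothetical userspace model of the sizing rule (A, B, C above). */
static uint64_t reserve_size(uint64_t total_ram, uint64_t bootvar)
{
	uint64_t tmp;

	if (bootvar)                    /* A: an explicit boot variable wins */
		return bootvar;

	tmp = total_ram / 20;           /* B: 5% of RAM ... */
	tmp &= ~(MB_256 - 1);           /*    ... rounded down to a 256MB multiple */

	return tmp > MB_256 ? tmp : MB_256;     /* C: never less than 256MB */
}
```

With this model a 4GB machine still reserves 256MB, while a 64GB machine
reserves 3GB (5% is 3.2GB, rounded down to twelve 256MB units).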

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c |   35 +++--
 arch/powerpc/platforms/pseries/phyp_dump.c |9 +++
 include/asm-powerpc/phyp_dump.h|4 ++-
 3 files changed, 45 insertions(+), 3 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-04-02 
23:36:51.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-04-11 
23:54:34.0 -0500
@@ -496,3 +496,12 @@ static int __init early_phyp_dump_enable
 }
 early_param("phyp_dump", early_phyp_dump_enabled);
 
+/* Look for phyp_dump_reserve_size= cmdline option */
+static int __init early_phyp_dump_reserve_size(char *p)
+{
+if (p)
+   phyp_dump_info->reserve_bootvar = memparse(p, &p);
+
+return 0;
+}
+early_param("phyp_dump_reserve_size", early_phyp_dump_reserve_size);
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h 2008-04-02 
23:36:49.0 -0500
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h  2008-04-11 23:53:10.0 
-0500
@@ -24,8 +24,10 @@ struct phyp_dump {
/* Memory that is reserved during very early boot. */
unsigned long init_reserve_start;
unsigned long init_reserve_size;
-   /* Check status during boot if dump supported, active & present*/
+   /* cmd line options during boot */
+   unsigned long reserve_bootvar;
unsigned long phyp_dump_at_boot;
+   /* Check status during boot if dump supported, active & present*/
unsigned long phyp_dump_configured;
unsigned long phyp_dump_is_active;
/* store cpu & hpte size */
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c  2008-04-02 23:36:49.0 
-0500
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c   2008-04-11 23:53:48.0 
-0500
@@ -1042,6 +1042,33 @@ static void __init early_reserve_mem(voi
 
 #ifdef CONFIG_PHYP_DUMP
 /**
+ * phyp_dump_calculate_reserve_size() - reserve variable boot area 5% or arg
+ *
+ * Function to find the largest size we need to reserve
+ * during early boot process.
+ *
+ * It either looks for boot param and returns that OR
+ * returns larger of 256 or 5% rounded down to multiples of 256MB.
+ *
+ */
+static inline unsigned long phyp_dump_calculate_reserve_size(void)
+{
+   unsigned long tmp;
+
+   if (phyp_dump_info->reserve_bootvar)
+   return phyp_dump_info->reserve_bootvar;
+
+   /* divide by 20 to get 5% of value */
+   tmp = lmb_end_of_DRAM();
+   do_div(tmp, 20);
+
+   /* round it down in multiples of 256 */
+   tmp = tmp & ~0x0FFFFFFFUL;
+
+   return (tmp > PHYP_DUMP_RMR_END ? tmp : PHYP_DUMP_RMR_END);
+}
+
+/**
  * phyp_dump_reserve_mem() - reserve all not-yet-dumped mmemory
  *
  * This routine may reserve memory regions in the kernel only
@@ -1054,6 +1081,8 @@ static void __init early_reserve_mem(voi
 static void __init phyp_dump_reserve_mem(void)
 {
unsigned long base, size;
+   unsigned long variable_reserve_size;
+
if (!phyp_dump_info->phyp_dump_configured) {
printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
return;
@@ -1064,9 +1093,11 @@ static void __init phyp_dump_reserve_mem
return;
}
 
+   variable_reserve_size = phyp_dump_calculate_reserve_size();
+
if (phyp_dump_info->phyp_dump_is_active) {
/* Reserve *everything* above RMR.Area freed by userland tools*/
-   base = PHYP_DUMP_RMR_END;
+   base = variable_reserve_size;
size = lmb_end_of_DRAM() - base;
 
/* XXX crashed_ram_end is wrong, since it may be beyond
@@ -1078,7 +1109,7 @@ static void __init phyp_dump_reserve_mem
} else {
size = phyp_dump_info->cpu_state_size +
phyp_dump_info->hpte_region_size +
-   PHYP_DUMP_RMR_END

Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-09 Thread Manish Ahuja
Olof Johansson wrote:
 These make for some really long variable names and lines. I know from
 experience, since I've picked unneccessary long driver names in the past
 myself. :)
 
 How about just naming the new variables reserve_bootvar, etc? The name
 of the struct they're in makes it obvious what they're for.
 

Yeah, I guess that's a good suggestion. Will truncate it.

 
 +static inline unsigned long phyp_dump_calculate_reserve_size(void)
 +{
 +unsigned long tmp;
 +
 +if (phyp_dump_info->phyp_dump_reserve_bootvar)
 +return phyp_dump_info->phyp_dump_reserve_bootvar;
 +
 +/* divide by 20 to get 5% of value */
 +tmp = lmb_end_of_DRAM();
 +do_div(tmp, 20);
 +
 +/* round it down in multiples of 256 */
 +tmp = tmp & ~0x1FFFFFFF;
 
 That's 512MB, isn't it?
 

No, it's 5% of memory, rounded down to a multiple of 256MB.
So if you have 4GB it's 256MB;
if you have 8GB it's 512MB, etc.

 
 -Olof



Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-09 Thread Manish Ahuja
Olof Johansson wrote:
 +static inline unsigned long phyp_dump_calculate_reserve_size(void)
 +{
 +unsigned long tmp;
 +
 +if (phyp_dump_info->phyp_dump_reserve_bootvar)
 +return phyp_dump_info->phyp_dump_reserve_bootvar;
 +
 +/* divide by 20 to get 5% of value */
 +tmp = lmb_end_of_DRAM();
 +do_div(tmp, 20);
 +
 +/* round it down in multiples of 256 */
 +tmp = tmp & ~0x1FFFFFFF;
 
 That's 512MB, isn't it?

My calculations in the example I gave in the last email were wrong.

I mentally did 10% instead of 5%, but the premise is the same.

So assuming 5% of some memory is 400MB, it rounds it down to 256MB, etc.


Re: [PATCH] pseries: phyp dump: Variable size reserve space.

2008-04-09 Thread Manish Ahuja
Hmmm,

You are possibly right.

Okay I can check and fix that.

-Manish

Olof Johansson wrote:
 That's 512MB, isn't it?
 My calculations in the example I gave in the last email were wrong.

 I mentally did 10% instead of 5%. But the premise is same.

 So assuming 5% of some memory is 400 MB, it rounds it down to 256MB etc.
 
 But 0x1fffffff is 512MB, not 256MB. So you're rounding it down to a
 multiple of 512MB.
 
 
 -Olof
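
To make the mask arithmetic concrete: rounding down to a multiple of a
power-of-two size N is done by clearing the low bits with ~(N - 1). For 256MB
the mask is ~0x0FFFFFFF; ~0x1FFFFFFF clears one more bit and so rounds to a
multiple of 512MB. A minimal sketch, where round_down_pow2() is a hypothetical
helper and not something from the patch:

```c
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}
```

For a value of 400MB, aligning to 256MB keeps 256MB, whereas aligning to 512MB
leaves nothing, which is the difference being pointed out in this exchange.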



[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-03-21 Thread Manish Ahuja
The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides 
documentation explaining what this is. 

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested
on are still incomplete; this work is also pending.

The following is a list of changes from previous version:
- Deleted ifdef CONFIG_PHYP_DUMP from early_init_dt_scan_phyp_dump function.
- Changed reserve_crashed_mem() to phyp_dump_reserve_mem() as suggested.
- Added #ifdef CONFIG_PHYP_DUMP around of_scan_flat_dt call, removed empty
  function from header file.
- Changed phyp_dump_global to phyp_dump_vars.
- Changed style issues at several places.

Manish & Linas.


[PATCH 1/8] pseries: phyp dump: Documentation

2008-03-21 Thread Manish Ahuja

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-18 
03:22:33.0 -0600
@@ -0,0 +1,127 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+ echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead

[PATCH 2/8] pseries: phyp dump: reserve-release

2008-03-21 Thread Manish Ahuja

Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
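
The reserve-then-release idea can be pictured with a toy page map. This is only
a userspace model under assumptions: the reserved[] array, NPAGES and both
helpers are hypothetical stand-ins for the per-page reserved flag that the
patch tests and clears.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16                 /* toy machine: 16 page frames */

static bool reserved[NPAGES];     /* true while a frame is held back */

/* Mark n frames starting at start as reserved (the early-boot step). */
static void reserve_range(size_t start, size_t n)
{
	for (size_t i = start; i < start + n && i < NPAGES; i++)
		reserved[i] = true;
}

/* Release n frames starting at start and return how many were freed.
 * Frames that are not currently reserved are skipped. */
static size_t release_range(size_t start, size_t n)
{
	size_t freed = 0;

	for (size_t i = start; i < start + n && i < NPAGES; i++) {
		if (reserved[i]) {
			reserved[i] = false;
			freed++;
		}
	}
	return freed;
}
```

Releasing a range twice is harmless in this model because the second pass finds
no reserved frames left to free.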

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c |   52 ++
 arch/powerpc/platforms/pseries/Makefile|1 
 arch/powerpc/platforms/pseries/phyp_dump.c |  103 +
 include/asm-powerpc/phyp_dump.h|   41 +++
 4 files changed, 197 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h  2008-03-21 23:37:11.0 
-0500
@@ -0,0 +1,41 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+   /* Memory that is reserved during very early boot. */
+   unsigned long init_reserve_start;
+   unsigned long init_reserve_size;
+   /* Check status during boot if dump supported, active & present*/
+   unsigned long phyp_dump_configured;
+   unsigned long phyp_dump_is_active;
+   /* store cpu & hpte size */
+   unsigned long cpu_state_size;
+   unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+   const char *uname, int depth, void *data);
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-21 
23:37:12.0 -0500
@@ -0,0 +1,103 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Variables, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_vars;
+struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+   struct page *rpage;
+   unsigned long end_pfn;
+   long i;
+
+   end_pfn = start_pfn + nr_pages;
+
+   for (i = start_pfn; i <= end_pfn; i++) {
+   rpage = pfn_to_page(i);
+   if (PageReserved(rpage)) {
+   ClearPageReserved(rpage);
+   init_page_count(rpage);
+   __free_page(rpage);
+   totalram_pages++;
+   }
+   }
+}
+
+static int __init phyp_dump_setup(void)
+{
+   unsigned long start_pfn, nr_pages;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+   return 0;
+
+   /* Release memory that was reserved in early boot */
+   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+   release_memory_range(start_pfn, nr_pages);
+
+   return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+   const char *uname, int depth, void *data)
+{
+   const unsigned int *sizes;
+
+   phyp_dump_info->phyp_dump_configured = 0;
+   phyp_dump_info->phyp_dump_is_active = 0;
+
+   if (depth != 1 || strcmp(uname, "rtas") != 0)
+   return

[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-03-21 Thread Manish Ahuja

Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.
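
As a rough illustration of what handling such a write involves, the sketch
below parses the two hex values and clamps them to the reserved area. It is a
userspace model under assumptions: struct region and parse_release() are
hypothetical names, and nothing is actually freed.

```c
#include <stdio.h>

struct region {                 /* hypothetical description of the reserved area */
	unsigned long start;
	unsigned long size;
};

/* Parse "start length" (both hex) and clamp the request to the reserved
 * region. Returns 0 on success, -1 if two values could not be parsed. */
static int parse_release(const char *buf, const struct region *res,
			 unsigned long *start, unsigned long *length)
{
	unsigned long end = res->start + res->size;

	if (sscanf(buf, "%lx %lx", start, length) != 2)
		return -1;

	if (*start < res->start)        /* do not reach below the region */
		*start = res->start;
	if (*start + *length > end)     /* do not run past its end */
		*length = end - *start;
	return 0;
}
```

The sketch does not guard against a start address beyond the end of the
region; a real handler would probably want to reject that case explicitly.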

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   81 +++--
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-21 
00:10:15.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-21 
22:39:21.0 -0500
@@ -12,19 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
+#include <asm/rtas.h>
 
 /* Variables, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_vars;
 struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
 
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -54,18 +59,84 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
+
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+* wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory passed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range(start_pfn, nr_pages);
+
+   return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header = NULL;
+   int header_len = 0;
+   int rc;
 
/* If no memory was reserved in early boot, there is nothing to do */
if (phyp_dump_info->init_reserve_size == 0)
return 0;
 
-   /* Release memory that was reserved in early boot */
-   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-   release_memory_range(start_pfn, nr_pages);
+   /* Return if phyp dump not supported */
+   if (!phyp_dump_info->phyp_dump_configured)
+   return -ENOSYS;
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   if (rtas) {
+   dump_header = of_get_property(rtas, "ibm,kernel-dump",
+   &header_len);
+   of_node_put(rtas);
+   }
+
+   if (dump_header == NULL)
+   return 0;
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = sysfs_create_file(kernel_kobj, &rr.attr);
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+   rc);
+   return 0;
+   }
 
return 0;
 }


[PATCH 4/8] pseries: phyp dump: register dump area.

2008-03-21 Thread Manish Ahuja


Set up the actual dump header, register it with the hypervisor.
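
The layout idea behind the header can be sketched as follows: the three
sections (CPU state, HPTE region, low kernel memory) are given consecutive
offsets inside one save area, and the offsets are later rebased onto the real
start address of that area. This is a simplified model under assumptions:
struct section, struct layout, lay_out() and rebase() are hypothetical and
carry only the two fields needed for the arithmetic.

```c
#include <stdint.h>

struct section {
	uint64_t source_length;
	uint64_t destination_address;
};

struct layout {
	struct section cpu, hpte, kernel;
};

/* Assign consecutive destination offsets, starting from 0.
 * Returns the total length of the save area. */
static uint64_t lay_out(struct layout *l, uint64_t cpu_size,
			uint64_t hpte_size, uint64_t rmr_size)
{
	uint64_t off = 0;

	l->cpu.source_length = cpu_size;
	l->cpu.destination_address = off;
	off += cpu_size;

	l->hpte.source_length = hpte_size;
	l->hpte.destination_address = off;
	off += hpte_size;

	l->kernel.source_length = rmr_size;
	l->kernel.destination_address = off;
	off += rmr_size;

	return off;
}

/* Turn the relative offsets into absolute addresses before registering. */
static void rebase(struct layout *l, uint64_t base)
{
	l->cpu.destination_address += base;
	l->hpte.destination_address += base;
	l->kernel.destination_address += base;
}
```

Keeping the offsets relative until the save area has been placed means the
header can be filled in before the final address is known.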

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-21 
22:39:21.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-21 
22:52:53.0 -0500
@@ -29,6 +29,117 @@
 static struct phyp_dump phyp_dump_vars;
 struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
 
+static int ibm_configure_kernel_dump;
+/* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS  3
+#define DUMP_HEADER_VERSION0x1
+#define DUMP_REQUEST_FLAG  0x1
+#define DUMP_SOURCE_CPU0x0001
+#define DUMP_SOURCE_HPTE   0x0002
+#define DUMP_SOURCE_RMO0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   unsigned long addr_offset = 0;
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+   (u32)offsetof(struct phyp_dump_header, cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+   1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+   "register\n", rc);
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -107,7 +218,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
struct device_node *rtas;
-   const int *dump_header = NULL;
+   const struct phyp_dump_header *dump_header = NULL;
+   unsigned long dump_area_start;
+   unsigned long dump_area_length;
int header_len = 0;
int rc;
 
@@ -119,7 +232,13 @@ static int __init phyp_dump_setup(void

[PATCH 5/8] pseries: phyp dump: debugging print routines.

2008-03-21 Thread Manish Ahuja


Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
---

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 -
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-21 
22:52:53.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-21 
22:54:44.0 -0500
@@ -123,6 +123,61 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n",
+   ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",
+   ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",
+   ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",
+   ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",
+   ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",
+   ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",
+   ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",
+   ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",
+   ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",
+   ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -135,9 +190,11 @@ static void register_dump_area(struct ph
1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
 
-   if (rc)
+   if (rc) {
printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
"register\n", rc);
+   print_dump_header(ph);
+   }
 }
 
 /* - */
@@ -246,8 +303,8 @@ static int __init phyp_dump_setup(void)
of_node_put(rtas);
}
 
+   print_dump_header(dump_header);
dump_area_length = init_dump_header(phdr);
-
/* align down */
dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 


[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.

2008-03-21 Thread Manish Ahuja



Routines to:
a. Invalidate the dump.
b. Calculate the region that is reserved and needs to be freed. This is
   exported through the sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++---
 include/asm-powerpc/phyp_dump.h|3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-20 
21:52:59.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-20 
21:55:52.0 -0500
@@ -70,6 +70,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO 0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -181,9 +185,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -197,6 +207,30 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+   2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+   "on invalidate\n", rc);
+   print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -207,8 +241,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for general use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static void release_memory_range(unsigned long start_pfn,
+   unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -269,8 +303,29 @@ static ssize_t store_release_region(stru
return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+   struct kobj_attribute *attr, char *buf)
+{
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+   phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+   " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+   phdr.cpu_data.destination_address,
+   phdr.cpu_data.length_copied,
+   phdr.hpte_data.destination_address,
+   phdr.hpte_data.length_copied,
+   phdr.kernel_data.destination_address,
+   phdr.kernel_data.length_copied,
+   phyp_dump_info->init_reserve_start,
+   second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-NULL, store_release_region);
+   show_release_region,
+   store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -313,6 +368,22 @@ static int __init phyp_dump_setup(void)
return 0;
}
 
+   /* re-register the dump area, if old dump was invalid */
+   if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+   invalidate_last_dump(&phdr, dump_area_start);
+   register_dump_area(&phdr, dump_area_start);
+   return 0

[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.

2008-03-21 Thread Manish Ahuja



This patch tracks how much of the reserved memory has been freed.
For now it does a rudimentary calculation of the ranges freed.
The idea is to keep things simple at the external shell script
level and have it release memory in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-21 
22:14:00.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-21 
22:14:05.0 -0500
@@ -261,6 +261,39 @@ static void release_memory_range(unsigne
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+   return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+   (addr <= phyp_dump_info->init_reserve_start +
+phyp_dump_info->init_reserve_size))
+   reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+   (addr <= phyp_dump_info->reserved_scratch_addr +
+phyp_dump_info->reserved_scratch_size))
+   scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+   (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+   invalidate_last_dump(&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   register_dump_area(&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -285,6 +318,8 @@ static ssize_t store_release_region(stru
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)
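
For readers following the accounting in track_freed_range(), here is a minimal user-space sketch of the same idea. It is not the kernel code: struct reserve_info, struct freed_totals and track_freed() are hypothetical names, the totals are passed in rather than kept as function-local statics, and the return value stands in for the point where the patch calls invalidate_last_dump() and register_dump_area().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct phyp_dump fields used here. */
struct reserve_info {
	unsigned long init_reserve_start;
	unsigned long init_reserve_size;
	unsigned long reserved_scratch_addr;
	unsigned long reserved_scratch_size;
};

/* Running totals; the patch keeps these as function-local statics. */
struct freed_totals {
	unsigned long reserved_area_size;
	unsigned long scratch_area_size;
};

/*
 * Account for one released chunk. Returns true once both the reserved
 * area and the scratch area have been released in full, which is where
 * the patch invalidates the old dump and re-registers the dump area.
 */
static bool track_freed(const struct reserve_info *info,
			struct freed_totals *t,
			unsigned long addr, unsigned long length)
{
	assert(info != NULL && t != NULL);

	/* Chunks below the reserved area are not counted at all. */
	if (addr < info->init_reserve_start)
		return false;

	if (addr <= info->init_reserve_start + info->init_reserve_size)
		t->reserved_area_size += length;

	if (addr >= info->reserved_scratch_addr &&
	    addr <= info->reserved_scratch_addr + info->reserved_scratch_size)
		t->scratch_area_size += length;

	return t->reserved_area_size == info->init_reserve_size &&
	       t->scratch_area_size == info->reserved_scratch_size;
}
```

As in the patch, only the start address of each chunk is range-checked, so the totals come out right only if the script releases chunks that stay inside the areas and do not overlap.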


[PATCH 8/8] pseries: phyp dump: config file

2008-03-21 Thread Manish Ahuja

Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
---
 arch/powerpc/Kconfig |   10 ++
 1 file changed, 10 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig2008-03-20 20:53:33.0 
-0500
+++ 2.6.25-rc1/arch/powerpc/Kconfig 2008-03-20 21:06:29.0 -0500
@@ -306,6 +306,16 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say N
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


[PATCH 1/2] pseries: phyp dump: Disable phyp-dump through boot-var.

2008-03-21 Thread Manish Ahuja

The goal of these 2 patches is to ensure that there is only one dumping
mechanism enabled at any given time. These patches depend upon phyp-dump
patches posted earlier.

Patch 1:

Addition of boot-variable phyp_dump, which takes values [0/1] for disabling/
enabling phyp_dump at boot time. Kdump can use this on cmdline (phyp_dump=0) 
to disable phyp-dump during boot when enabling itself. This will ensure only
one dumping mechanism is active at any given time.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c |5 +
 arch/powerpc/platforms/pseries/phyp_dump.c |   18 ++
 include/asm-powerpc/phyp_dump.h|1 +
 3 files changed, 24 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-22 
00:42:02.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-22 
01:07:43.0 -0500
@@ -460,3 +460,21 @@ int __init early_init_dt_scan_phyp_dump(
*((unsigned long *)sizes[4]);
return 1;
 }
+
+/* Look for phyp_dump= cmdline option */
+static int __init early_phyp_dump_enabled(char *p)
+{
+   phyp_dump_info->phyp_dump_at_boot = 1;
+
+   if (!p)
+   return 0;
+
+   if (strncmp(p, "1", 1) == 0)
+   phyp_dump_info->phyp_dump_at_boot = 1;
+   else if (strncmp(p, "0", 1) == 0)
+   phyp_dump_info->phyp_dump_at_boot = 0;
+
+   return 0;
+}
+early_param("phyp_dump", early_phyp_dump_enabled);
+
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h 2008-03-22 
00:42:02.0 -0500
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h  2008-03-22 00:42:08.0 
-0500
@@ -25,6 +25,7 @@ struct phyp_dump {
unsigned long init_reserve_start;
unsigned long init_reserve_size;
/* Check status during boot if dump supported, active & present*/
+   unsigned long phyp_dump_at_boot;
unsigned long phyp_dump_configured;
unsigned long phyp_dump_is_active;
/* store cpu & hpte size */
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c  2008-03-22 00:42:02.0 
-0500
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c   2008-03-22 00:42:54.0 
-0500
@@ -1059,6 +1059,11 @@ static void __init phyp_dump_reserve_mem
return;
}
 
+   if (!phyp_dump_info->phyp_dump_at_boot) {
+   printk(KERN_INFO "Phyp-dump disabled at boot time\n");
+   return;
+   }
+
if (phyp_dump_info->phyp_dump_is_active) {
/* Reserve *everything* above RMR.Area freed by userland tools*/
base = PHYP_DUMP_RMR_END;
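
The boot-variable handling above can be tried outside the kernel with a small sketch like the following. apply_phyp_dump_param() is a hypothetical name; the flag it writes plays the role of phyp_dump_info->phyp_dump_at_boot, and the default-on behaviour is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Apply the value of a "phyp_dump=" option to an enable flag.
 * The flag defaults to enabled; a value starting with "0" disables it,
 * a value starting with "1" keeps it enabled, and anything else
 * (including a missing value) leaves the default in place.
 */
static void apply_phyp_dump_param(const char *p, unsigned long *at_boot)
{
	assert(at_boot != NULL);

	*at_boot = 1;

	if (p == NULL)
		return;

	if (strncmp(p, "1", 1) == 0)
		*at_boot = 1;
	else if (strncmp(p, "0", 1) == 0)
		*at_boot = 0;
}
```

Since only the first character is compared, a value such as "01" would also disable phyp-dump, which matches the strncmp() calls in the patch.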


[PATCH 2/2] pseries: phyp dump: inform kdump, phyp-dump is loaded.

2008-03-21 Thread Manish Ahuja

Patch 2:

Addition of /sys/kernel/phyp_dump_active so that kdump init scripts may 
look for it and take appropriate action if this file is found. This
file is only created once phyp-dump has been registered.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   18 ++
 1 file changed, 18 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-03-22 
01:07:43.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-03-22 
01:08:56.0 -0500
@@ -182,6 +182,18 @@ static void print_dump_header(const stru
 #endif
 }
 
+static ssize_t show_phyp_dump_active(struct kobject *kobj,
+   struct kobj_attribute *attr, char *buf)
+{
+
+   /* create filesystem entry so kdump is phyp-dump aware */
+   return sprintf(buf, "%lx\n", phyp_dump_info->phyp_dump_at_boot);
+}
+
+static struct kobj_attribute pdl = __ATTR(phyp_dump_active, 0600,
+   show_phyp_dump_active,
+   NULL);
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -204,7 +216,13 @@ static void register_dump_area(struct ph
printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
"register\n", rc);
print_dump_header(ph);
+   return;
}
+
+   rc = sysfs_create_file(kernel_kobj, &pdl.attr);
+   if (rc)
+   printk(KERN_ERR "phyp-dump: unable to create sysfs"
+   " file (%d)\n", rc);
 }
 
 static
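
On the kdump side, the check described in this patch amounts to looking for the sysfs file. A minimal sketch in C, assuming a POSIX userland; phyp_dump_file_present() is a hypothetical helper, and the path a caller would pass is the /sys/kernel/phyp_dump_active named in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Report whether the given path exists. A kdump init script would
 * call this with "/sys/kernel/phyp_dump_active" and skip loading
 * kdump when it returns true.
 */
static bool phyp_dump_file_present(const char *path)
{
	assert(path != NULL);
	return access(path, F_OK) == 0;
}
```

This only tests for existence. Since the attribute is created with mode 0600, reading its contents would additionally require root.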


Re: [PATCH 8/8] pseries: phyp dump: config file

2008-03-12 Thread Manish Ahuja
Thanks for the review. I will try and make the recommended changes and repost 
it soon.

Manish


Paul Mackerras wrote:
 Manish Ahuja writes:
 
 +config PHYP_DUMP
 +bool "Hypervisor-assisted dump (EXPERIMENTAL)"
 +depends on PPC_PSERIES && EXPERIMENTAL
 +default y
 
 I think this should default to n for now (i.e. leave out the default
 line entirely).  We can make it default to y later.
 
 Paul.



Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-03-12 Thread Manish Ahuja
If Mike and Paul are okay, then I will leave this bit as is and fix all 
other issues and comments.

Thanks,
Manish



Linas Vepstas wrote:
 On 10/03/2008, Michael Ellerman [EMAIL PROTECTED] wrote:
 On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
 
   +
   +/* Global, used to communicate data between early boot and late boot */
   +static struct phyp_dump phyp_dump_global;
   +struct phyp_dump *phyp_dump_info = &phyp_dump_global;

 I don't see the point of this. You have a static (ie. non-global) struct
  called phyp_dump_global, then you create a pointer to it and pass that
  around.
 
 I did this. This is a style used to minimize disruption due to future
 design changes. Basically, the idea is that, at some later time, for
 some unknown reason, we decide that this structure shouldn't
 be global, or maybe shouldn't be statically allocated, or maybe
 should be per-cpu, or who knows.  By creating a pointer, and
 just passing that around, you isolate other code from this change.
 
 I learned this trick after spending too many months of my life hunting
 down globals and replacing them by dynamically allocated structs.
 Its a long and painful process, on many levels, often requiring major
 code restructuring.  Code that touches globals directly is often
 poorly thought out, designed.  But going in the opposite direction
 is easy: if your code always passes everything it needs as args
 to subroutines, then you are free & clear ... if one of those args
 just happens to be a pointer to a global, there's no loss (not even
 a performance loss -- the arg passing overhead is about the same
 as a global TOC lookup!)
 
 So it may look weird if you're not used to seeing it; but the alternative
 is almost always worse.
 
 --linas
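
The style described above can be sketched in a few lines of plain C. All names here (struct dump_state, dump_state_info, set_reserve(), reserve_end()) are made up for illustration and are not from the patch; the point is only that the helpers never name the static object directly.

```c
#include <assert.h>
#include <stddef.h>

struct dump_state {
	unsigned long reserve_start;
	unsigned long reserve_size;
};

/* The only place that knows the storage is a static object. */
static struct dump_state dump_state_storage;
struct dump_state *dump_state_info = &dump_state_storage;

/* Helpers take the state as an argument instead of naming the global. */
static void set_reserve(struct dump_state *st,
			unsigned long start, unsigned long size)
{
	assert(st != NULL);
	st->reserve_start = start;
	st->reserve_size = size;
}

static unsigned long reserve_end(const struct dump_state *st)
{
	assert(st != NULL);
	return st->reserve_start + st->reserve_size;
}
```

If the storage later becomes dynamically allocated or per-cpu, only the definition of dump_state_info changes; set_reserve() and reserve_end() work unchanged on any struct dump_state they are handed.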




Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-02-28 Thread Manish Ahuja
Changes from previous version:

The only changes are in patch 2.
moved early_init_dt_scan_phyp_dump from rtas.c to phyp_dump.c
Added dummy function in phyp_dump.h

Patch 3 required repatching due to changes to patch 2.
Resubmitting all patches to avoid confusion.

Thanks,
Manish


Michael Ellerman wrote:
 On Sun, 2008-02-17 at 22:53 -0600, Manish Ahuja wrote:
 The following series of patches implement a basic framework
 for hypervisor-assisted dump. The very first patch provides 
 documentation explaining what this is  :-) . Yes, its supposed
 to be an improvement over kdump.

 A list of open issues / todo list is included in the documentation.
 It also appears that the not-yet-released firmware versions this was tested 
 on are still, ahem, incomplete; this work is also pending.

 I have included most of the changes requested. Although, I did find
 one or two, fixed in a later patch file rather than the first location
 they appeared at.
 
 This series still doesn't build on !CONFIG_RTAS configs:
 http://kisskb.ellerman.id.au/kisskb/head/629/
 
 The solution is to move early_init_dt_scan_phyp_dump() into
 arch/powerpc/platforms/pseries/phyp_dump.c and provide a dummy
 implementation in asm-powerpc/phyp_dump.h for the !CONFIG_PHYP_DUMP
 case.
 
 cheers
 



[PATCH 1/8] pseries: phyp dump: Documentation

2008-02-28 Thread Manish Ahuja

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]


 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-18 
03:22:33.0 -0600
@@ -0,0 +1,127 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+ echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of RAM is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead
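
As a worked check of the release example in the documentation above (256MB at the 1GB boundary), the following sketch formats the "start length" pair that a tool would write to /sys/kernel/release_region. format_release_request() is a hypothetical helper; only the file name and the two values come from the text.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the "<start> <length>" pair that the documentation says is
 * echoed into /sys/kernel/release_region. Returns what snprintf()
 * returns: the number of characters the full string needs.
 */
static int format_release_request(char *buf, size_t len,
				  unsigned long start, unsigned long length)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%lx 0x%lx", start, length);
}
```

Called with start = 1UL << 30 and length = 256UL << 20, this produces the string "0x40000000 0x10000000".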

[PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-02-28 Thread Manish Ahuja

Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/kernel/prom.c |   49 +
 arch/powerpc/platforms/pseries/Makefile|1 
 arch/powerpc/platforms/pseries/phyp_dump.c |  105 +
 include/asm-powerpc/phyp_dump.h|   44 
 4 files changed, 199 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h  2008-02-28 22:05:25.0 
-0600
@@ -0,0 +1,44 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+   /* Memory that is reserved during very early boot. */
+   unsigned long init_reserve_start;
+   unsigned long init_reserve_size;
+   /* Check status during boot if dump supported, active & present*/
+   unsigned long phyp_dump_configured;
+   unsigned long phyp_dump_is_active;
+   /* store cpu & hpte size */
+   unsigned long cpu_state_size;
+   unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+   const char *uname, int depth, void *data);
+#else /* CONFIG_PHYP_DUMP */
+int early_init_dt_scan_phyp_dump(unsigned long node,
+   const char *uname, int depth, void *data) { return 0; }
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
21:57:52.0 -0600
@@ -0,0 +1,105 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+   struct page *rpage;
+   unsigned long end_pfn;
+   long i;
+
+   end_pfn = start_pfn + nr_pages;
+
+   for (i = start_pfn; i <= end_pfn; i++) {
+   rpage = pfn_to_page(i);
+   if (PageReserved(rpage)) {
+   ClearPageReserved(rpage);
+   init_page_count(rpage);
+   __free_page(rpage);
+   totalram_pages++;
+   }
+   }
+}
+
+static int __init phyp_dump_setup(void)
+{
+   unsigned long start_pfn, nr_pages;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+   return 0;
+
+   /* Release memory that was reserved in early boot */
+   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+   release_memory_range(start_pfn, nr_pages);
+
+   return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+   const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+   const unsigned int

[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-28 Thread Manish Ahuja

Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |   82 +++--
 1 file changed, 77 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-28 
21:57:52.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
23:36:01.0 -0600
@@ -12,19 +12,25 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
+#include <asm/rtas.h>
+
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -54,18 +60,84 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
+
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+* wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory passed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range(start_pfn, nr_pages);
+
+   return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header = NULL;
+   int header_len = 0;
+   int rc;
 
/* If no memory was reserved in early boot, there is nothing to do */
if (phyp_dump_info->init_reserve_size == 0)
return 0;
 
-   /* Release memory that was reserved in early boot */
-   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-   release_memory_range(start_pfn, nr_pages);
+   /* Return if phyp dump not supported */
+   if (!phyp_dump_info->phyp_dump_configured)
+   return -ENOSYS;
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   if (rtas) {
+   dump_header = of_get_property(rtas, "ibm,kernel-dump",
+   &header_len);
+   of_node_put(rtas);
+   }
+
+   if (dump_header == NULL)
+   return 0;
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = sysfs_create_file(kernel_kobj, &rr.attr);
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+   rc);
+   return 0;
+   }
 
return 0;
 }
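
The range check in store_release_region() can be exercised in user space with a sketch along these lines. clamp_release(), struct pfn_range and the SKETCH_* macros are hypothetical, and a 4KB page size is assumed purely for the page-frame arithmetic; the clamping order mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KB pages */
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)

struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Clamp a user-supplied (start_addr, length) request to the reserved
 * area and convert it to a page-frame range, in the same order as the
 * patch: raise the start first, then trim the length at the end.
 */
static struct pfn_range clamp_release(unsigned long reserve_start,
				      unsigned long reserve_size,
				      unsigned long start_addr,
				      unsigned long length)
{
	unsigned long end_addr = reserve_start + reserve_size;
	struct pfn_range r;

	/* Nothing was reserved, so there is nothing to clamp against. */
	assert(reserve_size != 0);

	if (start_addr < reserve_start)
		start_addr = reserve_start;
	if (start_addr + length > end_addr)
		length = end_addr - start_addr;

	r.start_pfn = SKETCH_PFN_DOWN(start_addr);
	r.nr_pages = SKETCH_PFN_DOWN(length);
	return r;
}
```

Like the patch, this does not shorten the length when the start is raised, and it does not handle a start address beyond the end of the reserved area; a caller would need to reject that case separately.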


[PATCH 4/8] pseries: phyp dump: register dump area.

2008-02-28 Thread Manish Ahuja

Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-28 
23:36:01.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
23:36:42.0 -0600
@@ -30,6 +30,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   unsigned long addr_offset = 0;
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+   (u32)offsetof(struct phyp_dump_header, cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+   1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+   "register\n", rc);
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -108,7 +219,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
struct device_node *rtas;
-   const int *dump_header = NULL;
+   const struct phyp_dump_header *dump_header = NULL;
+   unsigned long dump_area_start;
+   unsigned long dump_area_length;
int header_len = 0;
int rc;
 
@@ -120,7 +233,13 @@ static int __init phyp_dump_setup(void)
if (!phyp_dump_info->phyp_dump_configured

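The offset arithmetic in init_dump_header() above packs the three sections back to back, and register_dump_area() then adds the base address of the reserved area. A minimal userspace sketch of that arithmetic follows. The names section_layout, layout_sections and rebase_sections are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three destination offsets in the header. */
struct section_layout {
	uint64_t cpu_off;
	uint64_t hpte_off;
	uint64_t kernel_off;
};

/*
 * Pack the sections back to back, as init_dump_header() does:
 * each offset is the sum of the lengths that precede it.
 * Returns the total length of the save area.
 */
static uint64_t layout_sections(struct section_layout *l, uint64_t cpu_len,
				uint64_t hpte_len, uint64_t kernel_len)
{
	uint64_t off = 0;

	l->cpu_off = off;
	off += cpu_len;
	l->hpte_off = off;
	off += hpte_len;
	l->kernel_off = off;
	off += kernel_len;
	return off;
}

/* Turn the relative offsets into addresses once the base is known. */
static void rebase_sections(struct section_layout *l, uint64_t base)
{
	l->cpu_off += base;
	l->hpte_off += base;
	l->kernel_off += base;
}
```

Keeping the offsets relative until registration means the same header can be prepared before the final placement of the save area is decided.
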
[PATCH 5/8] pseries: phyp dump: debugging print routines.

2008-02-28 Thread Manish Ahuja

Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
-

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 -
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-28 
23:36:42.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
23:36:45.0 -0600
@@ -124,6 +124,61 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n",
+   ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",
+   ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",
+   ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",
+   ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",
+   ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",
+   ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",
+   ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",
+   ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",
+   ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",
+   ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -136,9 +191,11 @@ static void register_dump_area(struct ph
1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
 
-   if (rc)
+   if (rc) {
printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
"register\n", rc);
+   print_dump_header(ph);
+   }
 }
 
 /* - */
@@ -247,8 +304,8 @@ static int __init phyp_dump_setup(void)
of_node_put(rtas);
}
 
+   print_dump_header(dump_header);
dump_area_length = init_dump_header(&phdr);
-
/* align down */
dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 

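In the hunk above, print_dump_header() is handed dump_header, which is still NULL when the device tree carries no dump property, so a debug build would dereference a NULL pointer. A minimal userspace sketch of a print helper that tolerates an absent record is shown below. The names demo_section and format_section are hypothetical, and the use of PRIx64 for the 64-bit fields is a choice made for this sketch rather than what the patch does.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down section record with only the fields printed here. */
struct demo_section {
	uint64_t source_address;
	uint64_t source_length;
};

/*
 * Format one section into buf. Returns the number of characters written,
 * or 0 when sec is NULL, so callers may pass an absent header safely.
 */
static int format_section(char *buf, size_t len, const char *name,
			  const struct demo_section *sec)
{
	if (sec == NULL)
		return 0;

	/* PRIx64 matches uint64_t on both 32-bit and 64-bit builds. */
	return snprintf(buf, len, "%s source_address=%" PRIx64
			" source_length=%" PRIx64 "\n",
			name, sec->source_address, sec->source_length);
}
```

Returning early on NULL keeps the call sites unchanged, which is the same shape as an early-exit guard at the top of the debug routine.
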

[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.

2008-02-28 Thread Manish Ahuja

Routines to:
a. invalidate the dump
b. calculate the region that is reserved and needs to be freed. This is
   exported through the sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++---
 include/asm-powerpc/phyp_dump.h|3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-28 
23:36:45.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
23:36:47.0 -0600
@@ -71,6 +71,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -182,9 +186,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -198,6 +208,30 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+   2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+   "on invalidate\n", rc);
+   print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -208,8 +242,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -270,8 +304,29 @@ static ssize_t store_release_region(stru
return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+   struct kobj_attribute *attr, char *buf)
+{
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+   phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+   " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+   phdr.cpu_data.destination_address,
+   phdr.cpu_data.length_copied,
+   phdr.hpte_data.destination_address,
+   phdr.hpte_data.length_copied,
+   phdr.kernel_data.destination_address,
+   phdr.kernel_data.length_copied,
+   phyp_dump_info->init_reserve_start,
+   second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-NULL, store_release_region);
+   show_release_region,
+   store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -314,6 +369,22 @@ static int __init phyp_dump_setup(void)
return 0;
}
 
+   /* re-register the dump area, if old dump was invalid */
+   if ((dump_header)  (dump_header-status  DUMP_ERROR_FLAG)) {
+   invalidate_last_dump(phdr, dump_area_start);
+   register_dump_area(phdr, dump_area_start);
+   return 0;
+   }
+
+   if (dump_header) {
+   phyp_dump_info-reserved_scratch_addr

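Two small pieces of logic in the patch above lend themselves to a standalone sketch: the test of the status word against DUMP_ERROR_FLAG, and the guard that adds the base address to the destinations only once. The flag values are copied from the patch as they appear in this archive. The names demo_dest, dump_needs_reregister and rebase_once are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as they appear in the patch above. */
#define DUMP_ERROR_FLAG 0x2000
#define DUMP_TRIGGERED  0x4000
#define DUMP_PERFORMED  0x8000

/* Hypothetical holder for the three destination addresses. */
struct demo_dest {
	uint64_t cpu;
	uint64_t hpte;
	uint64_t kernel;
};

/* True when a dump header exists and is flagged as failed. */
static bool dump_needs_reregister(const uint16_t *status)
{
	/* A NULL status models "no dump header in the device tree". */
	return status != NULL && (*status & DUMP_ERROR_FLAG) != 0;
}

/*
 * Add base to all destinations only while the CPU destination still
 * holds its initial value of 0, so a second call does not add it again.
 */
static void rebase_once(struct demo_dest *d, uint64_t base)
{
	if (d->cpu == 0) {
		d->cpu += base;
		d->hpte += base;
		d->kernel += base;
	}
}
```

The guard keys off the CPU destination alone, so it relies on the base address being nonzero; with a base of 0 the addition would be repeated, though harmlessly.
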
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.

2008-02-28 Thread Manish Ahuja

This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is
to keep it simple at the external shell script level and 
send in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-28 
23:36:47.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-28 
23:36:49.0 -0600
@@ -262,6 +262,39 @@ void release_memory_range(unsigned long 
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+   return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+   (addr <= phyp_dump_info->init_reserve_start +
+phyp_dump_info->init_reserve_size))
+   reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+   (addr <= phyp_dump_info->reserved_scratch_addr +
+phyp_dump_info->reserved_scratch_size))
+   scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+   (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+   invalidate_last_dump(&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   register_dump_area(&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -286,6 +319,8 @@ static ssize_t store_release_region(stru
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)

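The accounting done by track_freed_range() above can be sketched in userspace as a running byte counter per region that is compared against the region size. This is a minimal sketch under that reading of the patch. The names region_tracker, tracker_account and tracker_done are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for one reserved region [start, start + size]. */
struct region_tracker {
	uint64_t start;
	uint64_t size;
	uint64_t freed;	/* running total of bytes reported as released */
};

/* Count length against the region if addr falls inside it. */
static void tracker_account(struct region_tracker *t, uint64_t addr,
			    uint64_t length)
{
	if (addr >= t->start && addr <= t->start + t->size)
		t->freed += length;
}

/* True once exactly the whole region has been reported as freed. */
static bool tracker_done(const struct region_tracker *t)
{
	return t->freed == t->size;
}
```

Because completion is an equality test on a sum, a repeated or overlapping release would push the counter past the size and completion would never be reported. That matches the commit message's description of the calculation as rudimentary, and is why large, non-overlapping chunks are expected from the shell script.
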

[PATCH 8/8] pseries: phyp dump: config file

2008-02-28 Thread Manish Ahuja



Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig2008-02-18 03:22:06.0 
-0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig 2008-02-18 03:22:45.0 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-02-17 Thread Manish Ahuja
The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-) . Yes, it's supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested, although one or
two are fixed in a later patch file rather than at the first
location they appeared.

Also, it no longer blocks any memory on machines other than power6
boxes with the requisite firmware. The following is from jal-lp6,
a power5 machine:
.
Phyp-dump not supported on this hardware
Using pSeries machine description
console [udbg-1] enabled
...

I think I have incorporated everyone's comments so far.


-- Manish & Linas.


[PATCH 1/8] pseries: phyp dump: Documentation

2008-02-17 Thread Manish Ahuja

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]


 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-18 
03:22:33.0 -0600
@@ -0,0 +1,127 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs):
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+ echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead

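The release step in the documentation above takes a start address and a length in hex. The example values follow from the sizes named in the text: 1GB is 0x40000000 and 256MB is 0x10000000. A minimal sketch of how a userspace tool might build that string is shown below. The function format_release_request and the MB and GB macros are hypothetical, and actually writing the string to /sys/kernel/release_region is left out.

```c
#include <stdint.h>
#include <stdio.h>

#define MB (1024ULL * 1024ULL)
#define GB (1024ULL * MB)

/*
 * Format the "<start> <length>" pair a tool would write to
 * /sys/kernel/release_region. Returns the snprintf() result.
 */
static int format_release_request(char *buf, size_t len,
				  uint64_t start, uint64_t length)
{
	return snprintf(buf, len, "0x%llx 0x%llx",
			(unsigned long long)start,
			(unsigned long long)length);
}
```

Calling format_release_request(buf, sizeof(buf), 1 * GB, 256 * MB) yields the string used in the echo example.
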
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-17 Thread Manish Ahuja


Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |   81 +++--
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-18 
03:23:47.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-18 
04:32:13.0 -0600
@@ -12,18 +12,23 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -53,18 +58,84 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
+
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+    * wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory passed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range(start_pfn, nr_pages);
+
+   return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header = NULL;
+   int header_len = 0;
+   int rc;
 
/* If no memory was reserved in early boot, there is nothing to do */
if (phyp_dump_info->init_reserve_size == 0)
return 0;
 
-   /* Release memory that was reserved in early boot */
-   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-   release_memory_range(start_pfn, nr_pages);
+   /* Return if phyp dump not supported */
+   if (!phyp_dump_info->phyp_dump_configured)
+   return -ENOSYS;
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   if (rtas) {
+   dump_header = of_get_property(rtas, "ibm,kernel-dump",
+   &header_len);
+   of_node_put(rtas);
+   }
+
+   if (dump_header == NULL)
+   return 0;
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = sysfs_create_file(kernel_kobj, &rr.attr);
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+   rc);
+   return 0;
+   }
 
return 0;
 }

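The parse-and-clamp logic in store_release_region() above can be exercised in userspace. The following is a minimal sketch that follows the same two checks. The names reserve_window and parse_release_request are hypothetical, and the window values stand in for the fields of phyp_dump_info.

```c
#include <stdio.h>

/* Hypothetical reserved window; in the patch these come from phyp_dump_info. */
struct reserve_window {
	unsigned long start;
	unsigned long size;
};

/*
 * Parse "<start> <length>" (hex) and clamp the range to the window,
 * following the range check in store_release_region().
 * Returns 0 on success, -1 if the input does not hold two hex values.
 */
static int parse_release_request(const char *buf,
				 const struct reserve_window *w,
				 unsigned long *start, unsigned long *length)
{
	unsigned long end = w->start + w->size;

	if (sscanf(buf, "%lx %lx", start, length) != 2)
		return -1;

	/* Never start below the reserved window. */
	if (*start < w->start)
		*start = w->start;
	/* Never run past the end of the reserved window. */
	if (*start + *length > end)
		*length = end - *start;
	return 0;
}
```

The sketch keeps the behaviour of the patch as written, including one case that may deserve a second look: a start address beyond the end of the window makes the subtraction wrap around instead of producing a zero length.
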

[PATCH 4/8] pseries: phyp dump: register dump area.

2008-02-17 Thread Manish Ahuja

Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-18 
03:26:56.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-18 
04:30:28.0 -0600
@@ -28,6 +28,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   unsigned long addr_offset = 0;
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+   (u32)offsetof(struct phyp_dump_header, cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+   1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+   "register\n", rc);
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -106,7 +217,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
struct device_node *rtas;
-   const int *dump_header = NULL;
+   const struct phyp_dump_header *dump_header = NULL;
+   unsigned long dump_area_start;
+   unsigned long dump_area_length;
int header_len = 0;
int rc;
 
@@ -118,7 +231,13 @@ static int __init phyp_dump_setup(void)
if (!phyp_dump_info->phyp_dump_configured

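The do/while loop around rtas_call() in register_dump_area() above repeats the firmware call for as long as it reports busy. A minimal userspace sketch of that retry shape follows. Everything in it is hypothetical: DEMO_BUSY is an assumed status value, demo_firmware_call is a stub standing in for the firmware, and the delay that the kernel helper performs between attempts is left out.

```c
/* Assumption: -2 is used here as the "busy, try again" status. */
#define DEMO_BUSY (-2)

/* Hypothetical firmware stub: reports busy until *busy_left reaches 0. */
static int demo_firmware_call(int *busy_left)
{
	if (*busy_left > 0) {
		(*busy_left)--;
		return DEMO_BUSY;
	}
	return 0;
}

/*
 * Repeat the call while it reports busy and hand back the final status,
 * mirroring the do/while loop around rtas_call() in the patch.
 * *attempts receives the number of calls made.
 */
static int call_until_not_busy(int *busy_left, int *attempts)
{
	int rc;

	*attempts = 0;
	do {
		rc = demo_firmware_call(busy_left);
		(*attempts)++;
	} while (rc == DEMO_BUSY);

	return rc;
}
```

Only the final, non-busy status reaches the error check after the loop, which is why the patch prints an error for any nonzero rc at that point.
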
[PATCH 5/8] pseries: phyp dump: debugging print routines.

2008-02-17 Thread Manish Ahuja

Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
-

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 -
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-18 
03:30:53.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-18 
04:25:19.0 -0600
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n",
+   ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",
+   ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",
+   ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",
+   ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",
+   ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",
+   ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",
+   ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",
+   ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",
+   ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",
+   ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -134,9 +189,11 @@ static void register_dump_area(struct ph
1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
 
-   if (rc)
+   if (rc) {
        printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+       print_dump_header(ph);
+   }
 }
 
 /* - */
@@ -245,8 +302,8 @@ static int __init phyp_dump_setup(void)
of_node_put(rtas);
}
 
+   print_dump_header(dump_header);
    dump_area_length = init_dump_header(&phdr);
-
    /* align down */
    dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.

2008-02-17 Thread Manish Ahuja

Routines to 
a. invalidate dump 
b. Calculate region that is reserved and needs to be freed. This is 
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.
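
As an illustration of how a userspace tool might consume the exported ranges, here is a minimal sketch in C. It assumes the single-line layout produced by the sprintf() format in show_release_region() in the diff below; the struct and function names are hypothetical and are not part of this patch.

```c
#include <stdio.h>

/* Hypothetical container for the four address/length pairs. */
struct release_ranges {
    unsigned long cpu_addr, cpu_len;
    unsigned long hpte_addr, hpte_len;
    unsigned long dump_addr, dump_len;
    unsigned long rsv_addr, rsv_len;
};

/*
 * Parse one line in the layout printed by show_release_region().
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_release_region(const char *line, struct release_ranges *r)
{
    int n = sscanf(line,
                   "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
                   "DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:",
                   &r->cpu_addr, &r->cpu_len,
                   &r->hpte_addr, &r->hpte_len,
                   &r->dump_addr, &r->dump_len,
                   &r->rsv_addr, &r->rsv_len);

    /* All eight fields must convert for the line to be usable. */
    return n == 8 ? 0 : -1;
}
```

A shell script could do the same with sed or awk; the point is only that each field is an address followed by a length, not an end address.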

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++---
 include/asm-powerpc/phyp_dump.h|3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-18 
04:25:19.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-18 
04:25:32.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+       ph->cpu_data.destination_address += addr;
+       ph->hpte_data.destination_address += addr;
+       ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -196,6 +206,30 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+       ph->cpu_data.destination_address += addr;
+       ph->hpte_data.destination_address += addr;
+       ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+       rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+               2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+       printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+               "on invalidate\n", rc);
+       print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -206,8 +240,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -268,8 +302,29 @@ static ssize_t store_release_region(stru
return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+           struct kobj_attribute *attr, char *buf)
+{
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+               phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+           " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+       phdr.cpu_data.destination_address,
+       phdr.cpu_data.length_copied,
+       phdr.hpte_data.destination_address,
+       phdr.hpte_data.length_copied,
+       phdr.kernel_data.destination_address,
+       phdr.kernel_data.length_copied,
+       phyp_dump_info->init_reserve_start,
+       second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-NULL, store_release_region);
+   show_release_region,
+   store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -312,6 +367,22 @@ static int __init phyp_dump_setup(void)
return 0;
}
 
+   /* re-register the dump area, if old dump was invalid */
+   if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+       invalidate_last_dump(&phdr, dump_area_start);
+       register_dump_area(&phdr, dump_area_start);
+       return 0;
+   }
+
+   if (dump_header) {
+       phyp_dump_info->reserved_scratch_addr

[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.

2008-02-17 Thread Manish Ahuja

This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is
to keep it simple at the external shell script level and 
send in large chunks for now.
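
For readers who want to see the bookkeeping in isolation, the following is a userspace sketch in C of the counting described above. It mirrors the comparisons in track_freed_range() from the diff below, including the inclusive upper bound; the struct names and the boolean return value are assumptions made for illustration, and the real routine calls invalidate_last_dump() and register_dump_area() rather than returning a flag.

```c
#include <stdbool.h>

/* Hypothetical model of an address range: start plus size in bytes. */
struct region {
    unsigned long start;
    unsigned long size;
};

/* Hypothetical state: the two tracked areas and how much of each was freed. */
struct freed_tracker {
    struct region reserved;
    struct region scratch;
    unsigned long reserved_freed;
    unsigned long scratch_freed;
};

/* Same test as the patch: start <= addr <= start + size. */
static bool in_region(const struct region *r, unsigned long addr)
{
    return addr >= r->start && addr <= r->start + r->size;
}

/*
 * Account for one freed chunk. Returns true once both areas have been
 * released in full, which is the point where the patch re-registers.
 */
static bool track_freed(struct freed_tracker *t,
                        unsigned long addr, unsigned long length)
{
    if (addr < t->reserved.start)
        return false;

    if (in_region(&t->reserved, addr))
        t->reserved_freed += length;

    if (in_region(&t->scratch, addr))
        t->scratch_freed += length;

    return t->reserved_freed == t->reserved.size &&
           t->scratch_freed == t->scratch.size;
}
```

Because only the start address of each chunk is checked and the totals are compared for equality, this scheme relies on the script releasing non-overlapping chunks, which is why the description asks for large, simple chunks.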

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-18 
03:31:22.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-18 
03:31:30.0 -0600
@@ -260,6 +260,39 @@ void release_memory_range(unsigned long 
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+       return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+       (addr <= phyp_dump_info->init_reserve_start +
+        phyp_dump_info->init_reserve_size))
+       reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+       (addr <= phyp_dump_info->reserved_scratch_addr +
+        phyp_dump_info->reserved_scratch_size))
+       scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+       (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+       invalidate_last_dump(&phdr,
+               phyp_dump_info->reserved_scratch_addr);
+       register_dump_area(&phdr,
+               phyp_dump_info->reserved_scratch_addr);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -284,6 +317,8 @@ static ssize_t store_release_region(stru
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
    if (start_addr < phyp_dump_info->init_reserve_start)


[PATCH 8/8] pseries: phyp dump: config file

2008-02-17 Thread Manish Ahuja


Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig2008-02-18 03:22:06.0 
-0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig 2008-02-18 03:22:45.0 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
    bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-02-14 Thread Manish Ahuja

Olof,

I will run it through checkpatch before resubmitting.

Thanks,
Manish



Olof Johansson wrote:
> On Thu, Feb 14, 2008 at 02:46:21PM +1100, Tony Breeds wrote:
>
> > Hi Manish,
> >  Sorry for the minor nits but this should be:
> >
> > ---
> >  * Linas Vepstas, Manish Ahuja 2008
> >  * Copyright 2008 IBM Corp.
> > ---
> >
> > You can optionally use the '©' symbol after word 'Copyright' but you
> > shouldn't use '(c)' anymore.
> >
> > Also in at least one place you've misspelt Copyright
>
> If we're going to nitpick, then I'd like to point out that the whole
> series needs to be run through checkpatch and at least the whitespace
> issues should be taken care of.
>
> I'm still not convinced that this is a useful feature compared to
> hardening kdump, especially now that ehea can handle kexec/kdump (patch
> posted the other day). But in the end it's up to Paul if he wants to
> take it or not, not me.
>
>
> -Olof



Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-14 Thread Manish Ahuja
Tony Breeds wrote:
> On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote:
>
> <snip>
>
> > +static ssize_t
> > +show_release_region(struct kset * kset, char *buf)
> > +{
> > +   return sprintf(buf, "ola\n");
> > +}
> > +
> > +static struct subsys_attribute rr = __ATTR(release_region, 0600,
> > +                         show_release_region,
> > +                         store_release_region);
>
> Any reason this sysfs attribute can't be write only? The show method
> doesn't seem needed.

Yes, it's used later in the code.

 
> > +static int __init phyp_dump_setup(void)
> > +{
>
> <snip>
>
> > +   /* Is there dump data waiting for us? */
> > +   rtas = of_find_node_by_path("/rtas");
> > +   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
>
> Hmm this isn't good.  You need to check rtas != NULL.


yes, will fix this as well.
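
A sketch of the shape such a fix could take, written as a userspace model in C so that it can be compiled on its own. The struct node type and the find_node() and node_put() helpers are hypothetical stand-ins for the kernel's device-tree node, of_find_node_by_path() and of_node_put(); this is not the code that was eventually merged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node with a reference count. */
struct node {
    const char *path;
    int refcount;
    const void *dump_prop;  /* models the "ibm,kernel-dump" property */
};

/* Models of_find_node_by_path(): takes a reference on success. */
static struct node *find_node(struct node *tree, size_t n, const char *path)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(tree[i].path, path) == 0) {
            tree[i].refcount++;
            return &tree[i];
        }
    }
    return NULL;
}

/* Models of_node_put(): drops the reference taken by find_node(). */
static void node_put(struct node *np)
{
    if (np)
        np->refcount--;
}

/*
 * Look up the dump header. A missing /rtas node is treated the same as
 * "no dump waiting", and the node reference is always dropped.
 */
static const void *lookup_dump_header(struct node *tree, size_t n)
{
    struct node *rtas = find_node(tree, n, "/rtas");
    const void *hdr;

    if (rtas == NULL)
        return NULL;

    hdr = rtas->dump_prop;
    node_put(rtas);
    return hdr;
}
```

The two points being illustrated are the early return when the node is absent and the put that balances the find.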



 
> > +   if (dump_header == NULL) {
> > +       release_all();
> > +       return 0;
> > +   }
> > +
> > +   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
> > +   rc = subsys_create_file(&kernel_subsys, &rr);
> > +   if (rc) {
> > +       printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
> > +       release_all();
> > +       return 0;
> > +   }
> >
> >     return 0;
> >  }
> > -
> >  subsys_initcall(phyp_dump_setup);
>
> Hmm I think this really should be a:
>   machine_subsys_initcall(pseries, phyp_dump_setup)
>
> Yours Tony
>
>   linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
>   Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!
 



[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.

2008-02-13 Thread Manish Ahuja

Changed asm to asm-powerpc.
Hopefully this was the last of them.

-Manish


Routines to 
a. invalidate dump 
b. Calculate region that is reserved and needs to be freed. This is 
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   85 +
 include/asm-powerpc/phyp_dump.h|3 +
 2 files changed, 77 insertions(+), 11 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-13 
21:21:00.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-13 
21:21:48.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+       ph->cpu_data.destination_address += addr;
+       ph->hpte_data.destination_address += addr;
+       ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -195,6 +205,30 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+       ph->cpu_data.destination_address += addr;
+       ph->hpte_data.destination_address += addr;
+       ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+       rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+                  2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+       printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+               "on invalidate\n", rc);
+       print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -205,8 +239,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -237,8 +271,8 @@ release_memory_range(unsigned long start
  *
  * will release 256MB starting at 1GB.
  */
-static ssize_t
-store_release_region(struct kset *kset, const char *buf, size_t count)
+static
+ssize_t store_release_region(struct kset *kset, const char *buf, size_t count)
 {
unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
@@ -266,10 +300,23 @@ store_release_region(struct kset *kset, 
return count;
 }
 
-static ssize_t
-show_release_region(struct kset * kset, char *buf)
+static ssize_t show_release_region(struct kset * kset, char *buf)
 {
-   return sprintf(buf, "ola\n");
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+               phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+           " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+       phdr.cpu_data.destination_address,
+       phdr.cpu_data.length_copied,
+       phdr.hpte_data.destination_address,
+       phdr.hpte_data.length_copied,
+       phdr.kernel_data.destination_address,
+       phdr.kernel_data.length_copied,
+       phyp_dump_info->init_reserve_start,
+       second_addr_range);
 }
 
 static struct subsys_attribute rr = __ATTR(release_region, 0600,
@@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void)
    if (!phyp_dump_info->phyp_dump_configured) {
return -ENOSYS

[PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-02-12 Thread Manish Ahuja

Michael,

Fixed.

-Manish


--
Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/kernel/prom.c |   50 
 arch/powerpc/kernel/rtas.c |   32 +
 arch/powerpc/platforms/pseries/Makefile|1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +
 include/asm-powerpc/phyp_dump.h|   38 +++
 include/asm-powerpc/rtas.h |3 +
 6 files changed, 195 insertions(+)

Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h  2008-02-12 16:12:45.0 
-0600
@@ -0,0 +1,38 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+   /* Memory that is reserved during very early boot. */
+   unsigned long init_reserve_start;
+   unsigned long init_reserve_size;
+   /* Check status during boot if dump supported, active & present*/
+   unsigned long phyp_dump_configured;
+   unsigned long phyp_dump_is_active;
+   /* store cpu & hpte size */
+   unsigned long cpu_state_size;
+   unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
16:12:45.0 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyrhgit (c) 2007 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for genreal use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+   struct page *rpage;
+   unsigned long end_pfn;
+   long i;
+
+   end_pfn = start_pfn + nr_pages;
+
+   for (i=start_pfn; i <= end_pfn; i++) {
+   rpage = pfn_to_page(i);
+   if (PageReserved(rpage)) {
+   ClearPageReserved(rpage);
+   init_page_count(rpage);
+   __free_page(rpage);
+   totalram_pages++;
+   }
+   }
+}
+
+static int __init phyp_dump_setup(void)
+{
+   unsigned long start_pfn, nr_pages;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+       return 0;
+
+   /* Release memory that was reserved in early boot */
+   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+   release_memory_range(start_pfn, nr_pages);
+
+   return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/Makefile 2008-02-12 
16:11:44.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile  2008-02-12 
16:12:45.0 -0600
@@ -18,3 +18,4

Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-12 Thread Manish Ahuja
As noted, it's fixed in patch 4.

If it's okay this time, I would prefer to leave it there.

-Manish


Stephen Rothwell wrote:
> Hi Manish,
>
> Just a small comment.
>
> On Tue, 12 Feb 2008 01:11:58 -0600 Manish Ahuja [EMAIL PROTECTED] wrote:
> > +   /* Is there dump data waiting for us? */
> > +   rtas = of_find_node_by_path("/rtas");
> > +   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
>
> You need an of_node_put(rtas) here.
>
> > +   if (dump_header == NULL) {
> > +       release_all();
> > +       return 0;
> > +   }
 




Re: [PATCH 4/8] pseries: phyp dump: register dump area.

2008-02-12 Thread Manish Ahuja
For now, it would be great if we could leave this patch as is. That move
would require me to rework all the remaining patches, as they no longer
apply cleanly after it.

I will group those two changes together from next time onwards.

Thanks,
Manish


Stephen Rothwell wrote:
> Hi Manish,
>
> > -   /* Is there dump data waiting for us? */
> > +   /* Is there dump data waiting for us? If there isn't,
> > +    * then register a new dump area, and release all of
> > +    * the rest of the reserved ram.
> > +    *
> > +    * The /rtas/ibm,kernel-dump rtas node is present only
> > +    * if there is dump data waiting for us.
> > +    */
> >     rtas = of_find_node_by_path("/rtas");
> >     dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
> > +   of_node_put(rtas);
>
> Oh, here is the of_node_put() - you should move that to patch 3.
 



Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-02-11 Thread Manish Ahuja
Sorry,

I think I sent the wrong patch file; it shouldn't have my printk statement
in it. Let me re-send the correct file, and let me test it once more to
make sure it does the right thing.

-Manish



Paul Mackerras wrote:
> Manish Ahuja writes:
>
> > Initial patch for reserving memory in early boot, and freeing it later.
> > If the previous boot had ended with a crash, the reserved memory would
> > contain a copy of the crashed kernel data.
>
> [snip]
>
> > +static void __init reserve_crashed_mem(void)
> > +{
> > +   unsigned long base, size;
> > +
> > +   if (phyp_dump_info->phyp_dump_is_active) {
> > +       /* Reserve *everything* above RMR. We'll free this real soon.*/
> > +       base = PHYP_DUMP_RMR_END;
> > +       size = lmb_end_of_DRAM() - base;
> > +
> > +       /* XXX crashed_ram_end is wrong, since it may be beyond
> > +        * the memory_limit, it will need to be adjusted. */
> > +       lmb_reserve(base, size);
> > +
> > +       phyp_dump_info->init_reserve_start = base;
> > +       phyp_dump_info->init_reserve_size = size;
> > +   }
> > +   else {
> > +       size = phyp_dump_info->cpu_state_size +
> > +           phyp_dump_info->hpte_region_size +
> > +           PHYP_DUMP_RMR_END;
> > +       base = lmb_end_of_DRAM() - size;
> > +       printk(KERN_ERR "Manish reserve regular kernel space is %ld %ld\n", base, size);
> > +       lmb_reserve(base, size);
>
> This is still reserving memory even on systems that aren't running on
> pHyp at all.  Please rework this so that no memory is reserved if the
> system doesn't support phyp-assisted dump.
>
> Paul.
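
One way to picture the rework being asked for is the following userspace sketch in C. It takes the two branches of the quoted code and adds a third case up front that reserves nothing when the firmware does not support the feature. The struct dump_state and struct reservation types and the plan_reservation() name are hypothetical; the "configured" field is modelled on the phyp_dump_configured flag that appears in the later revision of the header.

```c
/* Hypothetical mirror of the fields the quoted code relies on. */
struct dump_state {
    int configured;                  /* firmware supports phyp-dump */
    int is_active;                   /* dump data from a crash is waiting */
    unsigned long cpu_state_size;
    unsigned long hpte_region_size;
};

/* Hypothetical result: what to hand to the reserve call, if anything. */
struct reservation {
    unsigned long base;
    unsigned long size;              /* 0 means "reserve nothing" */
};

static struct reservation
plan_reservation(const struct dump_state *s,
                 unsigned long dram_end, unsigned long rmr_end)
{
    struct reservation r = { 0, 0 };

    /* Not running on supporting firmware: leave all memory alone. */
    if (!s->configured)
        return r;

    if (s->is_active) {
        /* A dump is waiting: hold everything above the RMR. */
        r.base = rmr_end;
        r.size = dram_end - rmr_end;
    } else {
        /* No dump: keep a scratch area at the top of RAM. */
        r.size = s->cpu_state_size + s->hpte_region_size + rmr_end;
        r.base = dram_end - r.size;
    }
    return r;
}
```

Separating the decision from the lmb_reserve() call in this way is only an illustration; the same effect can be had with an early return at the top of reserve_crashed_mem().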



[PATCH 1/8] pseries: phyp dump: Documentation

2008-02-11 Thread Manish Ahuja
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]


 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-12 
06:38:25.0 -0600
@@ -0,0 +1,127 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to
+a new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead

[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-02-11 Thread Manish Ahuja

The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides 
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested, although one or two
are fixed in a later patch rather than in the patch where the code
first appears.

Also it now does not block any memory on machines other than power6 boxes
which have the requisite firmware. This is from a power5 box.

from jal-lp6 a power5 machine.
.
Phyp-dump not supported on this hardware
Using pSeries machine description
console [udbg-1] enabled
...

Since I changed a few more things, I am reposting all the patches.

-- Manish & Linas.


[PATCH 5/8] pseries: phyp dump: debugging print routines.

2008-02-11 Thread Manish Ahuja

Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
-

 arch/powerpc/platforms/pseries/phyp_dump.c |   64 +++--
 1 file changed, 60 insertions(+), 4 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-12 
06:13:01.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
06:13:06.0 -0600
@@ -2,7 +2,7 @@
  * Hypervisor-assisted dump
  *
  * Linas Vepstas, Manish Ahuja 2007
- * Copyrhgit (c) 2007 IBM Corp.
+ * Copyright (c) 2007 IBM Corp.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n",
+           ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",
+           ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",
+           ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",
+           ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",
+           ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",
+           ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",
+           ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",
+           ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",
+           ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",
+           ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -134,9 +189,9 @@ static void register_dump_area(struct ph
   1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
 
-   if (rc)
-   {
-   printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   print_dump_header(ph);
}
 }
 
@@ -238,6 +293,7 @@ static int __init phyp_dump_setup(void)
if (!phyp_dump_info->phyp_dump_configured) {
return -ENOSYS;
}
+   print_dump_header(dump_header);
 
/* Is there dump data waiting for us? If there isn't,
 * then register a new dump area, and release all of


[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.

2008-02-11 Thread Manish Ahuja

Routines to:
a. Invalidate the dump.
b. Calculate the region that is reserved and needs to be freed. This is
   exported through the sysfs interface.

Unregister has been removed for now as it wasn't being used.
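
To illustrate item b, here is a minimal userspace sketch of how the sysfs text could be assembled. It is not the kernel code: `struct area`, `format_regions` and its parameters are hypothetical names, and `snprintf` stands in for the kernel's `sprintf` into the sysfs buffer. As in the patch, the second value of the last pair is the total reserved size minus the scratch size.

```c
#include <stdio.h>

/* Hypothetical stand-in for one dump section: where the hypervisor
 * wrote the data and how many bytes it copied. */
struct area {
	unsigned long addr;
	unsigned long len;
};

/* Build the one-line region summary. Returns what snprintf returns. */
static int format_regions(char *buf, size_t n,
			  const struct area *cpu,
			  const struct area *hpte,
			  const struct area *dump,
			  unsigned long reserve_start,
			  unsigned long reserve_size,
			  unsigned long scratch_size)
{
	/* total reserved size minus the scratch area, as in the patch */
	unsigned long second_range = reserve_size - scratch_size;

	return snprintf(buf, n,
			"CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
			"DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
			cpu->addr, cpu->len,
			hpte->addr, hpte->len,
			dump->addr, dump->len,
			reserve_start, second_range);
}
```

A user-space script would read this line from the release_region file and feed the ranges back in as it finishes copying each one.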

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   85 +
 include/asm/phyp_dump.h|6 +-
 2 files changed, 79 insertions(+), 12 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-12 
06:13:06.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
06:13:17.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -195,6 +205,30 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+  2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+   printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+   "on invalidate\n", rc);
+   print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -205,8 +239,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -237,8 +271,8 @@ release_memory_range(unsigned long start
  *
  * will release 256MB starting at 1GB.
  */
-static ssize_t
-store_release_region(struct kset *kset, const char *buf, size_t count)
+static
+ssize_t store_release_region(struct kset *kset, const char *buf, size_t count)
 {
unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
@@ -266,10 +300,23 @@ store_release_region(struct kset *kset, 
return count;
 }
 
-static ssize_t
-show_release_region(struct kset * kset, char *buf)
+static ssize_t show_release_region(struct kset * kset, char *buf)
 {
-   return sprintf(buf, "ola\n");
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+   phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+" DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+   phdr.cpu_data.destination_address,
+   phdr.cpu_data.length_copied,
+   phdr.hpte_data.destination_address,
+   phdr.hpte_data.length_copied,
+   phdr.kernel_data.destination_address,
+   phdr.kernel_data.length_copied,
+   phyp_dump_info->init_reserve_start,
+   second_addr_range);
 }
 
 static struct subsys_attribute rr = __ATTR(release_region, 0600,
@@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void)
if (!phyp_dump_info->phyp_dump_configured) {
return -ENOSYS;
}
-   print_dump_header(dump_header);
 
/* Is there dump data waiting for us? If there isn't,
 * then register a new dump

[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.

2008-02-11 Thread Manish Ahuja
This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is
to keep it simple at the external shell script level and 
send in large chunks for now.
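
The accounting described above can be sketched in plain C as follows. This is a simplified userspace model under stated assumptions, not the kernel routine: `struct window`, `struct tracker`, `in_window` and `track_freed` are hypothetical names, the scratch area is assumed to lie inside the reserved region, and the caller is assumed to free whole, non-overlapping chunks so that the counters can reach the exact sizes.

```c
#include <stdbool.h>

/* Hypothetical description of an address window. */
struct window {
	unsigned long start;
	unsigned long size;
};

struct tracker {
	struct window reserved;		/* whole region reserved at boot */
	struct window scratch;		/* scratch area inside that region */
	unsigned long reserved_freed;	/* bytes released from the region */
	unsigned long scratch_freed;	/* bytes released from the scratch area */
};

/* Inclusive upper bound, mirroring the comparison used in the patch. */
static bool in_window(const struct window *w, unsigned long addr)
{
	return addr >= w->start && addr <= w->start + w->size;
}

/* Account for one released chunk. Returns true once everything has been
 * released, which is the point where the dump would be invalidated and
 * registered again. */
static bool track_freed(struct tracker *t, unsigned long addr,
			unsigned long length)
{
	if (addr < t->reserved.start)
		return false;

	if (in_window(&t->reserved, addr))
		t->reserved_freed += length;
	if (in_window(&t->scratch, addr))
		t->scratch_freed += length;

	return t->reserved_freed == t->reserved.size &&
	       t->scratch_freed == t->scratch.size;
}
```

Because only the start address of each chunk is checked, the counters are only meaningful when the script sends large, well-aligned chunks, which is the simplification this patch description calls out.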

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-12 
06:13:17.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
06:13:21.0 -0600
@@ -259,6 +259,39 @@ void release_memory_range(unsigned long 
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+   return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+   (addr <= phyp_dump_info->init_reserve_start +
+phyp_dump_info->init_reserve_size))
+   reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+   (addr <= phyp_dump_info->reserved_scratch_addr +
+phyp_dump_info->reserved_scratch_size))
+   scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+   (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+   invalidate_last_dump(&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   register_dump_area (&phdr,
+   phyp_dump_info->reserved_scratch_addr);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -282,6 +315,8 @@ ssize_t store_release_region(struct kset
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)


[PATCH 8/8] pseries: phyp dump: config file

2008-02-11 Thread Manish Ahuja

Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/Kconfig
===
--- 2.6.24-rc5.orig/arch/powerpc/Kconfig2008-02-12 06:12:08.0 
-0600
+++ 2.6.24-rc5/arch/powerpc/Kconfig 2008-02-12 06:13:24.0 -0600
@@ -266,6 +266,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-02-11 Thread Manish Ahuja

Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.
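
As a rough illustration of what happens to the two numbers written to release_region, the following userspace sketch parses the string and clamps the range to the reserved window before converting it to page frames. It is a simplified model, not the kernel handler: `struct reserve`, `parse_and_clamp` and `SKETCH_PAGE_SHIFT` are hypothetical names, a 4 KiB page size is assumed, and the check for a start address beyond the window is an extra guard that the patch itself does not have.

```c
#include <stdio.h>

/* Hypothetical stand-in for the reserved window recorded at early boot. */
struct reserve {
	unsigned long start;
	unsigned long size;
};

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Parse "start length" (both hex) and clamp the range to the reserved
 * window. Returns 0 on success, -1 on malformed or out-of-window input. */
static int parse_and_clamp(const char *buf, const struct reserve *r,
			   unsigned long *start_pfn, unsigned long *nr_pages)
{
	unsigned long start, length, end;

	if (sscanf(buf, "%lx %lx", &start, &length) != 2)
		return -1;

	/* Never release memory below the reserved window. */
	if (start < r->start)
		start = r->start;

	end = r->start + r->size;
	if (start >= end)	/* extra guard, not in the patch */
		return -1;

	/* Never release memory past the end of the window. */
	if (start + length > end)
		length = end - start;

	*start_pfn = start >> SKETCH_PAGE_SHIFT;
	*nr_pages = length >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

With a window of 512MB starting at 1GB, the example string from the commit message yields page frame 0x40000 and 0x10000 pages under the 4 KiB assumption.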

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |   88 +++--
 1 file changed, 82 insertions(+), 6 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-12 
06:12:37.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
06:12:55.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,20 +59,89 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
+
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+* wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory assed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range (start_pfn, nr_pages);
+
+   return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+   return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+show_release_region,
+store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header;
+   int header_len = 0;
+   int rc;
 
/* If no memory was reserved in early boot, there is nothing to do */
if (phyp_dump_info->init_reserve_size == 0)
return 0;
 
-   /* Release memory that was reserved in early boot */
-   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-   release_memory_range(start_pfn, nr_pages);
+   /* Return if phyp dump not supported */
+   if (!phyp_dump_info->phyp_dump_configured) {
+   return -ENOSYS;
+   }
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+   if (dump_header == NULL) {
+   release_all();
+   return 0;
+   }
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = subsys_create_file(&kernel_subsys, &rr);
+   if (rc) {
+   printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+   release_all();
+   return 0;
+   }
 
return 0;
 }
-
 subsys_initcall(phyp_dump_setup);


[PATCH 4/8] pseries: phyp dump: register dump area.

2008-02-11 Thread Manish Ahuja

Set up the actual dump header, register it with the hypervisor.
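
The layout step can be sketched in isolation as below: each section's destination is first recorded as an offset from zero, and the offsets are later rebased onto the real start of the dump area when it is registered. This is a minimal model under stated assumptions. `struct sketch_section`, `struct sketch_header`, `layout_sections` and `rebase_sections` are hypothetical names, and the real header carries more fields than the two shown here.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down versions of the section and header records. */
struct sketch_section {
	uint64_t source_length;
	uint64_t destination_address;
};

struct sketch_header {
	struct sketch_section cpu;
	struct sketch_section hpte;
	struct sketch_section kernel;
};

/* Place the three sections back to back, starting at offset 0.
 * Returns the total length of the save area. */
static uint64_t layout_sections(struct sketch_header *h, uint64_t cpu_size,
				uint64_t hpte_size, uint64_t kernel_size)
{
	uint64_t off = 0;

	h->cpu.source_length = cpu_size;
	h->cpu.destination_address = off;
	off += cpu_size;

	h->hpte.source_length = hpte_size;
	h->hpte.destination_address = off;
	off += hpte_size;

	h->kernel.source_length = kernel_size;
	h->kernel.destination_address = off;
	off += kernel_size;

	return off;
}

/* Turn the relative offsets into absolute addresses at registration time. */
static void rebase_sections(struct sketch_header *h, uint64_t base)
{
	h->cpu.destination_address += base;
	h->hpte.destination_address += base;
	h->kernel.destination_address += base;
}
```

Keeping the two steps separate means the header can be sized before the final location of the dump area is known.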

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  136 +++--
 1 file changed, 129 insertions(+), 7 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-02-12 
06:12:55.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-02-12 
06:13:01.0 -0600
@@ -30,6 +30,117 @@ struct phyp_dump *phyp_dump_info = phyp
 static int ibm_configure_kernel_dump;
 
 /* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   unsigned long addr_offset = 0;
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+   (u32)offsetof(struct phyp_dump_header, cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += phyp_dump_info->hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+  1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   {
+   printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   }
+}
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -113,7 +224,9 @@ static struct subsys_attribute rr = __AT
 static int __init phyp_dump_setup(void)
 {
struct device_node *rtas;
-   const int *dump_header;
+   const struct phyp_dump_header *dump_header;
+   unsigned long dump_area_start;
+   unsigned long dump_area_length;
int header_len = 0;
int rc;
 
@@ -126,22 +239,31 @@ static int __init phyp_dump_setup(void)
return -ENOSYS;
}
 
-   /* Is there dump data waiting for us

[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-01-22 Thread Manish Ahuja
The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is   :-)  . Yes, it's supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested
on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested, although I did find
one or two that are fixed in a later patch file rather than at the
first location where they appeared.

-- Manish & Linas.


[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem

2008-01-22 Thread Manish Ahuja

Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  102 +++--
 1 file changed, 96 insertions(+), 6 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-18 
07:37:33.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-18 
22:43:00.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,20 +59,103 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
 
-   /* If no memory was reserved in early boot, there is nothing to do */
-   if (phyp_dump_info->init_reserve_size == 0)
-   return 0;
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+   return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+* wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+   start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+   length = end_addr - start_addr;
+
+   /* Release the region of memory assed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range (start_pfn, nr_pages);
 
-   /* Release memory that was reserved in early boot */
+   return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+   return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+show_release_region,
+store_release_region);
+
+/* - */
+
+static void release_all (void)
+{
+   unsigned long start_pfn, nr_pages;
+
+   /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header;
+   int header_len = 0;
+   int rc;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+   return 0;
+
+   /* Return if phyp dump not supported */
+   ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+   if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+   release_all();
+   return -ENOSYS;
+   }
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+   if (dump_header == NULL) {
+   release_all();
+   return 0;
+   }
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = subsys_create_file(&kernel_subsys, &rr);
+   if (rc) {
+   printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+   

Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept

2008-01-22 Thread Manish Ahuja
Reposted this one. I got the email id wrong in this one.

Sorry about that. 

Manish



[PATCH 5/8] pseries: phyp dump: debugging print routines.

2008-01-22 Thread Manish Ahuja

Provide some basic debugging support.
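
The general shape of such a debug helper is sketched below as a userspace analogue: the body only exists when DEBUG is defined at compile time, so production builds pay nothing for it. `struct sketch_header` and `sketch_print_header` are hypothetical names, `printf` stands in for `printk`, the line-count return value exists only so the effect can be observed, and the NULL check is an extra guard that is not part of this patch.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down header with only the first three fields. */
struct sketch_header {
	unsigned int version;
	unsigned short num_of_sections;
	unsigned short status;
};

/* Print the header when built with -DDEBUG; otherwise do nothing.
 * Returns the number of lines printed. */
static int sketch_print_header(const struct sketch_header *ph)
{
#ifdef DEBUG
	if (ph == NULL)		/* extra guard, not part of this patch */
		return 0;

	printf("dump header:\n");
	printf("version = %u\n", ph->version);
	printf("Sections = %u\n", (unsigned int)ph->num_of_sections);
	printf("Status = 0x%x\n", (unsigned int)ph->status);
	return 4;
#else
	(void)ph;		/* compiled out: no output, no dereference */
	return 0;
#endif
}
```

Callers can then invoke the helper unconditionally on error paths without wrapping each call site in its own #ifdef.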

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
-

 arch/powerpc/platforms/pseries/phyp_dump.c |   64 +++--
 1 file changed, 60 insertions(+), 4 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-21 
02:51:54.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-21 
02:58:41.0 -0600
@@ -2,7 +2,7 @@
  * Hypervisor-assisted dump
  *
  * Linas Vepstas, Manish Ahuja 2007
- * Copyrhgit (c) 2007 IBM Corp.
+ * Copyright (c) 2007 IBM Corp.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n",
+   ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",
+   ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",
+   ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",
+   ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",
+   ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",
+   ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",
+   ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",
+   ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",
+   ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",
+   ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -134,9 +189,9 @@ static void register_dump_area(struct ph
   1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
 
-   if (rc)
-   {
-   printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   if (rc) {
+   printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   print_dump_header(ph);
}
 }
 
@@ -249,6 +304,7 @@ static int __init phyp_dump_setup(void)
release_all();
return -ENOSYS;
}
+   print_dump_header(dump_header);
 
/* Is there dump data waiting for us? If there isn't,
 * then register a new dump area, and release all of


[PATCH 6/8] pseries: phyp dump: Unregister and print dump areas.

2008-01-22 Thread Manish Ahuja

Routines to invalidate and unregister the dump.
Unregister is not used yet; I will release another
patch for that at a later stage with the kdump integration patches.

There is also a routine which calculates the regions to be
freed and exports that through sysfs.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +
 include/asm/phyp_dump.h|3 
 2 files changed, 93 insertions(+), 11 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-21 
23:06:20.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-21 
23:49:10.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
-   ph->cpu_data.destination_address += addr;
-   ph->hpte_data.destination_address += addr;
-   ph->kernel_data.destination_address += addr;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   /* ToDo Invalidate kdump and free memory range. */
 
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -195,6 +205,46 @@ static void register_dump_area(struct ph
}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+
+   /* Add addr value if not initialized before */
+   if (ph->cpu_data.destination_address == 0) {
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+   }
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+  2, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+   printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+   "on invalidate\n", rc);
+   print_dump_header(ph);
+   }
+}
+
+static void unregister_dump_area(struct phyp_dump_header *ph)
+{
+   int rc;
+
+   do {
+   rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+  3, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc) {
+   printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+   "on unregister\n", rc);
+   print_dump_header(ph);
+   }
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -205,8 +255,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
struct page *rpage;
unsigned long end_pfn;
@@ -237,8 +287,8 @@ release_memory_range(unsigned long start
  *
  * will release 256MB starting at 1GB.
  */
-static ssize_t
-store_release_region(struct kset *kset, const char *buf, size_t count)
+static
+ssize_t store_release_region(struct kset *kset, const char *buf, size_t count)
 {
unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
@@ -266,10 +316,23 @@ store_release_region(struct kset *kset, 
return count;
 }
 
-static ssize_t
-show_release_region(struct kset * kset, char *buf)
+static ssize_t show_release_region(struct kset * kset, char *buf)
 {
-   return sprintf(buf, "ola\n");
+   u64 second_addr_range;
+
+   /* total reserved size - start of scratch area */
+   second_addr_range = phyp_dump_info->init_reserve_size -
+                           phyp_dump_info->reserved_scratch_size;
+   return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+                       " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+   phdr.cpu_data.destination_address,
+   phdr.cpu_data.length_copied,
+   phdr.hpte_data.destination_address

[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.

2008-01-22 Thread Manish Ahuja

This patch tracks how much of the reserved memory has been freed.
For now it does a simple, rudimentary calculation of the ranges
freed. The idea is to keep things simple at the external shell-script
level and release memory in large chunks for now.
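
A minimal userspace sketch of the bookkeeping described above, not the patch itself: two running totals accumulate the lengths released from the reserved area and from the scratch area, and the caller is told once both totals match the configured sizes. The names `dump_layout`, `freed_totals`, `in_range` and `track_freed` are hypothetical, and the sketch assumes the scratch area lies inside the reserved area.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the phyp_dump_info fields used by the patch. */
struct dump_layout {
	uint64_t reserve_start, reserve_size;   /* whole reserved area */
	uint64_t scratch_start, scratch_size;   /* scratch area (assumed inside it) */
};

/* Running totals of what has been released so far. */
struct freed_totals {
	uint64_t reserved, scratch;
};

/* Returns true when addr lies in [start, start + size]. */
static bool in_range(uint64_t addr, uint64_t start, uint64_t size)
{
	return addr >= start && addr <= start + size;
}

/*
 * Account for one released chunk. Returns true once both areas have
 * been released in full, which is the point at which the patch
 * invalidates the old dump and registers the dump area again.
 */
static bool track_freed(const struct dump_layout *l, struct freed_totals *t,
			uint64_t addr, uint64_t length)
{
	if (addr < l->reserve_start)
		return false;
	if (in_range(addr, l->reserve_start, l->reserve_size))
		t->reserved += length;
	if (in_range(addr, l->scratch_start, l->scratch_size))
		t->scratch += length;
	return t->reserved == l->reserve_size && t->scratch == l->scratch_size;
}
```

Because only lengths are summed, the totals are meaningful only if user space releases each byte exactly once, which is why large, non-overlapping chunks keep the accounting simple.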

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-21 23:30:18.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-21 23:42:04.0 -0600
@@ -275,6 +275,39 @@ void release_memory_range(unsigned long 
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+           return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+       (addr <= phyp_dump_info->init_reserve_start +
+        phyp_dump_info->init_reserve_size))
+           reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+       (addr <= phyp_dump_info->reserved_scratch_addr +
+        phyp_dump_info->reserved_scratch_size))
+           scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+       (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+           invalidate_last_dump(&phdr,
+                           phyp_dump_info->reserved_scratch_addr);
+           register_dump_area (&phdr,
+                           phyp_dump_info->reserved_scratch_addr);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -298,6 +331,8 @@ ssize_t store_release_region(struct kset
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)


[PATCH 8/8] pseries: phyp dump: config file

2008-01-22 Thread Manish Ahuja
To: linuxppc-dev@ozlabs.org


Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig  2007-11-15 14:27:33.0 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


Re: [PATCH 1/8] pseries: phyp dump: Documentation

2008-01-09 Thread Manish Ahuja
 
 I used the word actually.  I already know that it is intended to be
 faster.  :)
 
 it should blow it away, as, after all,
 it requires one less reboot!
 
 There's more than rebooting going on during system dump processing.
 Depending on the system type, booting may not be where most time is
 spent.
 
 
 As a side effect, the system is in
 production *while* the dump is being taken;
 
 A dubious feature IMO.  Seems that the design potentially trades
 reliability of first failure data capture for availability.
 E.g. system crashes, reboots, resumes processing while copying dump,
 crashes again before dump procedure is complete.  How is that handled,
 if at all?

This is a simple version. The intent was not to have a complex dump-taking
mechanism in version 1. Subsequent versions will improve the way the pages
are tracked and freed.

It is also easy now to register for another dump as soon as the scratch area
has been copied to a user-designated region. For now, though, only this simple
implementation exists.

It is also possible to extend this further to preserve only kernel pages and
free the pages that are not required, such as user/data pages. This would
reduce the space preserved and would prevent any issues caused by reserving
everything in memory except for the first 256 MB.

Improvements and future versions are planned to make this efficient. But for
now the intent is to get this off the ground and handle simple cases.

 
 
 with kdump,
 you can't go into production until after the dump is finished,
 and the system has been rebooted a second time.  On
 systems with terabytes of RAM, the time difference can be
 hours.
 
 The difference in time it takes to resume the normal workload may be
 significant, yes.  But the time it takes to get a usable dump image
 would seem to be the basically the same.
 
 Since you bring up large systems... a system with terabytes of RAM is
 practically guaranteed to be a NUMA configuration with dozens of cpus.
 When processing a dump on such a system, I wonder how well we fare:
 can we successfully boot with (say) 128 cpus and 256MB of usable
 memory?  Do we have to hot-online nodes as system memory is freed up
 (and does that even work)?  We need to be able to restore the system
 to its optimal topology when the dump is finished; if the best we can
 do is a degraded configuration, the workload will suffer and the
 system admin is likely to just reboot the machine again so the kernel
 will have the right NUMA topology.
 
 
 +Implementation details:
 +--
 +In order for this scheme to work, memory needs to be reserved
 +quite early in the boot cycle. However, access to the device
 +tree this early in the boot cycle is difficult, and device-tree
 +access is needed to determine if there is a crash data waiting.
 I don't think this bit about early device tree access is correct.  By
 the time your code is reserving memory (from early_init_devtree(), I
 think), RTAS has been instantiated and you are able to test for the
 existence of /rtas/ibm,dump-kernel.
 If I remember right, it was still too early to look up this token directly,
 so we wrote some code to crawl the flat device tree to find it.  But
 not only was that a lot of work, but I somehow decided that doing this
 to the flat tree was wrong, as otherwise someone would surely have
 written the access code.  If this can be made to work, that would be
 great, but we couldn't make it work at the time.

 +To work around this problem, all but 256MB of RAM is reserved
 +during early boot. A short while later in boot, a check is made
 +to determine if there is dump data waiting. If there isn't,
 +then the reserved memory is released to general kernel use.
 So I think these gymnastics are unneeded -- unless I'm
 misunderstanding something, you should be able to determine very early
 whether to reserve that memory.
 Only if you can get at rtas, but you can't get at rtas at that point.
 
 Sorry, but I think you are mistaken (see Michael's earlier reply).
 


Re: [PATCH 1/8] pseries: phyp dump: Documentation

2008-01-09 Thread Manish Ahuja
 It's in production with 256MB of RAM? Err. Sure as the dump progresses
 more RAM will be freed, but that's hardly production. I think Nathan's
 right, any sysadmin who wants predictability will probably double reboot
 anyway.

That's a changeable parameter. It's something we chose for now. It is by no
means set in stone. It's not a design parameter. If you would like to allocate
1 GB, we can. We expect this to be a variable value dependent upon the size of
the system. So if you have a 128 GB system and you can spare 10 GB, you should
be able to have 10 GB to boot with.



Re: [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas.

2008-01-08 Thread Manish Ahuja
Stephen,

 +/* Add addr value if not initialized before */
 +if (ph->cpu_data.destination_address == 0) {
 +ph->cpu_data.destination_address += addr;
 
 Could be just '=' like further down, right?

Actually the one below should be += as well. Thanks for catching it.

 +/* total reserved size - start of scratch area */
 +second_addr_range = phdr.cpu_data.destination_address -
 +phyp_dump_info->init_reserve_size;
 +return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
 + " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
 +  phdr.cpu_data.destination_address, phdr.cpu_data.length_copied,
 +  phdr.hpte_data.destination_address, phdr.hpte_data.length_copied,
 +  phdr.kernel_data.destination_address, phdr.kernel_data.length_copied,
 +  phyp_dump_info->init_reserve_start, second_addr_range);
 
 This indentation should be (probably) two tabs.

I kept it at one tab plus a few spaces because otherwise it exceeded 80
columns. I guess I can just put one argument per line, and that should
take care of it.

 
 +/* re-register the dump area, if old dump was invalid */
 +if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
 ^   ^
 Extra parentheses.

Just for clarity. I would prefer that, if that's okay.

 
 +invalidate_last_dump (&phdr, dump_area_start);
 +register_dump_area (&phdr, dump_area_start);
 
 No spaces after function names.
 


Yeah, will take that out from here and other files as well.

Thanks,
Manish



[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump

2008-01-07 Thread Manish Ahuja
The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

The patches mostly work; a list of open issues / todo list
is included in the documentation.  It also appears that
the not-yet-released firmware versions this was tested
on are still, ahem, incomplete; this work is also pending.

-- Linas & Manish


[PATCH 1/8] pseries: phyp dump: Documentation

2008-01-07 Thread Manish Ahuja

Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]


 Documentation/powerpc/phyp-assisted-dump.txt |  129 +++
 1 file changed, 129 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt 2008-01-07 18:05:46.0 -0600
@@ -0,0 +1,129 @@
+
+   Hypervisor-Assisted Dump
+   
+   November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example: the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+ echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+--
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+
+General notes:
+--
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor

[PATCH 2/8] pseries: phyp dump: config file

2008-01-07 Thread Manish Ahuja

Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

-
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig  2007-11-15 14:27:33.0 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+   bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+   depends on PPC_PSERIES && EXPERIMENTAL
+   default y
+   help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say Y
+
 config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP


[PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept

2008-01-07 Thread Manish Ahuja

Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
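
As a rough illustration of the reserve-then-release idea, the sketch below converts a byte range to page frame numbers and clears a per-page "reserved" flag over the half-open interval [start_pfn, start_pfn + nr_pages). It is a userspace toy, not the kernel routine: the 4 KiB page size, the `reserved[]` array standing in for `struct page` flags, and the names `pfn_down` and `release_range` are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12            /* assumption: 4 KiB pages */
#define SKETCH_NR_PAGES   64            /* size of the toy memory map */

/* Round a byte address or length down to a page frame number. */
static uint64_t pfn_down(uint64_t bytes)
{
	return bytes >> SKETCH_PAGE_SHIFT;
}

/* Toy memory map: reserved[i] != 0 means page i is still held back. */
static unsigned char reserved[SKETCH_NR_PAGES];

/*
 * Release every reserved page in [start_pfn, start_pfn + nr_pages),
 * clamped to the toy map. Returns how many pages were actually freed.
 */
static size_t release_range(uint64_t start_pfn, uint64_t nr_pages)
{
	size_t freed = 0;
	uint64_t end_pfn = start_pfn + nr_pages;

	if (end_pfn > SKETCH_NR_PAGES)
		end_pfn = SKETCH_NR_PAGES;
	for (uint64_t pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (reserved[pfn]) {
			reserved[pfn] = 0;      /* page returns to general use */
			freed++;
		}
	}
	return freed;
}
```

Checking the flag before clearing it makes a repeated release of the same range harmless, which mirrors the `PageReserved()` test in the patch.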

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 arch/powerpc/kernel/prom.c |   33 +
 arch/powerpc/platforms/pseries/Makefile|1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +
 include/asm-powerpc/phyp_dump.h|   32 +
 4 files changed, 137 insertions(+)

Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h   2007-11-19 17:44:21.0 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+   /* Memory that is reserved during very early boot. */
+   unsigned long init_reserve_start;
+   unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c    2007-11-19 19:07:49.0 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+   struct page *rpage;
+   unsigned long end_pfn;
+   long i;
+
+   end_pfn = start_pfn + nr_pages;
+
+   for (i=start_pfn; i <= end_pfn; i++) {
+           rpage = pfn_to_page(i);
+           if (PageReserved(rpage)) {
+                   ClearPageReserved(rpage);
+                   init_page_count(rpage);
+                   __free_page(rpage);
+                   totalram_pages++;
+           }
+   }
+}
+
+static int __init phyp_dump_setup(void)
+{
+   unsigned long start_pfn, nr_pages;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+           return 0;
+
+   /* Release memory that was reserved in early boot */
+   start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+   nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+   release_memory_range(start_pfn, nr_pages);
+
+   return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile  2007-11-19 17:43:52.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile   2007-11-19 17:44:21.0 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
 obj-$(CONFIG_HVCS) += hvcserver.o
 obj-$(CONFIG_HCALL_STATS)  += hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)+= phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c

[PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem

2008-01-07 Thread Manish Ahuja

Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.
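
The parsing and range check behind that sysfs write can be sketched in plain C as follows. This is an illustration under stated assumptions rather than the code in the patch: `reserved_area` and `parse_release_request` are hypothetical names, and the extra clamp of a start address beyond the end of the area is an addition made here so that the length arithmetic cannot wrap.

```c
#include <stdio.h>

/* Hypothetical description of the region reserved for phyp-dump. */
struct reserved_area {
	unsigned long start;
	unsigned long size;
};

/*
 * Parse "<start> <length>" (both hexadecimal) and clamp the request so
 * that it stays inside the reserved area. Returns 0 on success and -1
 * if the input does not contain two hex numbers.
 */
static int parse_release_request(const char *buf,
				 const struct reserved_area *area,
				 unsigned long *start, unsigned long *length)
{
	unsigned long end = area->start + area->size;

	if (sscanf(buf, "%lx %lx", start, length) != 2)
		return -1;

	if (*start < area->start)
		*start = area->start;   /* never free below the area */
	if (*start > end)
		*start = end;           /* added guard, not in the patch */
	if (*length > end - *start)
		*length = end - *start; /* never free past the area */
	return 0;
}
```

With a reserved area covering 0x10000000 to 0x50000000, the request "0x40000000 0x10000000" passes through unchanged, while "0x48000000 0x10000000" is trimmed to a length of 0x08000000.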

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c   2007-11-21 13:15:05.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c    2007-11-21 13:24:30.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+   unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+   ssize_t ret;
 
-   /* If no memory was reserved in early boot, there is nothing to do */
-   if (phyp_dump_info->init_reserve_size == 0)
-           return 0;
+   ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+   if (ret != 2)
+           return -EINVAL;
+
+   /* Range-check - don't free any reserved memory that
+    * wasn't reserved for phyp-dump */
+   if (start_addr < phyp_dump_info->init_reserve_start)
+           start_addr = phyp_dump_info->init_reserve_start;
+
+   end_addr = phyp_dump_info->init_reserve_start +
+                   phyp_dump_info->init_reserve_size;
+   if (start_addr+length > end_addr)
+           length = end_addr - start_addr;
+
+   /* Release the region of memory passed in by user */
+   start_pfn = PFN_DOWN(start_addr);
+   nr_pages = PFN_DOWN(length);
+   release_memory_range (start_pfn, nr_pages);
+
+   return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+   return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+show_release_region,
+store_release_region);
+
+/* - */
+
+static void release_all (void)
+{
+   unsigned long start_pfn, nr_pages;
 
-   /* Release memory that was reserved in early boot */
+   /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+   struct device_node *rtas;
+   const int *dump_header;
+   int header_len = 0;
+   int rc;
+
+   /* If no memory was reserved in early boot, there is nothing to do */
+   if (phyp_dump_info->init_reserve_size == 0)
+           return 0;
+
+   /* Return if phyp dump not supported */
+   ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+   if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+           release_all();
+           return -ENOSYS;
+   }
+
+   /* Is there dump data waiting for us? */
+   rtas = of_find_node_by_path("/rtas");
+   dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+   if (dump_header == NULL) {
+           release_all();
+           return 0;
+   }
+
+   /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+   rc = subsys_create_file(&kernel_subsys, &rr);
+   if (rc) {
+   printk (KERN_ERR phyp-dump: unable to create 

[PATCH 5/8] pseries: phyp dump: register dump area.

2008-01-07 Thread Manish Ahuja


Set up the actual dump header and register it with the hypervisor.
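
A small, hedged sketch of the header layout this patch sets up, compilable in user space. The two structures copy the field order from the patch with fixed-width types; `layout_sections` is a hypothetical helper that covers only the CPU and HPTE sections and uses the standard `offsetof` macro where the patch casts a null pointer to compute the offset of the first section.

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout copied from the structures in this patch. */
struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

struct phyp_dump_header {
	uint32_t version;
	uint16_t num_of_sections;
	uint16_t status;
	uint32_t first_offset_section;
	uint32_t dump_disk_section;
	uint64_t block_num_dd;
	uint64_t num_of_blocks_dd;
	uint32_t offset_dd;
	uint32_t maxtime_to_auto;
	struct dump_section cpu_data;
	struct dump_section hpte_data;
	struct dump_section kernel_data;
};

/*
 * Record where the first section starts inside the header and hand
 * out back-to-back destination offsets for the CPU and HPTE areas.
 * Returns the combined length of those two areas.
 */
static uint64_t layout_sections(struct phyp_dump_header *ph,
				uint64_t cpu_state_size,
				uint64_t hpte_region_size)
{
	uint64_t offset = 0;

	ph->first_offset_section =
		(uint32_t)offsetof(struct phyp_dump_header, cpu_data);

	ph->cpu_data.source_length = cpu_state_size;
	ph->cpu_data.destination_address = offset;
	offset += cpu_state_size;

	ph->hpte_data.source_length = hpte_region_size;
	ph->hpte_data.destination_address = offset;
	offset += hpte_region_size;

	return offset;
}
```

The destination values produced here are offsets relative to the start of the save area rather than absolute addresses; elsewhere in the series the register and invalidate paths add a base address to each of them before the header is handed to the hypervisor.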

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  169 +++--
 1 file changed, 163 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c   2007-11-21 15:55:37.0 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c    2007-11-21 16:06:52.0 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = phyp
 static int ibm_configure_kernel_dump;
 
 /* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+   u32 dump_flags;
+   u16 source_type;
+   u16 error_flags;
+   u64 source_address;
+   u64 source_length;
+   u64 length_copied;
+   u64 destination_address;
+};
+
+struct phyp_dump_header {
+   u32 version;
+   u16 num_of_sections;
+   u16 status;
+
+   u32 first_offset_section;
+   u32 dump_disk_section;
+   u64 block_num_dd;
+   u64 num_of_blocks_dd;
+   u32 offset_dd;
+   u32 maxtime_to_auto;
+   /* No dump disk path string used */
+
+   struct dump_section cpu_data;
+   struct dump_section hpte_data;
+   struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+   struct device_node *rtas;
+   const unsigned int *sizes;
+   int len;
+   unsigned long cpu_state_size = 0;
+   unsigned long hpte_region_size = 0;
+   unsigned long addr_offset = 0;
+
+   /* Get the required dump region sizes */
+   rtas = of_find_node_by_path("/rtas");
+   sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+   if (!sizes || len < 20)
+       return 0;
+
+   if (sizes[0] == 1)
+       cpu_state_size = *((unsigned long *) &sizes[1]);
+
+   if (sizes[3] == 2)
+       hpte_region_size = *((unsigned long *) &sizes[4]);
+
+   /* Set up the dump header */
+   ph->version = DUMP_HEADER_VERSION;
+   ph->num_of_sections = NUM_DUMP_SECTIONS;
+   ph->status = 0;
+
+   ph->first_offset_section =
+       (u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+   ph->dump_disk_section = 0;
+   ph->block_num_dd = 0;
+   ph->num_of_blocks_dd = 0;
+   ph->offset_dd = 0;
+
+   ph->maxtime_to_auto = 0; /* disabled */
+
+   /* The first two sections are mandatory */
+   ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+   ph->cpu_data.source_address = 0;
+   ph->cpu_data.source_length = cpu_state_size;
+   ph->cpu_data.destination_address = addr_offset;
+   addr_offset += cpu_state_size;
+
+   ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+   ph->hpte_data.source_address = 0;
+   ph->hpte_data.source_length = hpte_region_size;
+   ph->hpte_data.destination_address = addr_offset;
+   addr_offset += hpte_region_size;
+
+   /* This section describes the low kernel region */
+   ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+   ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+   ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+   ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+   ph->kernel_data.destination_address = addr_offset;
+   addr_offset += ph->kernel_data.source_length;
+
+   return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+   int rc;
+   ph->cpu_data.destination_address += addr;
+   ph->hpte_data.destination_address += addr;
+   ph->kernel_data.destination_address += addr;
+
+   do {
+       rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+                      1, ph, sizeof(struct phyp_dump_header));
+   } while (rtas_busy_delay(rc));
+
+   if (rc)
+   {
+       printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+   }
+}
+
+/* - */
 /**
  * release_memory_range -- release memory previously

[PATCH 6/8] pseries: phyp dump: debugging print routines.

2008-01-07 Thread Manish Ahuja


Provide some basic debugging support.
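
As a rough userspace illustration of the kind of output this patch produces, the sketch below formats one dump section into a buffer. It is not the patch's code: format_section() is a hypothetical helper, snprintf() stands in for printk(), and the <inttypes.h> macros are used so the 64-bit fields print correctly regardless of the width of long. The NULL guard is also an assumption of this sketch; a later patch on this list notes that print_dump_header() can be called with a NULL pointer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same fields as struct dump_section in the phyp-dump patches. */
struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

/* Hypothetical helper: write a one-line summary of a section into buf.
 * Returns what snprintf() returns: the number of characters the full
 * line needs, excluding the terminating NUL. */
static int format_section(char *buf, size_t len, const char *name,
			  const struct dump_section *s)
{
	assert(buf != NULL && name != NULL);

	/* Guard against a missing section instead of dereferencing NULL. */
	if (s == NULL)
		return snprintf(buf, len, "%s: (no section)\n", name);

	return snprintf(buf, len,
			"%s source_type=%" PRIu16
			" source_length=0x%" PRIx64 "\n",
			name, s->source_type, s->source_length);
}
```

A caller would pass a stack buffer, for example format_section(buf, sizeof(buf), "cpu", &ph->cpu_data), and print the result with whatever logging facility is at hand.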

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
-

 arch/powerpc/platforms/pseries/phyp_dump.c |   53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-01 23:24:10.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-01 23:24:27.0 -0600
@@ -2,7 +2,7 @@
  * Hypervisor-assisted dump
  *
  * Linas Vepstas, Manish Ahuja 2007
- * Copyrhgit (c) 2007 IBM Corp.
+ * Copyright (c) 2007 IBM Corp.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
return addr_offset;
 }
 
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+   printk(KERN_INFO "dump header:\n");
+   /* setup some ph->sections required */
+   printk(KERN_INFO "version = %d\n", ph->version);
+   printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+   printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+   /* No ph->disk, so all should be set to 0 */
+   printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+   printk(KERN_INFO "dump disk sections should be zero\n");
+   printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+   printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+   printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+   printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+   printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+   /*set cpu state and hpte states as well scratch pad area */
+   printk(KERN_INFO " CPU AREA \n");
+   printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+   printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+   printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+   printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+   printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+   printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+   printk(KERN_INFO " HPTE AREA \n");
+   printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+   printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+   printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+   printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+   printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+   printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+   printk(KERN_INFO " SRSD AREA \n");
+   printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+   printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+   printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+   printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+   printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+   printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
if (rc)
{
printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+   print_dump_header (ph);
+#endif
}
 }
 
@@ -271,6 +319,9 @@ static int __init phyp_dump_setup(void)
release_all();
return -ENOSYS;
}
+#ifdef DEBUG
+   print_dump_header (dump_header);
+#endif
 
/* Is there dump data waiting for us? If there isn't,
 * then register a new dump area, and release all of


[PATCH 8/8] pseries: phyp dump: Tracking memory range freed.

2008-01-07 Thread Manish Ahuja


This patch tracks how much memory has been freed. For now it does a
rudimentary calculation of the ranges freed. The idea is to keep the
external shell script simple and have it send in large chunks for now.
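
The bookkeeping described above can be sketched in userspace C as follows. This is a simplified illustration, not the patch itself: struct reserved_range and track_freed() are hypothetical names, the range check is half-open where the patch uses an inclusive upper bound, and the sketch assumes callers never free the same chunk twice and that chunks do not straddle the end of the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one reserved region. */
struct reserved_range {
	unsigned long start;	/* first address of the region */
	unsigned long size;	/* length of the region in bytes */
	unsigned long freed;	/* bytes released so far */
};

/* Account for a freed chunk. Returns true once the whole region has
 * been released, which is the point where the dump area could be
 * registered again. Chunks outside the region are ignored. */
static bool track_freed(struct reserved_range *r, unsigned long addr,
			unsigned long length)
{
	assert(r != NULL);

	/* Half-open check: start <= addr < start + size. */
	if (addr < r->start || addr - r->start >= r->size)
		return false;

	r->freed += length;
	return r->freed == r->size;
}
```

With one such structure for the initially reserved area and one for the scratch area, the re-registration step would run only after both report that they are fully freed.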

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
-

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   36 +
 include/asm-powerpc/phyp_dump.h|3 ++
 2 files changed, 39 insertions(+)

Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h
===
--- 2.6.24-rc5.orig/include/asm-powerpc/phyp_dump.h 2008-01-07 22:55:28.0 -0600
+++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h  2008-01-07 22:58:02.0 -0600
@@ -24,6 +24,9 @@ struct phyp_dump {
/* Memory that is reserved during very early boot. */
unsigned long init_reserve_start;
unsigned long init_reserve_size;
+   /* Scratch area memory details */
+   unsigned long scratch_reserve_start;
+   unsigned long scratch_reserve_size;
 };
 
 extern struct phyp_dump *phyp_dump_info;
Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c  2008-01-07 22:57:27.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c   2008-01-07 22:58:02.0 -0600
@@ -287,6 +287,39 @@ release_memory_range(unsigned long start
}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+   static unsigned long scratch_area_size, reserved_area_size;
+
+   if (addr < phyp_dump_info->init_reserve_start)
+       return;
+
+   if ((addr >= phyp_dump_info->init_reserve_start) &&
+       (addr <= phyp_dump_info->init_reserve_start +
+        phyp_dump_info->init_reserve_size))
+       reserved_area_size += length;
+
+   if ((addr >= phyp_dump_info->scratch_reserve_start) &&
+       (addr <= phyp_dump_info->scratch_reserve_start +
+        phyp_dump_info->scratch_reserve_size))
+       scratch_area_size += length;
+
+   if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+       (scratch_area_size == phyp_dump_info->scratch_reserve_size)) {
+
+       invalidate_last_dump(&phdr,
+               phyp_dump_info->scratch_reserve_start);
+       register_dump_area (&phdr,
+               phyp_dump_info->scratch_reserve_start);
+   }
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -310,6 +343,8 @@ store_release_region(struct kset *kset, 
if (ret != 2)
return -EINVAL;
 
+   track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
 * wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)
@@ -414,6 +449,7 @@ static int __init phyp_dump_setup(void)
}
 
/* Don't allow user to release the 256MB scratch area */
+   /* this might be wrong */
phyp_dump_info->init_reserve_size = free_area_length;
 
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */


Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept

2008-01-07 Thread Manish Ahuja
Arnd,

Sorry this patch ended up out of sequence. I have reposted it properly in the
other thread.

We did talk about using the gmail address, but Linas was more comfortable using
this one, since this is where he was when we did the work. Hence the use of the
Austin address, with the gmail address on the cc list.

I am sure he will chime in with more details when he gets the opportunity.

Thanks,
Manish



Arnd Bergmann wrote:
> On Tuesday 08 January 2008, Manish Ahuja wrote:
>
>> Initial patch for reserving memory in early boot, and freeing it later.
>> If the previous boot had ended with a crash, the reserved memory would contain
>> a copy of the crashed kernel data.
>>
>> Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
>> Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
>
> I think the signed-off-by chain needs to be modified. The way it appears,
> you handled the patch first, then sent it to Linas, who forwarded it
> to whoever will take the patches from the list.
>
> This obviously isn't true, since you are actually the one who is sending
> out the patches. Moreover, I believe that the [EMAIL PROTECTED]
> address is now dead, and shouldn't be used for this any more.
>
> So, depending on which of you two wrote the majority of a patch, I think
> it should be either
>
> | Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
> | Acked-by: Linas Vepstas [EMAIL PROTECTED]
>
> or
>
> | From: Linas Vepstas [EMAIL PROTECTED]
> | Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
>
>	Arnd



Re: [PATCH] Infinite loop/always true check possible with unsigned counter.

2007-07-10 Thread Manish Ahuja
Paul Mackerras wrote:
 Andreas Schwab writes:

   
 ??? There is no rgn->cnt involved in the comparison.
 
 Look further down in lmb_add_region; there is a second for loop that
 does

 for (i = rgn->cnt-1; i >= 0; i--)
   
 Which is exactly the one quoted above.  I still don't see your point.
 

 You're right - my mistake.

 Paul.
   
I presume the patch is good then. Do I need to change anything?

Thanks,
Manish


[PATCH] Infinite loop/always true check possible with unsigned counter.

2007-07-09 Thread Manish Ahuja
Fix a possible infinite loop (an always-true check) caused by the
unsigned long counter i in lmb_add_region(), in the following for loop:

for (i = rgn->cnt-1; i >= 0; i--)
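
To spell out why the loop above needs a signed counter: an unsigned value is never negative, so i >= 0 always holds and decrementing past zero wraps around to the largest value. The standalone sketch below contrasts two safe ways to count down. The function names are hypothetical and none of this is taken from lmb.c; it only illustrates the signed-counter fix used by the patch and one alternative that keeps the counter unsigned.

```c
#include <assert.h>
#include <limits.h>

/* Count down with a signed counter, as the patch does. The condition
 * i >= 0 becomes false once i reaches -1, so the loop terminates. */
static long count_down_signed(unsigned long cnt)
{
	long i;
	long visited = 0;

	/* Assumption of this sketch: cnt fits in a long. */
	assert(cnt <= (unsigned long) LONG_MAX);

	for (i = (long) cnt - 1; i >= 0; i--)
		visited++;

	return visited;
}

/* Alternative that keeps an unsigned counter: test before decrementing
 * rather than relying on i >= 0, which is always true for unsigned. */
static unsigned long count_down_unsigned(unsigned long cnt)
{
	unsigned long i;
	unsigned long visited = 0;

	for (i = cnt; i-- > 0; )
		visited++;

	return visited;
}
```

Both helpers visit each index from cnt-1 down to 0 exactly once, including the cnt == 0 case where the loop body must not run at all.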

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/mm/lmb.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.22-rc4/arch/powerpc/mm/lmb.c
===
--- 2.6.22-rc4.orig/arch/powerpc/mm/lmb.c   2007-06-11 21:10:46.0 -0500
+++ 2.6.22-rc4/arch/powerpc/mm/lmb.c    2007-07-06 21:47:40.0 -0500
@@ -138,8 +138,8 @@ void __init lmb_analyze(void)
 static long __init lmb_add_region(struct lmb_region *rgn, unsigned long base,
  unsigned long size)
 {
-   unsigned long i, coalesced = 0;
-   long adjacent;
+   unsigned long coalesced = 0;
+   long adjacent, i;
 
/* First try and coalesce this LMB with another. */
for (i=0; i < rgn->cnt; i++) {

Re: [PATCH] Infinite loop/always true check possible with unsigned counter.

2007-07-09 Thread Manish Ahuja

Repost to fix my email id.

Fix a possible infinite loop (an always-true check) caused by the
unsigned long counter i in lmb_add_region(), in the following for loop:

for (i = rgn->cnt-1; i >= 0; i--)

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]


---
 arch/powerpc/mm/lmb.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.22-rc4/arch/powerpc/mm/lmb.c
===
--- 2.6.22-rc4.orig/arch/powerpc/mm/lmb.c   2007-06-11 21:10:46.0 -0500
+++ 2.6.22-rc4/arch/powerpc/mm/lmb.c    2007-07-06 21:47:40.0 -0500
@@ -138,8 +138,8 @@ void __init lmb_analyze(void)
 static long __init lmb_add_region(struct lmb_region *rgn, unsigned long base,
  unsigned long size)
 {
-   unsigned long i, coalesced = 0;
-   long adjacent;
+   unsigned long coalesced = 0;
+   long adjacent, i;
 
/* First try and coalesce this LMB with another. */
for (i=0; i < rgn->cnt; i++) {

Re: [PATCH] pseries: Re: Minor: Removed double return.

2007-07-07 Thread Manish Ahuja
Ah yes, my mistake. Does it require a repost then?

Thanks,
Manish

Linas Vepstas wrote:
> You want to say it's a patch in the subject line.
>
> --linas
>
> On Fri, Jul 06, 2007 at 04:59:55PM -0500, Manish Ahuja wrote:
>> Found 2 instances of return one right after each other in
>> arch_add_memory(). This minor patch fixes it.
>> Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
>>
>> Index: 2.6.22-rc4/arch/powerpc/mm/mem.c
>> ===
>> --- 2.6.22-rc4.orig/arch/powerpc/mm/mem.c    2007-06-11 21:10:46.0 -0500
>> +++ 2.6.22-rc4/arch/powerpc/mm/mem.c 2007-06-29 22:52:42.0 -0500
>> @@ -129,8 +129,6 @@
>>   zone = pgdata->node_zones;
>>
>>   return __add_pages(zone, start_pfn, nr_pages);
>> -
>> -return 0;
>>  }
>>
>>  /*
